How to connect Data Lake Gen2 storage to Azure ML?

MarwanSamrout-7915 40 Reputation points
2024-01-26T16:38:06.9833333+00:00

I'm trying to create a datastore in Azure ML from a Data Lake Gen2 storage account. I have registered an app and given it the Storage Blob Data Reader permission, but when I create the datastore it appears empty, even though it has 3 data files. I followed this guide to register the app: https://www2.microstrategy.com/producthelp/Current/Gateway_Connections/WebHelp/Lang_1033/Content/adls_service_account_connectivity.htm

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Machine Learning
An Azure machine learning service for building and deploying models.

2 answers

  1. Amira Bedhiafi 30,831 Reputation points
    2024-01-28T11:53:50.7666667+00:00

    1. Sign in to the Azure portal, select the correct directory, and create an app registration. This gives you an Application (client) ID and a Directory (tenant) ID, both of which are needed in later steps.
    2. Under the app registration, create a new client secret and save its value securely. This secret acts as the password for your application.
    3. Navigate to your storage account, open Access Control (IAM), and add a role assignment that grants the "Storage Blob Data Contributor" role to the service principal you created.
    4. Store the client secret in Azure Key Vault for secure management.
    5. Create an Azure Key Vault-backed secret scope in Azure Databricks, providing details such as the DNS Name and Resource ID of the Azure Key Vault.
    6. In your Azure Databricks workspace, use Python code to set up the connection to Azure Data Lake Storage Gen2, using the secret scope together with the application ID and tenant ID obtained earlier.
    7. If Azure Data Lake Storage Gen2 is behind a firewall, make sure your Azure Databricks workspace can connect to it. This might involve setting up private endpoints or configuring virtual network service endpoints.

    Here is a step-by-step tutorial: https://medium.com/geekculture/how-to-connect-azure-data-lake-gen-2-to-azure-machine-learning-510c00115add

    Further reading:
    - Connect to Azure Data Lake Storage Gen2 - Azure Databricks: https://learn.microsoft.com/en-us/azure/databricks/scenarios/databricks-datalake-gen2-get-started
    - Azure Machine Learning: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-upload-explore-data
    - Azure Databricks & Spark - Azure Storage: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-use-databricks-spark
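The "use Python code to set up the connection" step in the answer above could look roughly like the sketch below. It is not taken from the answer: the helper names `adls_gen2_oauth_conf` and `abfss_uri` are hypothetical, and every account, scope, container and path value is a placeholder. The configuration keys are the standard Hadoop ABFS OAuth settings for a service principal. The helpers are plain Python so they can be checked outside a notebook; the Databricks-specific calls (`dbutils.secrets.get`, `spark.conf.set`) appear only in comments because they exist only inside a Databricks workspace.

```python
def adls_gen2_oauth_conf(storage_account, client_id, client_secret, tenant_id):
    """Build the ABFS OAuth (service principal) settings for one storage account.

    Hypothetical helper: returns a dict of Spark/Hadoop configuration keys
    scoped to <storage_account>.dfs.core.windows.net.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # Application (client) ID from the app registration.
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        # Client secret value; read it from a secret scope, never hard-code it.
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # Token endpoint built from the Directory (tenant) ID.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container, storage_account, path=""):
    """Build an abfss:// URI for a path inside a Data Lake Gen2 container."""
    return (
        f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


# Inside a Databricks notebook (placeholders in angle brackets are assumptions):
#
#   secret = dbutils.secrets.get(scope="<scope-name>", key="<secret-key-name>")
#   conf = adls_gen2_oauth_conf("<storage-account>", "<application-id>",
#                               secret, "<directory-id>")
#   for key, value in conf.items():
#       spark.conf.set(key, value)
#   df = spark.read.csv(abfss_uri("<container>", "<storage-account>", "<path>"))
```

Keeping the settings in a dict makes it easy to inspect exactly which keys are being applied before handing them to `spark.conf.set`, which can help when a connection succeeds but no files are listed.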


  2. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.


