Difference between connect and mount in Azure Databricks

Wentzler, Charlotte 41 Reputation points
2022-12-09T14:06:14.91+00:00

I mounted my Azure Storage Account on Azure Databricks using dbutils and Python, as described on this page, with the method that uses an Azure service principal: https://learn.microsoft.com/en-us/azure/databricks/dbfs/mounts

configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": "<application-id>",
          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<service-credential-key-name>"),
          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)

but I also saw that there is an option to connect with Spark to the Azure Blob File System (ABFS) driver, as described on this page: https://learn.microsoft.com/en-us/azure/databricks/external-data/azure-storage

service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")  
  
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")  
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")  
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")  
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)  
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")  

I couldn't find information about the difference. In which use cases is it better to use one or the other? Is one method faster than the other for reading the data stored in the Azure Storage Account? Does the connection only allow access from one cluster?
So my questions are mainly about performance, security and data transfer.

Thanks a lot in advance!


Accepted answer
  PRADEEPCHEEKATLA-MSFT 77,901 Reputation points Microsoft Employee
    2022-12-12T08:00:38.673+00:00

    Hello @Wentzler, Charlotte ,

    Thanks for the question and using MS Q&A platform.

    Accessing ADLS Gen2 via a mount point and accessing it with spark.conf.set using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication are two different ways of accessing the ADLS Gen2 account from Azure Databricks.

    Mount: Azure Databricks mounts create a link between a workspace and cloud object storage, which enables you to interact with cloud object storage using familiar file paths relative to the Databricks file system. Mounts work by creating a local alias under the /mnt directory.
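
    For example (a minimal, hypothetical sketch; the mount name, folder and file format are placeholders, not taken from your setup), once the container is mounted, any cluster in the workspace can read it through the /mnt alias like a regular DBFS path:

# list the mounted container through the /mnt alias
display(dbutils.fs.ls("/mnt/<mount-name>"))

# read files from the mount using a familiar file path (parquet is just an example format)
df = spark.read.format("parquet").load("/mnt/<mount-name>/<folder>")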

    • Storage account is linked to the workspace
    • Storage is accessible to everyone who has access to your Databricks workspace

    Spark.conf.set: You can securely access data in an Azure storage account using OAuth 2.0 with an Azure Active Directory (Azure AD) application service principal for authentication (a minimal sketch follows this list).
    • Limited to the cluster where the notebook is attached
    • Applicable only for the specific session in which you run the notebook; once the cluster is stopped, this configuration no longer works.
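
    For example (again a hypothetical sketch; it assumes the spark.conf.set block from your second snippet has already been run in the current session, and the container, folder and file format are placeholders), the data is then read directly through its abfss:// URI rather than a /mnt path:

# read directly via the abfss:// URI; this works only in the Spark session
# (and on the cluster) where the spark.conf.set OAuth settings were applied
df = spark.read.format("parquet").load(
    "abfss://<container-name>@<storage-account>.dfs.core.windows.net/<folder>")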

    These are the deprecated patterns for storing and accessing data from Azure Databricks:

    [image: table of deprecated data access patterns]

    This article Enterprise security for Azure Databricks provides an overview of the most important security-related controls and configurations for the deployment of Azure Databricks.

    Hope this helps. Please let us know if you have any further queries.


