How to connect Databricks to a Storage account in Azure when I have no access to app registration

Zhilin Song 0 Reputation points
2024-10-19T06:17:20.4066667+00:00

How to connect Databricks to a Storage account in Azure when I have no access to app registration. Does this mean I can only go to my administrator to fix this?


Accepted answer
  1. Amira Bedhiafi 25,261 Reputation points
    2024-10-19T11:43:30.4166667+00:00

    If you don’t have access to app registration, there are still a few ways to connect Azure Databricks to an Azure Storage account. You won’t be able to use service principals directly (which requires app registration), but you can leverage other options that don’t require admin-level privileges. Here are a few alternative methods:

    If Azure AD credential passthrough is enabled on your Databricks cluster, you can authenticate to the storage account with your own Azure AD identity, without needing any app registration.

    You need to check with your administrator to ensure that Azure AD passthrough authentication is configured in your Databricks workspace.

    You can then use the following example code in a Databricks notebook to mount the storage account to DBFS (credential passthrough requires an ADLS Gen2 account accessed through the abfss:// driver and a passthrough-enabled cluster):

    
    # Configuration for the storage account
    storage_account_name = "your_storage_account_name"
    container_name = "your_container_name"

    # With credential passthrough, the cluster supplies your own Azure AD
    # token to the storage driver, so no client id or secret is needed
    configs = {
        "fs.azure.account.auth.type": "CustomAccessToken",
        "fs.azure.account.custom.token.provider.class":
            spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName"),
    }

    # Mount the storage account to DBFS
    dbutils.fs.mount(
        source=f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
        mount_point="/mnt/your-mount-point",
        extra_configs=configs,
    )

    After mounting, you can read and write files to Azure Blob Storage as if they were part of the Databricks File System (DBFS).
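
    For example, a minimal round trip through the mount might look like this (the file and folder names are placeholders, not from the original answer):

    # Read a CSV from the mounted container (hypothetical file name)
    df = spark.read.csv("/mnt/your-mount-point/your_file.csv", header=True)

    # Write results back through the same mount (hypothetical output folder)
    df.write.mode("overwrite").parquet("/mnt/your-mount-point/output/")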

    Alternatively, you can use a Shared Access Signature (SAS) token for the storage account. This doesn't require app registration and is a simpler way to authenticate without Azure AD or a service principal.

    Either request the token from your administrator, or generate it yourself if you have the necessary permissions in the Azure portal (a sketch for generating one programmatically follows the mount example below).

    
    # Configuration for the storage account and SAS token
    storage_account_name = "your_storage_account_name"
    container_name = "your_container_name"
    sas_token = "your_sas_token"

    # Mount using a container-scoped SAS configuration key
    dbutils.fs.mount(
        source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/",
        mount_point="/mnt/your-mount-point",
        extra_configs={
            f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net": sas_token
        },
    )
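
    If you do have rights to generate the token yourself, here is a minimal sketch using the azure-storage-blob package; it assumes the package is installed on the cluster and that you can read the account key (both assumptions, not part of the original answer):

    from datetime import datetime, timedelta
    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    # Generate a read/list SAS token valid for 8 hours (hypothetical values)
    sas_token = generate_container_sas(
        account_name="your_storage_account_name",
        container_name="your_container_name",
        account_key="your_account_key",  # assumption: you can read the account key
        permission=ContainerSasPermissions(read=True, list=True),
        expiry=datetime.utcnow() + timedelta(hours=8),
    )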
    

    If you don't want to mount the storage account, you can also read and write data directly using the Azure SDKs (such as the Azure Blob Storage SDK; a sketch follows the Spark example below) or Databricks' native Spark connectors.

    
    # Configuration for the storage account and SAS token
    storage_account_name = "your_storage_account_name"
    container_name = "your_container_name"
    sas_token = "your_sas_token"

    # Register the SAS token with Spark; the token cannot simply be
    # appended to the wasbs:// URL as a query string
    spark.conf.set(
        f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net",
        sas_token,
    )

    # Read the file into a DataFrame
    url = f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/your_file.csv"
    df = spark.read.csv(url)

    # Show the data
    df.show()
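
    As a sketch of the SDK route mentioned above, the azure-storage-blob package can download a blob directly with the same SAS token (assuming the package is installed on the cluster; the file name is a placeholder):

    from azure.storage.blob import BlobServiceClient

    # Authenticate to the account with the SAS token
    service = BlobServiceClient(
        account_url=f"https://{storage_account_name}.blob.core.windows.net",
        credential=sas_token,
    )

    # Download a single blob's contents as bytes
    blob = service.get_blob_client(container=container_name, blob="your_file.csv")
    data = blob.download_blob().readall()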
    

    If you have access to the storage account keys (not recommended for production, but fine for testing), you can use them to connect Databricks to the storage account.

    Request this from your administrator or retrieve it from the Azure portal if you have access.

    Mount Using Storage Key:

    
    # Configuration for the storage account and access key
    storage_account_name = "your_storage_account_name"
    container_name = "your_container_name"
    storage_account_key = "your_storage_account_key"

    # Mount using the account key (fine for testing; avoid in production)
    dbutils.fs.mount(
        source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net/",
        mount_point="/mnt/your-mount-point",
        extra_configs={
            f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_key
        },
    )
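
    To avoid hardcoding the key in the notebook, you can store it in a Databricks secret scope and read it at mount time; the scope and key names below are hypothetical:

    # Read the account key from a secret scope instead of pasting it inline
    storage_account_key = dbutils.secrets.get(scope="your-scope", key="storage-account-key")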
    
    

    In some cases you may still need your administrator's assistance, but these alternatives often provide adequate access without needing app registration.


1 additional answer

  1. Vinodh247 21,881 Reputation points
    2024-10-19T11:45:10.4766667+00:00

    Hi Zhilin Song,

    Thanks for reaching out to Microsoft Q&A.

    If you do not have access to app registration and cannot create a service principal for authentication, you can still connect Databricks to your Azure Storage account using other methods, depending on your permissions and setup. Here are some alternatives:

    1. Access Keys:
    • If you have access to the storage account access keys, you can use them directly in Databricks to authenticate.
    • You can mount the storage account in Databricks using the access key as given in the example below. This method does not require app registration but does require access to the storage account keys.

      dbutils.fs.mount(
          source="wasbs://<container>@<storage_account>.blob.core.windows.net/",
          mount_point="/mnt/<mount_name>",
          extra_configs={"fs.azure.account.key.<storage_account>.blob.core.windows.net": "<storage_account_access_key>"}
      )

    2. SAS Tokens:
    • If you cannot use access keys, your administrator can generate a SAS token for the storage account, granting limited-time access to the specific resources you need. SAS tokens provide more granular control and do not require app registration.

      # The SAS token must go in extra_configs; the driver ignores a
      # query string appended to the source URL
      dbutils.fs.mount(
          source="wasbs://<container>@<storage_account>.blob.core.windows.net/",
          mount_point="/mnt/<mount_name>",
          extra_configs={"fs.azure.sas.<container>.<storage_account>.blob.core.windows.net": "<sas_token>"}
      )

    3. Azure Managed Identity (if available):
    • If your Databricks workspace runs on a cluster with a managed identity (MI) enabled, you can authenticate using the MI without needing an app registration.
    • In this case, ensure that the MI is assigned the correct permissions on the storage account, such as the Storage Blob Data Contributor role.

      # The ABFS driver handles managed identity through its OAuth
      # MsiTokenProvider; "ManagedIdentity" is not a valid auth.type value
      dbutils.fs.mount(
          source="abfss://<container>@<storage_account>.dfs.core.windows.net/",
          mount_point="/mnt/<mount_name>",
          extra_configs={
              "fs.azure.account.auth.type": "OAuth",
              "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider",
              "fs.azure.account.oauth2.msi.tenant": "<tenant_id>",
              "fs.azure.account.oauth2.client.id": "<managed_identity_client_id>"
          }
      )
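
    Whichever method you mount with, a quick way to confirm the mount worked is to list it (the mount name is a placeholder):

      # List the mounted directory to verify access
      display(dbutils.fs.ls("/mnt/<mount_name>"))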

    If none of the above methods are available or appropriate for your environment, you will need to work with your administrator, who can:

    • Provide access to the necessary app registration or service principal.
    • Set up access keys or SAS tokens for you.
    • Enable and configure MI for your Databricks environment.

    In summary, if you don't have access to app registration, your administrator can help set up access via Managed Identity, Access Keys, or SAS tokens, or directly assign the necessary permissions.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.

