SAS token generation by Databricks to access CSV files from ADLS container folder

Subhadip Roy 31 Reputation points
2024-07-17T05:08:24.3033333+00:00

Hi Team,

There are some zipped CSV files inside an ADLS container folder. These zip files need to be downloaded for data correction.

Downloading a file requires a SAS token appended to the zip file path. Databricks generates the token and passes it to ADF, which passes it on to a Logic App, and the resulting link is provided in an email.

Unfortunately, the token is not generated as expected. Its length is 46 characters, whereas the token generated from the portal is 48 characters. When the file is accessed with the generated token, authentication fails with an error reporting an incorrect header and a mismatch.

The generate_blob_sas method was used to generate the token.

Could you please provide some suggestions, guidance, or assistance?

Azure Data Lake Storage
Azure Databricks
Azure Data Factory

2 answers

  1. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-07-18T07:29:51.9533333+00:00

    Hello Subhadip Roy,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you are encountering issues with the SAS token generation process in Databricks and are getting authentication and header mismatch errors.

    Azure Storage supports creating a shared access signature (SAS) at the level of the storage account, including ADLS Gen2 accounts. With an account-level SAS, you can delegate access to write and delete operations for containers, queues, tables, and file shares, which are not available with an object-specific SAS.

    Verify the duration for which the SAS token is valid, because an incorrect start or expiry time can cause authentication failures. The length of a SAS token varies with the permissions and parameters set, so a difference in length is not in itself an error. Ensure that all required parameters are included when generating the token, and make sure that the correct storage account key is used to sign it, since this key is crucial for generating a valid token. Then compare the token generated from the Azure portal with the one generated by your code and look for discrepancies in the parameters or the format.

    Debugging: Use logging to capture the exact token being generated and compare it with a working token from the portal. This can help identify any missing or incorrect parameters.
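
The comparison described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: `sas_params`, `diff_sas`, and `encoded_sig_length` are hypothetical helper names, and the tokens you pass in are whatever the portal and your notebook produced. It splits each token into its query parameters, reports the parameters that differ (skipping `sig`, which differs between any two tokens), and measures the length of a URL-encoded signature.

```python
import base64
from urllib.parse import parse_qs, quote


def sas_params(token: str) -> dict:
    """Split a SAS query string into {parameter: value}; a leading '?' is optional."""
    parsed = parse_qs(token.lstrip("?"), keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


def diff_sas(portal_token: str, code_token: str) -> dict:
    """Return {parameter: (portal_value, code_value)} for every mismatch except 'sig'."""
    portal, code = sas_params(portal_token), sas_params(code_token)
    report = {}
    for key in sorted(set(portal) | set(code)):
        if key == "sig":  # signatures are expected to differ between tokens
            continue
        if portal.get(key) != code.get(key):
            report[key] = (portal.get(key), code.get(key))
    return report


def encoded_sig_length(raw_digest: bytes) -> int:
    """Length of a digest after Base64 encoding and URL encoding."""
    b64 = base64.b64encode(raw_digest).decode("ascii")
    return len(quote(b64, safe=""))
```

A SAS signature is a Base64-encoded HMAC-SHA256 digest: 32 bytes become 44 Base64 characters, and URL encoding adds two characters for the `=` padding and for each `+` or `/`. If the 46 and 48 characters mentioned in the question refer to the encoded `sig` value (an assumption, since a complete SAS token is considerably longer), both lengths are normal and the difference alone does not indicate a fault.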

    You can also use a Managed Identity (MI) to generate and access SAS tokens for Azure Data Lake Storage (ADLS). With a Managed Identity, you can generate a User Delegation SAS. This involves obtaining a user delegation key from Azure AD and then using it to create the SAS token. This method is secure and leverages Azure AD for authentication, so no storage account key is needed to sign the token.

    Steps to Generate User Delegation SAS:

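    Obtain User Delegation Key: Use the get_user_delegation_key method of the blob service client to get the user delegation key.

    Generate SAS Token: In the .NET SDK, use the BlobSasBuilder and BlobUriBuilder helpers to generate the SAS token URI. In the Python SDK, pass the key to generate_blob_sas through its user_delegation_key parameter instead of account_key.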

    Ensure that the Managed Identity used by ADF has the necessary permissions (e.g., Storage Blob Data Contributor) on the ADLS account.

    Use the generated SAS token in your ADF pipeline to access the blobs.
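
The steps above can be sketched in Python as follows. This is a minimal sketch, assuming the `azure-storage-blob` package is installed and that `credential` is an Azure AD token credential (for example one from `azure-identity`) that resolves in your environment. `build_blob_url` and `user_delegation_download_url` are hypothetical helper names, and the read-only permission and one-hour lifetime are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def build_blob_url(account_name, container_name, blob_path, sas_token):
    """Join the blob endpoint, the URL-quoted blob path, and the SAS token."""
    path = quote(blob_path.lstrip("/"))  # "/" between folders is kept as-is
    return (f"https://{account_name}.blob.core.windows.net/"
            f"{container_name}/{path}?{sas_token.lstrip('?')}")


def user_delegation_download_url(credential, account_name, container_name,
                                 blob_path, hours_valid=1):
    """Create a read-only user delegation SAS URL for a single blob."""
    # Imported here so build_blob_url can be used without the SDK installed.
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)

    # Start slightly in the past to tolerate clock skew between machines.
    start = datetime.now(timezone.utc) - timedelta(minutes=5)
    expiry = start + timedelta(hours=hours_valid)

    service = BlobServiceClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        credential=credential,
    )
    key = service.get_user_delegation_key(key_start_time=start,
                                          key_expiry_time=expiry)
    token = generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        blob_name=blob_path.lstrip("/"),  # path inside the container, not a URL
        user_delegation_key=key,          # used instead of account_key
        permission=BlobSasPermissions(read=True),
        start=start,
        expiry=expiry,
    )
    return build_blob_url(account_name, container_name, blob_path, token)
```

The identity behind `credential` needs a role that allows requesting a user delegation key, such as the Storage Blob Data Contributor role mentioned above. Whether a managed identity is available inside a given Databricks cluster depends on the workspace setup, so treat the credential choice as an assumption to verify.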

    References:

    https://learn.microsoft.com/en-us/azure/databricks/connect/storage/azure-storage#--connect-to-azure-data-lake-storage-gen2-or-blob-storage-using-azure-credentials

    https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature#types-of-shared-access-signatures

    https://learn.microsoft.com/en-us/azure/storage/blobs/sas-service-create-python

    https://learn.microsoft.com/en-us/azure/ai-services/language-service/native-document-support/shared-access-signatures

    Please also consider the following to troubleshoot the issue:

    Check the cluster configuration: Take a look at the settings for the cluster in question. Ensure that your user account, or the one you're intending to use, has the proper permissions set up.

    Verify your user identity: Double-check that you are indeed logged in with the user account that's supposed to have access. Sometimes, it might just be a simple case of being logged in with the wrong credentials.

    Roles and permissions: Databricks uses role-based access control (RBAC), so you might need to check that your role includes the permissions required to move data.

    Token issues: If you’re using a token for authentication, verify that it’s still valid and hasn’t expired.

    Network Policies: Sometimes, network policies can restrict access to certain users. It’s worth confirming that there aren't any such policies blocking your access.

    For more detailed guidance on setting up permissions and troubleshooting these kinds of issues, see the Databricks documentation on Microsoft Learn and the Databricks knowledge base, which provide guides and best practices for managing Databricks clusters and access control: https://kb.databricks.com/en_US/all-articles

    Hope this information helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please “Accept the answer” and “up-vote” wherever the information provided helps you, as this can be beneficial to other community members.


  2. Subhadip Roy 31 Reputation points
    2024-07-18T10:49:39.1066667+00:00

    Hi,

    I used the method below to create the SAS token, but it didn't work:

    generate_blob_sas(account_url=account_url,
                      account_key=account_key,
                      container_name=container_name,
                      blob_name=file_name,
                      account_name=account_name,
                      permission=sas_token_permissions,
                      expiry=expiry_time)
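
For comparison, here is a minimal sketch of the same account-key call using only the parameters that the `azure-storage-blob` documentation lists for `generate_blob_sas`. As far as that documentation describes it, `account_url` is not one of them: the function takes `account_name`, `container_name`, and `blob_name` instead. `sas_expiry` and `make_read_sas` are hypothetical helper names, and the read-only permission and one-hour default are assumptions, not values from the post above.

```python
from datetime import datetime, timedelta, timezone


def sas_expiry(hours_valid, now=None):
    """Return a timezone-aware UTC expiry time, hours_valid hours from now."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(hours=hours_valid)


def make_read_sas(account_name, account_key, container_name, blob_path,
                  hours_valid=1):
    """Create a read-only service SAS for one blob, signed with the account key."""
    # Imported here so sas_expiry can be used without the SDK installed.
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    return generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        # Path inside the container (e.g. "folder/file.zip"), not a full URL.
        blob_name=blob_path.lstrip("/"),
        account_key=account_key,
        permission=BlobSasPermissions(read=True),
        expiry=sas_expiry(hours_valid),
    )
```

The returned string is already URL-encoded, so the download link is the blob URL followed by `?` and the token. If a later step (ADF, the Logic App, or the email template) encodes the link again, each `%` in the signature becomes `%25` and the service can no longer match the signature. That is a possible cause worth ruling out here, not a confirmed diagnosis.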
    
