Read a configuration json file from Azure Blob Storage in Databricks Notebook

Mihai Cosmin 0 Reputation points
2023-08-17T14:26:37.9666667+00:00

Hello,

I am trying to read a configuration JSON file (conf.json) from Azure Blob Storage, located at, for example, abfss://somewhere@storagetemp.dfs.core.windows.net/folder:

{
  "task_name": "zzzzzzzzz",
  "source_path": "abfss://test@storagetemp.dfs.core.windows.net/",
  "source_objects": [
    "xxxxxxx_*.csv"
  ],
  "schema_path": "xxx.json"
}

I am using Databricks, and in the end I expect to read this JSON file (in fact it is a dictionary) so that I have access to all the configuration details in it and can use those values dynamically in notebooks for doing different things.

For example, I need to access the key "schema_path" to get the path to the schema file, open that file, and get the schema that I will use when reading a DataFrame.
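To illustrate what I mean (using the placeholder values from the example file above, and assuming "schema_path" is relative to "source_path"):

```python
import json

# Placeholder config content matching the example conf.json above
conf_text = """
{
  "task_name": "zzzzzzzzz",
  "source_path": "abfss://test@storagetemp.dfs.core.windows.net/",
  "source_objects": ["xxxxxxx_*.csv"],
  "schema_path": "xxx.json"
}
"""

# Once parsed, the config behaves like any Python dict
conf = json.loads(conf_text)

# Build the full schema location from the config values
schema_location = conf["source_path"] + conf["schema_path"]
print(schema_location)  # abfss://test@storagetemp.dfs.core.windows.net/xxx.json
```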

I found a solution to connect to Azure Blob Storage and read the file:

import json
from azure.storage.blob import BlobServiceClient

# Authenticate with the storage account key from the Access keys blade
blob_service_client = BlobServiceClient(
    account_url=f"https://{accountname}.blob.core.windows.net",
    credential=accountkey,
    connection_timeout=120,
)

container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(blob_name)

# Download the blob and parse it as JSON
blob_content = blob_client.download_blob().readall().decode("utf-8")
parsed_json = json.loads(blob_content)

but the credential (the key from the Access keys blade) has to be put there in CLEAR text.

Are there any other solutions than this one?

Any ideas? I also need to read the schema file, as I already said, and I would like to do it in a more reusable way.

Thank you.

Azure Blob Storage
Azure Databricks

1 answer

  1. PRADEEPCHEEKATLA-MSFT 89,816 Reputation points Microsoft Employee
    2023-08-22T10:29:23.8933333+00:00

    @Mihai Cosmin - Thanks for the question and using MS Q&A platform.

    Yes, there is a more secure way to access Azure Blob Storage and read the configuration JSON file in a Databricks Notebook without putting the credential key in clear text.

    You can use Azure Key Vault to store the credential key securely and then access it in your Databricks Notebook. Here are the steps to do that:

    1. Create an Azure Key Vault and store the credential key as a secret in the Key Vault. You can follow the steps in the Azure documentation to create a Key Vault and store a secret: https://docs.microsoft.com/en-us/azure/key-vault/secrets/quick-create-portal
    2. Grant the Databricks service principal access to the Key Vault.
    3. In your Databricks Notebook, you can use the Azure Key Vault-backed secret scope to access the credential key securely. You can follow the steps in the Databricks documentation to create a secret scope and access the secret: https://learn.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes
    4. Once you have access to the credential key, you can use it to authenticate to Azure Blob Storage and read the configuration JSON file. Here is an example code snippet:
    import json

    from azure.storage.blob import BlobServiceClient

    # Retrieve the storage account key from the Key Vault-backed secret scope
    accountkey = dbutils.secrets.get(scope="<secret-scope-name>", key="<secret-name>")

    blob_service_client = BlobServiceClient(
        account_url=f"https://{accountname}.blob.core.windows.net",
        credential=accountkey,
        connection_timeout=120,
    )

    container_client = blob_service_client.get_container_client(container_name)
    blob_client = container_client.get_blob_client(blob_name)

    blob_content = blob_client.download_blob().readall().decode("utf-8")
    parsed_json = json.loads(blob_content)


    In this code snippet, dbutils.secrets.get retrieves the storage account key from the Azure Key Vault-backed secret scope, so the key never appears in clear text in the notebook. (Alternatively, the DefaultAzureCredential class from the azure-identity package can authenticate to Azure Blob Storage via Azure Active Directory without using an account key at all.)

    For more details, refer to https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
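    Since the file lives on ADLS Gen2 (an abfss:// path), another documented option is to set the account key from the secret scope in the Spark configuration and read the path directly, without the azure-storage-blob SDK. This is only a configuration sketch: the scope and secret names are placeholders, and it requires a Databricks cluster where spark and dbutils are available.

```python
import json

# Placeholders: <secret-scope-name> and <secret-name>; storage account "storagetemp"
spark.conf.set(
    "fs.azure.account.key.storagetemp.dfs.core.windows.net",
    dbutils.secrets.get(scope="<secret-scope-name>", key="<secret-name>"),
)

# dbutils.fs.head returns the file content as a string (fine for a small config)
conf = json.loads(
    dbutils.fs.head("abfss://somewhere@storagetemp.dfs.core.windows.net/folder/conf.json")
)
```

    The same pattern then works for the schema file referenced by conf["schema_path"], which makes the approach reusable across notebooks.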

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

