Run Databricks notebook from ADF - 'azure' module not found when saving data to Blob Storage

Sarmistha Sarkar 0 Reputation points
2024-06-12T12:39:10.24+00:00

Hi Guys,

The requirement is to call a REST API, read the records in JSON Lines format, and load them into a table in Azure SQL Server.

I used a Python script in Databricks to read the JSON Lines data from the open API. The script reads the data and saves it to a file in Azure Blob Storage.

But when I execute that notebook from an ADF pipeline, it fails with the error 'module not found': azure

If I include the azure module installation step in the notebook, it runs successfully from the ADF pipeline.

Can you please tell me how I can avoid including the azure module installation step in my notebook, so that it does not run on every execution?

Below is my Python script in the Databricks notebook:

===========================

# Want to avoid running this installation on every execution
%pip install azure-storage-blob

dbutils.library.restartPython()

# Imports go after the restart, which clears the Python state
import requests
from azure.storage.blob import BlobServiceClient

# Make the API call to get the data in jsonlines format
response = requests.get(
    'XXXXXXXX',
    headers={'X-API-Key': 'XXXXXXX', 'Content-Type': 'application/jsonlines'},
)

# Get the data from the response
data = response.text

# Create a BlobServiceClient to connect to Azure Blob Storage
blob_service_client = BlobServiceClient.from_connection_string('DefaultEndpointsProtocol=https;AccountName=<XXX>;AccountKey=<XXXX>;EndpointSuffix=core.windows.net')

# Check if the container exists
container_name = 'test'
container_client = blob_service_client.get_container_client(container_name)

# If the container does not exist, create it
if not container_client.exists():
    container_client.create_container()

# Create a blob client to upload the data
blob_client = container_client.get_blob_client('generalriskiq')

# Upload the data to the blob
blob_client.upload_blob(data, overwrite=True)

Azure SQL Database
Azure Data Lake Storage
Azure Databricks

1 answer

  1. phemanth 11,455 Reputation points Microsoft Vendor
    2024-06-13T05:37:40.67+00:00

    @Sarmistha Sarkar

    Thanks for using the MS Q&A platform and posting your query.

    The issue you’re facing is that the Azure Storage Blob library is not installed on the Databricks cluster where your Azure Data Factory (ADF) pipeline runs the notebook. A library installed with %pip is scoped to the notebook session, so when you run the notebook interactively in Databricks, the library is available in that session. When the ADF pipeline runs the notebook, it starts a fresh session on the cluster configured in the ADF linked service (often a new job cluster), where the library might not be installed.

    To avoid including the installation step in your script each time, you can ensure that the Azure Storage Blob library is installed on the cluster that your ADF pipeline uses. Here are a few ways to do this:

    Install the library on the cluster ADF uses: If the linked service points to an existing cluster, you can install azure-storage-blob as a cluster library (for example, from PyPI on the cluster's Libraries tab). If ADF creates a new job cluster for each run, you can list the library in the libraries setting of the Databricks Notebook activity so that it is installed when the cluster starts.

    Use a custom Docker image: If your cluster is launched from a custom container image, you can build an image that has the library pre-installed and use that image for the cluster.

    Include a requirements.txt file: If your cluster setup supports it, you can list the Python libraries your project needs in a requirements.txt file and have them installed from that file when the cluster starts, rather than from the notebook.

    Once the library is installed on the cluster, remove the %pip install and dbutils.library.restartPython() lines from your script and keep only the import statement:
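
As one way to pre-install the library for ADF runs, the Databricks Notebook activity in ADF accepts a libraries list in its definition. The following is a minimal sketch of such an activity, built as a Python dict and printed as JSON so it can be compared with the pipeline's JSON view. The activity name, linked service name, and notebook path are hypothetical placeholders, not values from the question.

```python
import json

# Hypothetical placeholders: the activity name, linked service name, and
# notebook path below are illustrative and must be replaced with your own.
notebook_activity = {
    "name": "RunBlobUploadNotebook",
    "type": "DatabricksNotebook",
    "linkedServiceName": {
        "referenceName": "AzureDatabricksLinkedService",
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "notebookPath": "/Shared/load_jsonlines_to_blob",
        # Libraries to install on the cluster that runs the notebook,
        # so the notebook itself does not need a %pip install step.
        "libraries": [{"pypi": {"package": "azure-storage-blob"}}],
    },
}

# Serialize to JSON for comparison with the ADF pipeline definition.
activity_json = json.dumps(notebook_activity, indent=2)
print(activity_json)
```

With the library declared on the activity, the notebook can start directly with its import statements.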

    import azure.storage.blob
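
If it helps to make a missing library obvious in the ADF run output, a small guard can check for the module before the import. This is only a sketch using the standard library; the helper names is_module_available and require_module are hypothetical and not part of Databricks or the Azure SDK.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing.
        return False


def require_module(name: str, pip_name: str) -> None:
    """Raise a descriptive error if `name` is not installed."""
    if not is_module_available(name):
        raise RuntimeError(
            f"Module '{name}' is not installed on this cluster. "
            f"Install the '{pip_name}' package as a cluster library."
        )
```

In the notebook, this could be called as require_module("azure.storage.blob", "azure-storage-blob") just before the import, so a misconfigured cluster fails with a message that names the package to install.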
    
    

    I hope this helps! Let us know if you have any other questions.
