Thanks for using the Microsoft Q&A platform and posting your query.
The issue you’re facing is that the azure-storage-blob library is not installed in the environment where your notebook runs when it is triggered by Azure Data Factory (ADF). When you run the notebook interactively in Databricks, it executes on a cluster where the library is already installed. When the ADF pipeline runs the same notebook, however, it typically executes on a different cluster (often a newly created job cluster) that starts clean, so the library is not present there.
To avoid including the installation step in your script each time, make sure the azure-storage-blob library is pre-installed in the environment where the notebook actually runs under ADF. Here are a few ways to do this:
Install the library in the environment your ADF pipeline uses: If the ADF activity runs against a cluster you control, install the library there so it is available every time the cluster starts. This is the same installation you did in your Databricks notebook, just done once at the environment level instead of in the script.
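For example, the ADF Databricks Notebook activity lets you declare libraries on the activity itself, and they are installed on the cluster before the notebook runs. This is a sketch of the relevant `typeProperties` fragment; the notebook path and activity name are placeholders:

```json
{
  "name": "RunNotebook",
  "type": "DatabricksNotebook",
  "typeProperties": {
    "notebookPath": "/Users/your.user@example.com/my_notebook",
    "libraries": [
      { "pypi": { "package": "azure-storage-blob" } }
    ]
  }
}
```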
Use a custom Docker image: If your pipeline runs the workload in a container (for example, through a custom activity), you can build a custom Docker image with the library pre-installed and have the pipeline run from that image.
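As a sketch, assuming your workload runs as a containerized script (the script name `run_job.py` is a placeholder), the image would simply pre-install the package:

```dockerfile
FROM python:3.11-slim

# Pre-install the library so the script can import it directly at runtime
RUN pip install --no-cache-dir azure-storage-blob

# run_job.py stands in for whatever script your pipeline executes
COPY run_job.py /app/run_job.py
CMD ["python", "/app/run_job.py"]
```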
Include a requirements.txt file: If your pipeline setup supports it, include a requirements.txt file in your project listing every Python library the project needs. When the pipeline runs, those libraries are installed before your code executes.
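For this case, a minimal requirements.txt would contain just the one package; pin a version if you want reproducible runs:

```
azure-storage-blob
```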
Once the library is installed in the environment, replace the installation line in your script with just the import statement, for example:

from azure.storage.blob import BlobServiceClient
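If you still want a safety net inside the notebook itself, you can guard the install so pip only runs when the library is actually missing. This is a minimal sketch; `ensure_package` is a hypothetical helper name, not part of any Azure SDK:

```python
import importlib
import subprocess
import sys

def ensure_package(package, import_name=None):
    """Import `import_name` (default: `package`), installing `package` with pip
    only if the import fails. This avoids paying the install cost on every run."""
    name = import_name or package
    try:
        return importlib.import_module(name)
    except ImportError:
        # Install into the current interpreter's environment, then retry.
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        return importlib.import_module(name)

# Demonstration with a module that is always available, so no install runs:
json_mod = ensure_package("json")
```

In the notebook you would call, for example, `ensure_package("azure-storage-blob", import_name="azure.storage.blob")`, since the PyPI package name and the import path differ.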
I hope this helps! Let us know if you have any other questions.