Hi Ramachandran, Kannan, thank you for posting your query on the Microsoft Q&A platform. You can consider either of the approaches below.
Make use of the HTTP connector as the source of a Copy activity in an Azure Data Factory pipeline to download the file from the API directly into the storage account. Check the video below for a high-level idea:
> Download file from API and load it in to Sink using Azure Data Factory
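For reference, the core of such a pipeline is a Copy activity with a Binary dataset over an HTTP linked service as source and a Binary dataset over an ADLS Gen2 linked service as sink. Below is a minimal sketch of what that activity's definition looks like, shown as a Python dict mirroring the pipeline JSON; the dataset names are placeholders, and ADF Studio generates this JSON for you when you author the pipeline in the UI.
# A rough sketch of the Copy activity definition (a Python dict mirroring the
# pipeline JSON). "HttpBinaryDataset" and "AdlsGen2BinaryDataset" are placeholder
# dataset names you would define against your HTTP and ADLS Gen2 linked services.
copy_activity = {
    "name": "CopyFromApiToAdls",
    "type": "Copy",
    "inputs": [{"referenceName": "HttpBinaryDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AdlsGen2BinaryDataset", "type": "DatasetReference"}],
    "typeProperties": {
        # HTTP connector as source: issues a GET against the API endpoint
        "source": {
            "type": "BinarySource",
            "storeSettings": {"type": "HttpReadSettings", "requestMethod": "GET"},
        },
        # ADLS Gen2 as sink: writes the downloaded file as-is
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobFSWriteSettings"},
        },
    },
}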
Or
Alternatively, you can use the ADLS Gen2 Python SDK (azure-storage-file-datalake) together with the requests library to achieve the same. Below is the sample code.
# Import necessary libraries
import io
import zipfile

import requests
from azure.storage.filedatalake import DataLakeFileClient

# API URL (replace with your actual API endpoint)
api_url = "https://api.example.com/data.zip"

# Download the zip file from the API; raise_for_status() fails fast on HTTP errors
response = requests.get(api_url)
response.raise_for_status()
zip_data = io.BytesIO(response.content)

# Extract the zip file
with zipfile.ZipFile(zip_data, "r") as zip_ref:
    # Assuming there's a single file in the zip (modify as needed)
    file_name = zip_ref.namelist()[0]
    file_content = zip_ref.read(file_name)

# ADLS Gen2 target details
adls_container_name = "<your-container-name>"

# Create a DataLakeFileClient pointing at the target file path
file_client = DataLakeFileClient.from_connection_string(
    conn_str="<your-connection-string>",
    file_system_name=adls_container_name,
    file_path=f"<path-to-folder>/{file_name}",
)

# Create the file (overwriting if it exists) and upload the extracted content
file_client.upload_data(file_content, overwrite=True)

print(f"File '{file_name}' downloaded from API and written to ADLS Gen2 successfully!")
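Note: the snippet assumes the azure-storage-file-datalake and requests packages are installed (pip install azure-storage-file-datalake requests). Since the upload goes through the storage SDK directly, no Spark session is required; the same code works in a plain Python script or inside a Synapse/Databricks notebook.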
Hope this helps. Please let me know how it goes.
Please consider hitting the Accept Answer button; accepted answers help the community as well.