Encountering errors when attempting to write files from Azure Synapse Analytics notebook to Azure Data Lake Storage (ADLS) Gen2, while having a private endpoint configured with the DFS URL (OSError: [Errno 5] Input/output error)

Ramachandran, Kannan 0 Reputation points
2024-02-23T15:43:25.0566667+00:00

 

  1. Issue: Encountering errors when attempting to write files from Azure Synapse Analytics notebook to Azure Data Lake Storage (ADLS) Gen2, while having a private endpoint configured with the DFS URL.  
  2. Requirement: Files need to be downloaded from a website using the requests module's get method, and then the binary file content needs to be written to the primary ADLS Gen2 storage, which serves as the primary storage for Synapse Analytics.  
  3. Error Message: When mounting the ADLS Gen2 storage with Synapse and attempting to copy the file, the write fails with `OSError: [Errno 5] Input/output error`.
  4. Setup: ADLS Gen2 and Synapse Analytics are connected via a private endpoint with the DFS URL. Attempts to write files to ADLS Gen2 have been made by mounting the ADLS Gen2 storage.
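For reference, the mount-then-write pattern being attempted looks roughly like the sketch below. In a Synapse notebook the mount is created with `mssparkutils.fs.mount` and surfaced as a local path such as `/synfs/{jobId}/<mount-name>` (the mount name and paths here are placeholders); a temporary directory stands in for the mount point so the sketch runs outside Synapse.

```python
import os
import tempfile

# In Synapse, mount_root would be the local path of the mounted ADLS Gen2
# container, e.g. f"/synfs/{mssparkutils.env.getJobId()}/<mount-name>"
# (placeholder). A temporary directory stands in for it here.
mount_root = tempfile.mkdtemp()

def write_binary(mount_root: str, rel_path: str, data: bytes) -> str:
    """Write binary content under the mounted path, creating parent folders."""
    target = os.path.join(mount_root, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target

# Write some downloaded bytes (stand-in content) through the mount path.
path = write_binary(mount_root, "downloads/data.bin", b"\x00\x01\x02")
print(os.path.getsize(path))  # 3
```

It is this kind of plain-file write through the mounted path that raises the `OSError` in the setup described above.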


1 answer

  1. ShaikMaheer-MSFT 38,441 Reputation points Microsoft Employee
    2024-02-26T06:25:29.27+00:00

    Hi Ramachandran, Kannan, thank you for posting your query on the Microsoft Q&A platform. You can consider either of the approaches below.

    Make use of the HTTP connector in a pipeline Copy activity to download the file from the API directly into the storage account. Check the video below for a high-level idea: Download file from API and load it in to Sink using Azure Data Factory
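As a rough sketch of that approach (activity and dataset names are illustrative, and the referenced dataset definitions are assumed to exist), a Copy activity that moves a binary file from an HTTP source to an ADLS Gen2 sink looks like:

```json
{
  "name": "CopyFromApiToAdls",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "BinarySource",
      "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" }
    },
    "sink": {
      "type": "BinarySink",
      "storeSettings": { "type": "AzureBlobFSWriteSettings" }
    }
  },
  "inputs": [ { "referenceName": "HttpBinaryDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AdlsBinaryDataset", "type": "DatasetReference" } ]
}
```

Using the Binary format on both source and sink keeps the payload untouched, which suits zip downloads.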

    Or

    You can consider using the ADLS Gen2 Python SDK and the requests library to achieve the same. Below is sample code.

    # Import necessary libraries
    from azure.storage.filedatalake import DataLakeFileClient
    import requests
    import io
    import zipfile
    
    # API URL (replace with your actual API endpoint)
    api_url = "https://api.example.com/data.zip"
    
    # Download the zip file from the API
    response = requests.get(api_url)
    response.raise_for_status()
    zip_data = io.BytesIO(response.content)
    
    # Extract the zip file
    with zipfile.ZipFile(zip_data, "r") as zip_ref:
        # Assuming there's a single file in the zip (modify as needed)
        file_name = zip_ref.namelist()[0]
        file_content = zip_ref.read(file_name)
    
    # Create a client for the target file in ADLS Gen2
    adls_container_name = "<your-container-name>"
    file_client = DataLakeFileClient.from_connection_string(
        conn_str="<your-connection-string>",
        file_system_name=adls_container_name,
        file_path=f"<path-to-folder>/{file_name}"
    )
    
    # Create the file, then upload the binary content
    file_client.create_file()
    file_client.upload_data(file_content, overwrite=True)
    
    print(f"File '{file_name}' downloaded from API and written to ADLS Gen2 successfully!")
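Before pointing the code above at a live endpoint, the zip-extraction step can be exercised locally with an in-memory archive (the file name and contents below are made up for illustration):

```python
import io
import zipfile

# Build a small zip archive in memory to stand in for the API response body.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,42\n")

# Same extraction logic as the snippet above: take the first entry.
with zipfile.ZipFile(io.BytesIO(buf.getvalue()), "r") as zip_ref:
    file_name = zip_ref.namelist()[0]
    file_content = zip_ref.read(file_name)

print(file_name)  # data.csv
```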
    
    

    Hope this helps. Please let me know how it goes.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

