Encountering errors when attempting to write files from Azure Synapse Analytics notebook to Azure Data Lake Storage (ADLS) Gen2, while having a private endpoint configured with the DFS URL (OSError: [Errno 5] Input/output error)

Ramachandran, Kannan 0 Reputation points
2024-02-23T15:43:25.0566667+00:00

 

  1. Issue: Encountering errors when attempting to write files from Azure Synapse Analytics notebook to Azure Data Lake Storage (ADLS) Gen2, while having a private endpoint configured with the DFS URL.  
  2. Requirement: Files need to be downloaded from a website using the requests module's get method, and then the binary file content needs to be written to the primary ADLS Gen2 storage, which serves as the primary storage for Synapse Analytics.  
  3. Error Message: When mounting the ADLS Gen2 storage with Synapse and attempting to copy the file, the write fails with `OSError: [Errno 5] Input/output error`.
  4. Setup: ADLS Gen2 and Synapse Analytics are connected via a private endpoint with the DFS URL. Attempts to write files to ADLS Gen2 have been made by mounting the ADLS Gen2 storage.
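For reference, the mount-then-write pattern being attempted looks roughly like the sketch below. In a Synapse notebook the mount is created with `mssparkutils.fs.mount` and surfaced as a local path such as `/synfs/{jobId}/<mount-name>` (the mount name and paths here are placeholders); a temporary directory stands in for the mount point so the sketch runs outside Synapse.

```python
import os
import tempfile

# In Synapse, mount_root would be the local path of the mounted ADLS Gen2
# container, e.g. f"/synfs/{mssparkutils.env.getJobId()}/<mount-name>"
# (placeholder). A temporary directory stands in for it here.
mount_root = tempfile.mkdtemp()

def write_binary(mount_root: str, rel_path: str, data: bytes) -> str:
    """Write binary content under the mounted path, creating parent folders."""
    target = os.path.join(mount_root, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target

# Write some downloaded bytes (stand-in content) through the mount path.
path = write_binary(mount_root, "downloads/data.bin", b"\x00\x01\x02")
print(os.path.getsize(path))  # 3
```

It is this kind of plain-file write through the mounted path that raises the `OSError` in the setup described above.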


1 answer

  1. ShaikMaheer-MSFT 38,441 Reputation points Microsoft Employee
    2024-02-26T06:25:29.27+00:00

    Hi Ramachandran, Kannan, thank you for posting your query on the Microsoft Q&A platform. You can consider either of the approaches below.

    Make use of the HTTP connector in a pipeline Copy activity to download the file from the API directly into the storage account. Check the video below for a high-level idea: Download file from API and load it in to Sink using Azure Data Factory
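As a rough sketch of that approach (activity and dataset names are illustrative, and the referenced dataset definitions are assumed to exist), a Copy activity that moves a binary file from an HTTP source to an ADLS Gen2 sink looks like:

```json
{
  "name": "CopyFromApiToAdls",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "BinarySource",
      "storeSettings": { "type": "HttpReadSettings", "requestMethod": "GET" }
    },
    "sink": {
      "type": "BinarySink",
      "storeSettings": { "type": "AzureBlobFSWriteSettings" }
    }
  },
  "inputs": [ { "referenceName": "HttpBinaryDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "AdlsBinaryDataset", "type": "DatasetReference" } ]
}
```

Using the Binary format on both source and sink keeps the payload untouched, which suits zip downloads.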

    Or

    You can consider using the ADLS Gen2 Python SDK and the requests library to achieve the same. Below is sample code.

    # Import necessary libraries
    from azure.storage.filedatalake import DataLakeFileClient
    import requests
    import io
    import zipfile
    
    # API URL (replace with your actual API endpoint)
    api_url = "https://api.example.com/data.zip"
    
    # Download the zip file from the API
    response = requests.get(api_url)
    response.raise_for_status()
    zip_data = io.BytesIO(response.content)
    
    # Extract the zip file
    with zipfile.ZipFile(zip_data, "r") as zip_ref:
        # Assuming there's a single file in the zip (modify as needed)
        file_name = zip_ref.namelist()[0]
        file_content = zip_ref.read(file_name)
    
    # Create a client for the target file in ADLS Gen2
    adls_container_name = "<your-container-name>"
    file_client = DataLakeFileClient.from_connection_string(
        conn_str="<your-connection-string>",
        file_system_name=adls_container_name,
        file_path=f"<path-to-folder>/{file_name}"
    )
    
    # Create the file, then upload the binary content
    file_client.create_file()
    file_client.upload_data(file_content, overwrite=True)
    
    print(f"File '{file_name}' downloaded from API and written to ADLS Gen2 successfully!")
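Before pointing the code above at a live endpoint, the zip-extraction step can be exercised locally with an in-memory archive (the file name and contents below are made up for illustration):

```python
import io
import zipfile

# Build a small zip archive in memory to stand in for the API response body.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,42\n")

# Same extraction logic as the snippet above: take the first entry.
with zipfile.ZipFile(io.BytesIO(buf.getvalue()), "r") as zip_ref:
    file_name = zip_ref.namelist()[0]
    file_content = zip_ref.read(file_name)

print(file_name)  # data.csv
```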
    
    

    Hope this helps. Please let me know how it goes.


    Please consider hitting the Accept Answer button. Accepted answers help the community as well.

