Process ADLS Gen2 files (CSV or JSON) using Python (smaller files)

Prakash14 121 Reputation points
2022-07-09T10:11:04.36+00:00

I could find the Azure documentation for creating directories and for uploading and downloading files in an ADLS Gen2 data lake using Python.

Is there an ADLS file system Python library that can read files directly from "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/inputpath/employee.csv" using Python, without using Databricks?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Lake Analytics

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 76,836 Reputation points Microsoft Employee
    2022-07-11T07:42:45.597+00:00

    Hello @PrakashP-0042,

    Thanks for the question and for using the Microsoft Q&A platform.

    Azure Data Lake Gen2 allows you to manage and access data just as you would with a Hadoop Distributed File System (HDFS). The new ABFS driver (used to access data) is available within all Apache Hadoop environments. These environments include Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
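
    The ABFS driver addresses data with URIs of the form shown in the question, "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path>". As a minimal sketch (not part of any Azure SDK), the helper below splits such a URI into the account name, file system (container) name, and file path that client code typically needs. The function name `split_abfss_uri` and the sample account and container names are hypothetical.

```python
from urllib.parse import urlparse


def split_abfss_uri(uri: str) -> tuple[str, str, str]:
    """Split an abfs:// or abfss:// URI into (account_name, file_system, file_path).

    Hypothetical helper for illustration; it only does string parsing
    and never contacts Azure.
    """
    parsed = urlparse(uri)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")

    # netloc has the shape "<container>@<account>.dfs.core.windows.net"
    file_system, separator, host = parsed.netloc.partition("@")
    if not separator or not file_system or not host:
        raise ValueError(f"expected <container>@<account-host> in: {uri!r}")

    # The storage account name is the first label of the host name.
    account_name = host.split(".", 1)[0]
    file_path = parsed.path.lstrip("/")
    return account_name, file_system, file_path


# Sample values are made up; substitute your own container and account.
account_name, file_system, file_path = split_abfss_uri(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/inputpath/employee.csv"
)
print(account_name, file_system, file_path)
```

    The three parts correspond to the storage account in the connection string, the `file_system` argument, and the directory and file names used in the Python SDK sample in this answer.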

    Note: It's recommended to use Azure HDInsight, Azure Databricks, or Azure Synapse Analytics to process and analyze data in ADLS Gen2.

    The following table recommends tools that you can use to ingest, analyze, visualize, and download data. Use the links in this table to find guidance about how to configure and use each tool.

    [Image 219377-image.png: table of recommended tools; not reproduced here]

    For more details, refer to Best practices for using Azure Data Lake Storage Gen2.

    If you still want to access the data without using Azure Databricks, you may check out a similar Stack Overflow thread: Azure ADLS Gen2 File read using Python (without ADB).

    from azure.storage.filedatalake import DataLakeServiceClient

    # Authenticate with the storage account connection string (fill in the masked values).
    service_client = DataLakeServiceClient.from_connection_string("DefaultEndpointsProtocol=https;AccountName=***;AccountKey=*****;EndpointSuffix=core.windows.net")

    # Navigate from file system (container) to directory to file.
    file_system_client = service_client.get_file_system_client(file_system="test")
    directory_client = file_system_client.get_directory_client("testdirectory")
    file_client = directory_client.get_file_client("test.txt")

    # Download the whole file into memory; this suits smaller files.
    download = file_client.download_file()
    downloaded_bytes = download.readall()

    # Save a local copy. The with block closes the file, so no explicit close() is needed.
    with open("./sample.txt", "wb") as my_file:
        my_file.write(downloaded_bytes)
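
    For small CSV or JSON files, the bytes returned by `readall()` can also be parsed in memory instead of being written to disk first. The following is a minimal sketch using only the Python standard library; the function names and the sample bytes are hypothetical stand-ins, so it runs without a storage account.

```python
import csv
import io
import json


def rows_from_csv_bytes(data: bytes, encoding: str = "utf-8-sig") -> list[dict]:
    """Parse CSV bytes into a list of dicts keyed by the header row.

    "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM if present.
    """
    text = data.decode(encoding)
    # newline="" lets the csv module handle line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def object_from_json_bytes(data: bytes, encoding: str = "utf-8-sig"):
    """Parse JSON bytes into the corresponding Python object."""
    return json.loads(data.decode(encoding))


# Stand-in for downloaded_bytes = download.readall(); the values are made up.
sample_csv_bytes = b"id,name\r\n1,Asha\r\n2,Ravi\r\n"
sample_json_bytes = b'[{"id": 1, "name": "Asha"}]'

csv_rows = rows_from_csv_bytes(sample_csv_bytes)
json_records = object_from_json_bytes(sample_json_bytes)
print(csv_rows)
print(json_records)
```

    With the SDK code in this answer, `downloaded_bytes` would be passed in place of the sample bytes. All CSV values come back as strings, so numeric columns need an explicit conversion.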
    

    Hope this helps. Please let us know if you have any further queries.

