How to read Azure Data Lake Storage Gen2 parquet files from outside Azure with Python?

Justina 21 Reputation points
2022-11-08T14:18:21.683+00:00

I would like to read Azure Data Lake Storage Gen2 parquet files from outside Azure Synapse Analytics. I would like to connect to the data from my local computer or from an Azure ML Studio compute instance. Is this possible, and if so, how?

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 77,751 Reputation points Microsoft Employee
    2022-11-09T06:42:38.993+00:00

    Hello @Justina ,

    Thanks for the question and using MS Q&A platform.

    Use pyarrowfs-adlgen2, an implementation of a pyarrow filesystem for Azure Data Lake Storage Gen2.

    Note: It allows you to use pyarrow and pandas to read parquet datasets directly from Azure without the need to copy files to local storage first.
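    A minimal sketch of the pyarrowfs-adlgen2 approach, assuming `pip install pyarrowfs-adlgen2 azure-identity pyarrow pandas`, and that `DefaultAzureCredential` can authenticate (for example after `az login` on a local machine, or through a managed identity on an Azure ML compute instance). The `AccountHandler.from_account_name` call follows the project's README, so check it against the version you install. The helper names `adls_dataset_path` and `read_adls_parquet` are hypothetical.

```python
import posixpath


def adls_dataset_path(container: str, *parts: str) -> str:
    """Build a 'container/dir/file' path; with an account-level handler the container comes first."""
    return posixpath.join(container.strip("/"), *(p.strip("/") for p in parts))


def read_adls_parquet(account_name: str, path: str):
    """Read a parquet file or dataset directory from ADLS Gen2 into a pandas DataFrame."""
    # Imported lazily so this module loads even if the Azure packages are not installed.
    import azure.identity
    import pyarrow.fs
    import pyarrow.parquet as pq
    import pyarrowfs_adlgen2

    # Account-level handler (per the pyarrowfs-adlgen2 README), wrapped as a pyarrow filesystem.
    handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
        account_name, azure.identity.DefaultAzureCredential()
    )
    fs = pyarrow.fs.PyFileSystem(handler)
    # Data is streamed from Azure; nothing is copied to local disk first.
    return pq.read_table(path, filesystem=fs).to_pandas()
```

    Usage with hypothetical names: `read_adls_parquet("mystorageaccount", adls_dataset_path("mycontainer", "sales", "2022"))`. The same code should run unchanged on a local computer or an Azure ML compute instance, as long as the credential can reach the storage account.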

    Also check out the "Reading a Parquet File from Azure Blob storage" section of pyarrow's "Reading and Writing the Apache Parquet Format" documentation. The idea is to manually list the blob names under a prefix such as dataset_name with the Azure Storage SDK for Python API list_blob_names(container_name, prefix=None, num_results=None, include=None, delimiter=None, marker=None, timeout=None), and then read these blobs one by one.
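    A sketch of the list-then-read approach, assuming the current azure-storage-blob v12 SDK (`pip install azure-storage-blob pandas pyarrow`). In v12 the listing is done with `ContainerClient.list_blobs(name_starts_with=...)`, which stands in here for the legacy `list_blob_names(container_name, prefix=...)` quoted above. The connection string, container, and prefix are placeholders, and the helper names are hypothetical.

```python
import io
from typing import Iterable, List


def parquet_blob_names(names: Iterable[str], prefix: str) -> List[str]:
    """Keep only blob names under `prefix` that end in .parquet (skips markers such as _SUCCESS)."""
    return sorted(n for n in names if n.startswith(prefix) and n.endswith(".parquet"))


def read_parquet_blobs(connection_string: str, container: str, prefix: str):
    """List the parquet blobs under `prefix`, download each one, and concatenate them."""
    # Lazy imports so the helpers above work without the Azure SDK installed.
    import pandas as pd
    from azure.storage.blob import ContainerClient

    client = ContainerClient.from_connection_string(connection_string, container_name=container)
    listed = (blob.name for blob in client.list_blobs(name_starts_with=prefix))
    frames = [
        # Each blob is downloaded fully into memory, then parsed as a parquet file.
        pd.read_parquet(io.BytesIO(client.download_blob(name).readall()))
        for name in parquet_blob_names(listed, prefix)
    ]
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

    Because each blob is read into memory one at a time, this suits modest datasets. For larger ones, the filesystem-based approach above avoids holding whole files as bytes.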

    For more details, refer to the threads below, which address a similar issue:

    Unfortunately, you cannot connect data on your local computer directly to Azure Synapse Analytics. You first need to transfer the data to Azure Data Lake Storage Gen2 and then perform any transformations.
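    The transfer step can be scripted. This sketch uses the azure-storage-file-datalake SDK, which is one option among several (AzCopy or Azure Storage Explorer would also work). It assumes `pip install azure-storage-file-datalake azure-identity`, an account with the hierarchical namespace enabled, and a credential that `DefaultAzureCredential` can pick up. The account, file system (container), and directory names, as well as the helper names, are hypothetical.

```python
from pathlib import Path


def dfs_account_url(account_name: str) -> str:
    """ADLS Gen2 (dfs) endpoint of a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"


def remote_path(local_file: str, target_dir: str) -> str:
    """Destination path inside the file system: <target_dir>/<local file name>."""
    directory = target_dir.strip("/")
    name = Path(local_file).name
    return f"{directory}/{name}" if directory else name


def upload_to_adls(account_name: str, file_system: str, local_file: str, target_dir: str) -> str:
    """Upload one local file to ADLS Gen2 and return the path it was written to."""
    # Lazy imports so the path helpers work without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(dfs_account_url(account_name), credential=DefaultAzureCredential())
    destination = remote_path(local_file, target_dir)
    file_client = service.get_file_system_client(file_system).get_file_client(destination)
    with open(local_file, "rb") as data:
        # overwrite=True replaces an existing file at the same path.
        file_client.upload_data(data, overwrite=True)
    return destination
```

    Once the files are in ADLS Gen2, Synapse (or the Python readers above) can work with them in place.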

    Note: You can attach a Synapse Spark pool to Azure Machine Learning; see "Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)". This feature is currently in preview.

    Hope this helps. Please let us know if you have any further queries.

