How to read Azure Data Lake Storage Gen2 parquet files from the outside of Azure with Python?

Question


Justina 21

I would like to read Azure Data Lake Storage Gen2 parquet files from outside Azure Synapse Analytics. I would like to connect to the data from my local computer or from an Azure ML Studio Compute Instance. Is it possible? If so, how can I do it?

PRADEEPCHEEKATLA 90,661 Reputation points Moderator

2022-11-14T09:25:26.14+00:00
Hello @Justina ,

Following up to see if the suggestion below was helpful. If you have any further queries, do let us know.


Accepted answer


Answer 1

Hello @Justina ,

Thanks for the question and for using the MS Q&A platform.

Use pyarrowfs-adlgen2, an implementation of a pyarrow filesystem for Azure Data Lake Gen2.

Note: It allows you to use pyarrow and pandas to read parquet datasets directly from Azure without the need to copy files to local storage first.
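As a minimal sketch of that approach, assuming the packages pyarrowfs-adlgen2, azure-identity, pandas and pyarrow are installed and that you are signed in locally (for example via `az login`) so that `DefaultAzureCredential` can find credentials: the helper names `adls_path` and `read_adls_parquet`, the account name and the file path are hypothetical placeholders, not part of the original answer.

```python
def adls_path(container, *parts):
    """Join a container name and path segments into 'container/dir/file'."""
    return "/".join([container.strip("/"), *(p.strip("/") for p in parts)])


def read_adls_parquet(account_name, path):
    """Read a parquet file or dataset from ADLS Gen2 into a pandas DataFrame.

    `path` has the form 'container/path/to/dataset'.
    """
    # Imported lazily so this module loads even where the Azure packages
    # are not installed; install them before calling this function.
    import azure.identity
    import pandas as pd
    import pyarrow.fs
    import pyarrowfs_adlgen2

    # DefaultAzureCredential tries environment variables, managed identity,
    # Azure CLI login and similar sources in turn.
    handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
        account_name, azure.identity.DefaultAzureCredential()
    )
    fs = pyarrow.fs.PyFileSystem(handler)
    return pd.read_parquet(path, filesystem=fs)


# Hypothetical usage (placeholder account, container and file names):
# df = read_adls_parquet("YOUR_ACCOUNT_NAME", adls_path("container", "dataset.parq"))
```

The same code should work from a local computer or an Azure ML compute instance, provided the identity in use has read access to the storage account (for example the Storage Blob Data Reader role).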

Also check out the section "Reading a Parquet File from Azure Blob storage" in the pyarrow document "Reading and Writing the Apache Parquet Format". With that approach, you manually list the blob names that share a prefix such as dataset_name, using the API list_blob_names(container_name, prefix=None, num_results=None, include=None, delimiter=None, marker=None, timeout=None) of the Azure Storage SDK for Python, and then read these blobs one by one.
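The list-then-read approach can be sketched roughly as follows. Note the assumption: this sketch uses the current azure-storage-blob (v12) `ContainerClient`, whose `list_blobs(name_starts_with=...)` plays the role of the legacy `list_blob_names(..., prefix=...)` call named above. The helper names `parquet_blob_names` and `read_parquet_blobs`, the connection string and the container name are hypothetical.

```python
import io


def parquet_blob_names(names, suffix=".parquet"):
    """Keep only blob names that look like parquet data files."""
    return [name for name in names if name.endswith(suffix)]


def read_parquet_blobs(container_client, prefix):
    """Download every parquet blob under `prefix` and combine them into one table."""
    # Imported lazily so this module loads even where pyarrow is not installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    names = parquet_blob_names(
        blob.name for blob in container_client.list_blobs(name_starts_with=prefix)
    )
    if not names:
        raise FileNotFoundError(f"no parquet blobs found under prefix {prefix!r}")

    tables = []
    for name in names:
        # Read each blob fully into memory, then parse it as parquet.
        data = container_client.download_blob(name).readall()
        tables.append(pq.read_table(io.BytesIO(data)))
    # All files are assumed to share the same schema.
    return pa.concat_tables(tables)


# Hypothetical usage (placeholder connection string and container name):
# from azure.storage.blob import ContainerClient
# client = ContainerClient.from_connection_string("<connection-string>", "<container>")
# df = read_parquet_blobs(client, "dataset_name/").to_pandas()
```

Because every blob is downloaded in full, this is best suited to small datasets; for larger ones the filesystem-based approach avoids holding all files in memory at once.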


Unfortunately, you cannot connect data on your local computer directly to Azure Synapse Analytics. You first need to transfer the data to Azure Data Lake Gen2 and then perform any transformations.

Note: Yes, you can connect from Azure Machine Learning. See "Attach and manage a Synapse Spark pool in Azure Machine Learning (preview)"; this feature is currently in preview.

Hope this helps. Please let us know if you have any further queries.
