Hello @Justina,
Thanks for the question and using MS Q&A platform.
pyarrowfs-adlgen2 is an implementation of a pyarrow filesystem for Azure Data Lake Gen2.
Note: It allows you to use pyarrow and pandas to read parquet datasets directly from Azure without needing to copy the files to local storage first.
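Here is a minimal sketch of that approach, based on the pyarrowfs-adlgen2 README; the account name, container, and dataset path are placeholders you would replace with your own values:

```python
# Sketch: read a parquet dataset from ADLS Gen2 with pyarrowfs-adlgen2.
# <account_name>, <container>, and the dataset path are placeholders.
import pyarrow.fs
import pyarrow.dataset
import pyarrowfs_adlgen2
from azure.identity import DefaultAzureCredential

# Authenticate against the storage account (any azure.identity credential works)
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    "<account_name>", DefaultAzureCredential()
)
fs = pyarrow.fs.PyFileSystem(handler)

# Read the parquet dataset straight from ADLS Gen2 into a pandas DataFrame
dataset = pyarrow.dataset.dataset("<container>/path/to/dataset_name", filesystem=fs)
df = dataset.to_table().to_pandas()
```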
Also check out the Reading a Parquet File from Azure Blob storage section of the pyarrow document Reading and Writing the Apache Parquet Format.
Alternatively, you can manually list the blob names under a prefix such as dataset_name using the list_blob_names(container_name, prefix=None, num_results=None, include=None, delimiter=None, marker=None, timeout=None) API of the Azure Storage SDK for Python, and then read those blobs one by one.
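As a rough sketch of that listing-and-reading pattern, using the legacy azure-storage-blob (v2.x) SDK that exposes list_blob_names; the account, key, container, and prefix are placeholders:

```python
# Sketch: list all partition blobs under a prefix and read them one by one.
# <account_name>, <account_key>, <container_name>, and the prefix are placeholders.
import io
import pandas as pd
import pyarrow.parquet as pq
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob 2.x

blob_service = BlockBlobService(account_name="<account_name>", account_key="<account_key>")

# List every blob name under the dataset prefix (e.g. the partition files of dataset_name)
blob_names = blob_service.list_blob_names("<container_name>", prefix="dataset_name/")

# Download each partition into memory, read it with pyarrow, and concatenate
frames = []
for name in blob_names:
    stream = io.BytesIO()
    blob_service.get_blob_to_stream("<container_name>", name, stream)
    frames.append(pq.read_table(stream).to_pandas())

df = pd.concat(frames, ignore_index=True)
```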
For more details, refer to the threads below, which address a similar issue:
- How to download all partitions of a parquet file in Python from Azure Data Lake?
- How to read parquet files directly from azure datalake without spark?
Unfortunately, you cannot connect data on your local computer directly to Azure Synapse Analytics. You need to first transfer the data to Azure Data Lake Gen2 and then perform any transformations.
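One way to do that transfer is with the azure-storage-file-datalake SDK; the sketch below assumes placeholder account, container, and file paths that you would replace:

```python
# Sketch: upload a local file to ADLS Gen2 so Synapse can read it.
# <account_name>, <container_name>, and the paths are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient  # azure-storage-file-datalake package

service = DataLakeServiceClient(
    account_url="https://<account_name>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
file_client = (
    service.get_file_system_client("<container_name>")
    .get_file_client("raw/dataset_name.parquet")
)

# Upload the local parquet file to ADLS Gen2
with open("dataset_name.parquet", "rb") as local_file:
    file_client.upload_data(local_file, overwrite=True)
```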
Note: Yes, you can Attach and manage a Synapse Spark pool in Azure Machine Learning (preview); this capability is currently in preview.
Hope this will help. Please let us know if any further queries.
------------------------------
- Please don't forget to click on Accept Answer or upvote whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
- Want a reminder to come back and check responses? Here is how to subscribe to a notification
- If you are interested in joining the VM program and helping shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators