Hello Wei Zhang,
Thank you for posting your query here!
The ClientAuthenticationError you’re encountering typically indicates an issue with the connection string or the way it’s being used to authenticate your requests. Please check the following:
· Ensure that the connection string is correct. It should be in the following format, with <your-account-name> and <your-account-key> replaced by your actual Azure Storage account name and key:
DefaultEndpointsProtocol=https;AccountName=<your-account-name>;AccountKey=<your-account-key>;EndpointSuffix=core.windows.net
· The blob path should be the exact path of the blob in the container. For example, if your blob is in a folder named source in your container, the blob path should be source/yourfile.parquet. The path should not begin with a ./ prefix.
· When you download the blob, you need to read the content into the BytesIO stream. Here’s how you can do it:
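from io import BytesIO
import pandas as pd

# blob_client is the BlobClient you have already created for this blob
stream_downloader = blob_client.download_blob()
stream = BytesIO()
stream.write(stream_downloader.readall())  # copy the blob's bytes into memory
stream.seek(0)  # rewind so the reader starts at the first byte
df = pd.read_parquet(stream, engine='pyarrow')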
In the above code, stream.write(stream_downloader.readall()) writes the downloaded blob content into the stream, and stream.seek(0) resets the stream position to the beginning so that pd.read_parquet can read from it.
Reference: https://stackoverflow.com/questions/75695414/read-parquet-folder-from-blob-storage
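If it helps, the three checks above could be combined into one small script. This is only a rough sketch and not the only way to do it: the helper names parse_connection_string, normalize_blob_path and read_parquet_blob are hypothetical (they are not part of the Azure SDK), and the validation assumes an account-key style connection string like the one shown above, so a SAS-based connection string would need a different check.

```python
from io import BytesIO

# Assumption: account-key authentication, as in the format shown above.
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' into a dict and check the required keys."""
    parts = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if not parts.get(k)]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    return parts


def normalize_blob_path(path: str) -> str:
    """Turn './source/file.parquet' or '/source/file.parquet' into 'source/file.parquet'."""
    path = path.strip()
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")


def read_parquet_blob(conn_str: str, container: str, blob_path: str):
    """Download a single parquet blob into memory and return a DataFrame."""
    # Imported lazily so the two helpers above work without these packages.
    import pandas as pd
    from azure.storage.blob import BlobClient

    parse_connection_string(conn_str)  # fail early on an obviously bad string
    blob_client = BlobClient.from_connection_string(
        conn_str,
        container_name=container,
        blob_name=normalize_blob_path(blob_path),
    )
    # BytesIO(data) starts at position 0, so no explicit seek(0) is needed here.
    stream = BytesIO(blob_client.download_blob().readall())
    return pd.read_parquet(stream, engine="pyarrow")
```

You could then call it as read_parquet_blob(connection_string, "<your-container>", "./source/yourfile.parquet"), where the container name is a placeholder for your own. If the connection string is missing AccountName or AccountKey, the script raises a ValueError before any request is sent, which can make it easier to tell a malformed string apart from a key that the service rejects.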
Do let us know if you have any further queries. I’m happy to assist you further.
Please do not forget to “Accept the answer” and “up-vote” wherever the information provided helps you, as this can be beneficial to other community members.