How to fix ClientAuthenticationError when accessing the parquet (PyArrow) file through Blob Storage?

Wei Zhang 40 Reputation points
2024-03-20T20:48:11.91+00:00

Hi everyone,

I tried to build a blob client to access a .parquet file in a Blob datastore with the following script:

import os
from azure.storage.blob import BlobServiceClient
import pandas as pd
from io import BytesIO

connection_str = "*****"
container = "*****"
blob_path = "./source/****.parquet"

blob_service_client = BlobServiceClient.from_connection_string(connection_str)
blob_client = blob_service_client.get_blob_client(container = container, blob = blob_path) 
stream_downloader = blob_client.download_blob()  #error happened here

stream = BytesIO() 
df = pd.read_parquet(stream, engine = 'pyarrow') 


It showed the following error message for this step:

blob_client.download_blob()

ClientAuthenticationError: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:* Time:2024-03-20T20:40:44.8529867Z ErrorCode:AuthenticationFailed authenticationerrordetail:The MAC signature found in the HTTP request '****' is not the same as any computed signature. Server used following string to sign: 'GET

Does anyone have a clue about this? Thank you.

Azure Blob Storage

Accepted answer
  1. Anand Prakash Yadav 7,815 Reputation points Microsoft Vendor
    2024-03-21T09:23:08.44+00:00

    Hello Wei Zhang,

    Thank you for posting your query here!

    The ClientAuthenticationError you’re encountering typically indicates an issue with the connection string or the way it’s being used to authenticate your requests. Please check the following:

    · Ensure that the connection string is correct. It should be in the format DefaultEndpointsProtocol=https;AccountName=<your-account-name>;AccountKey=<your-account-key>;EndpointSuffix=core.windows.net. Please replace <your-account-name> and <your-account-key> with your actual Azure Storage account name and key.

    · The blob path should be the exact path of the blob in the container. For example, if your blob is in a folder named source in your container, the blob path should be source/yourfile.parquet. It should not start with ./.

    · When you download the blob, you need to read the content into the BytesIO stream. Here’s how you can do it:

    stream_downloader = blob_client.download_blob()
    stream = BytesIO()
    stream.write(stream_downloader.readall())
    stream.seek(0)
    df = pd.read_parquet(stream, engine='pyarrow')
    

    In the above code, stream.write(stream_downloader.readall()) writes the downloaded blob content into the stream, and stream.seek(0) resets the stream position to the beginning so that pd.read_parquet can read from it.
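
The three checks above can be combined into one short sketch. This is only an illustration, not an official recipe: normalize_blob_path, parse_connection_string, and read_parquet_blob are hypothetical helper names, and the connection-string check assumes an account-key style string like the one shown in the first bullet (a SAS-based string would need a different check).

```python
from io import BytesIO


def normalize_blob_path(blob_path: str) -> str:
    """Hypothetical helper: return the blob name relative to the container root."""
    path = blob_path.strip().replace("\\", "/")
    # Drop any leading "./" so the name matches the blob's exact path.
    while path.startswith("./"):
        path = path[2:]
    return path.lstrip("/")


def parse_connection_string(connection_str: str) -> dict:
    """Hypothetical helper: split 'Key=Value;...' pairs and check required keys."""
    parts = {}
    for segment in connection_str.strip().split(";"):
        if not segment:
            continue
        # Split on the first "=" only; account keys usually end in "=" padding.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment in connection string: {segment!r}")
        parts[key.strip()] = value.strip()
    # Assumption: an account-key connection string, as in the answer above.
    missing = [k for k in ("AccountName", "AccountKey") if not parts.get(k)]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    return parts


def read_parquet_blob(connection_str: str, container: str, blob_path: str):
    """Download one parquet blob into memory and load it as a DataFrame."""
    # Imported here so the helpers above can be used without these packages.
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    parse_connection_string(connection_str)  # fail early on an incomplete string
    service = BlobServiceClient.from_connection_string(connection_str)
    blob_client = service.get_blob_client(
        container=container, blob=normalize_blob_path(blob_path)
    )
    # BytesIO(bytes) starts at position 0, so no explicit seek(0) is needed.
    stream = BytesIO(blob_client.download_blob().readall())
    return pd.read_parquet(stream, engine="pyarrow")
```

With the values from the question, the call would look like read_parquet_blob(connection_str, container, "./source/****.parquet"); the helper strips the leading ./ before the request is made. If the error persists after these checks, the account key itself may be wrong or may have been rotated.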

    Reference: https://stackoverflow.com/questions/63351478/how-to-read-parquet-files-from-azure-blobs-into-pandas-dataframe

    https://stackoverflow.com/questions/75695414/read-parquet-folder-from-blob-storage

    Do let us know if you have any further queries. I’m happy to assist you further.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

