How to read/write data from Azure File Storage/File Share in a Python script

HughA 0 Reputation points
2024-07-05T03:47:56.2133333+00:00

I'm trying to read/write joblib and CSV files from Azure File Storage/File Share in a Python script. To check that it works, I'm reading/writing from VS Code running locally, and I will then run the script from a container in ACI. The code below works, but the downloaded file is a StorageStreamDownloader object. Can this be converted into a Python object, i.e. the saved object for a joblib file and a Pandas DataFrame for a CSV file?

from azure.storage.fileshare import ShareFileClient
import pandas as pd

connection_string=f'DefaultEndpointsProtocol=https;AccountName=XXXX;AccountKey={key};EndpointSuffix=core.windows.net'
filename='test.csv'


file_client = ShareFileClient.from_connection_string(conn_str=connection_string, share_name="XXX", file_path=filename)
with open("DEST_FILE", "wb") as file_handle:
    data = file_client.download_file()
    data.readinto(file_handle)

df=pd.DataFrame(data)  # raises ValueError

  1. Vinodh247 23,581 Reputation points MVP
    2024-07-05T06:54:20.3066667+00:00

    Hi HughA,

    Thanks for reaching out to Microsoft Q&A.

    You can turn the StorageStreamDownloader into the Python objects you need: read the stream into bytes, wrap those bytes in an in-memory file-like object, and pass it to joblib.load (for a .joblib file) or pd.read_csv (for a CSV file).

    Try the modified script below locally in VS Code to check that it works, and then use it in your container in ACI.

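    import io

    import joblib
    import pandas as pd
    from azure.storage.fileshare import ShareFileClient

    # Replace the placeholders with your own storage account key and share name
    connection_string = "DefaultEndpointsProtocol=https;AccountName=XXXX;AccountKey=<your-key>;EndpointSuffix=core.windows.net"
    share_name = "XXX"

    def download_file_to_memory(connection_string, share_name, filename):
        """Download a file from the Azure file share and return its contents as bytes."""
        file_client = ShareFileClient.from_connection_string(conn_str=connection_string, share_name=share_name, file_path=filename)
        downloader = file_client.download_file()  # StorageStreamDownloader
        downloaded_bytes = downloader.readall()  # read the whole stream into memory
        return downloaded_bytes

    # Example for CSV file
    csv_filename = 'test.csv'
    csv_data = download_file_to_memory(connection_string, share_name, csv_filename)

    # Wrap the bytes in a file-like object and read it into a Pandas DataFrame
    csv_io = io.BytesIO(csv_data)
    df = pd.read_csv(csv_io)
    print(df)

    # Example for joblib file
    joblib_filename = 'model.joblib'
    joblib_data = download_file_to_memory(connection_string, share_name, joblib_filename)

    # Wrap the bytes in a file-like object and load the saved Python object with joblib
    joblib_io = io.BytesIO(joblib_data)
    model = joblib.load(joblib_io)
    print(model)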
    

    Note: the above is only an example; you may need to modify it or add code blocks or libraries to suit your needs.
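The script above covers reading only. Since the question also asks about writing, here is a minimal sketch of the reverse direction, assuming the same connection string and share name: serialize the object to bytes in memory, then upload those bytes with `ShareFileClient.upload_file`. The helper names (`dataframe_to_csv_bytes`, `object_to_joblib_bytes`, `upload_bytes`) are hypothetical and not part of the Azure SDK.

```python
import io

import joblib
import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 encoded CSV bytes, without the index column."""
    return df.to_csv(index=False).encode("utf-8")


def object_to_joblib_bytes(obj) -> bytes:
    """Serialize a picklable Python object with joblib into an in-memory buffer."""
    buffer = io.BytesIO()
    joblib.dump(obj, buffer)  # joblib.dump accepts a file object as well as a path
    return buffer.getvalue()


def upload_bytes(connection_string: str, share_name: str, file_path: str, data: bytes) -> None:
    """Upload bytes to a file on an Azure file share (hypothetical helper)."""
    # Imported here so the serialization helpers above work without the Azure SDK installed
    from azure.storage.fileshare import ShareFileClient

    file_client = ShareFileClient.from_connection_string(
        conn_str=connection_string, share_name=share_name, file_path=file_path
    )
    # Create the file on the share and write the bytes to it
    file_client.upload_file(data)
```

For example, `upload_bytes(connection_string, share_name, "test_out.csv", dataframe_to_csv_bytes(df))` would write `df` back to the share; `test_out.csv` is a placeholder name. Serializing in memory avoids writing a temporary file inside the ACI container.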

    Please 'Upvote' (thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

  2. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-07-05T08:22:44.6533333+00:00

    Hello HughA,

    Greetings! Welcome to the Microsoft Q&A platform.

    Yes, you can convert the StorageStreamDownloader object into a Python object, such as the object saved in a joblib file or a Pandas DataFrame. For a joblib file, read the stream into an in-memory buffer and load it with the joblib library, as in this sample script.

    First, download the blob (or file) as a stream, then load it:

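    import joblib
    from io import BytesIO

    # streamdownloader is the StorageStreamDownloader returned by
    # ShareFileClient.download_file() (or BlobClient.download_blob())
    stream = BytesIO()
    streamdownloader.readinto(stream)  # copy the downloaded bytes into the buffer
    stream.seek(0)  # rewind the buffer before reading from it
    # Deserialize the object that was originally saved with joblib.dump
    model = joblib.load(stream)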
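To check the joblib step locally without touching Azure, here is a small sketch that simulates the downloaded bytes with an in-memory `joblib.dump` and then loads them back. `load_joblib_from_bytes` is a hypothetical helper name, and the sample dictionary stands in for a real model; with the real service, `downloaded_bytes` would come from `streamdownloader.readall()`.

```python
import io

import joblib


def load_joblib_from_bytes(data: bytes):
    """Deserialize an object previously saved with joblib.dump from raw bytes."""
    return joblib.load(io.BytesIO(data))


# Simulate the bytes StorageStreamDownloader.readall() would return:
# dump a sample object into an in-memory buffer instead of a real file.
buffer = io.BytesIO()
joblib.dump({"weights": [0.1, 0.2, 0.3], "bias": 0.5}, buffer)
downloaded_bytes = buffer.getvalue()

restored = load_joblib_from_bytes(downloaded_bytes)
print(restored)  # {'weights': [0.1, 0.2, 0.3], 'bias': 0.5}
```

The direction matters: `joblib.dump` writes an object out, while `joblib.load` reads one back in, so only `load` turns the downloaded stream into a usable Python object.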
    

    To read a CSV file, use the pandas library with the StorageStreamDownloader object. The same approach works with the StorageStreamDownloader returned by ShareFileClient.download_file().

    First, download the blob as a stream:

    from azure.storage.blob import BlobServiceClient
    from io import StringIO
    import pandas as pd
    
    # Initialize the BlobServiceClient
    blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
    container_client = blob_service_client.get_container_client("your_container_name")
    blob_client = container_client.get_blob_client("your_blob_name")
    
    # Download the blob as a StorageStreamDownloader object
    streamdownloader = blob_client.download_blob()
    
    

    Then convert it into the desired format:

    # Read the stream into a pandas DataFrame
    downloaded_blob = streamdownloader.readall()
    df = pd.read_csv(StringIO(downloaded_blob.decode('utf-8')))
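Decoding to a string first is not strictly necessary: `pd.read_csv` also accepts a `BytesIO` buffer together with an `encoding` argument. Below is a minimal sketch, assuming only that the downloader object exposes `readall()` (as `StorageStreamDownloader` does); `downloader_to_dataframe` and `FakeDownloader` are hypothetical names, and `FakeDownloader` exists only so the example runs without a storage account.

```python
import io

import pandas as pd


def downloader_to_dataframe(downloader, encoding: str = "utf-8", **read_csv_kwargs) -> pd.DataFrame:
    """Read an entire StorageStreamDownloader (or any object with readall()) into a DataFrame."""
    raw = downloader.readall()  # the full file contents as bytes
    # pandas parses the bytes from a BytesIO buffer; the encoding is handled by the parser
    return pd.read_csv(io.BytesIO(raw), encoding=encoding, **read_csv_kwargs)


class FakeDownloader:
    """Hypothetical stand-in for StorageStreamDownloader, used only for local testing."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def readall(self) -> bytes:
        return self._payload


df = downloader_to_dataframe(FakeDownloader(b"name,score\nalice,90\nbob,85\n"))
print(df)
```

Extra keyword arguments such as `sep=";"` are forwarded to `pd.read_csv`, so the same helper can handle differently formatted files.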
    

    Note: these are sample snippets; please modify the code to suit your requirements.

    Reference: https://learn.microsoft.com/en-us/python/api/azure-storage-blob/azure.storage.blob.storagestreamdownloader?view=azure-python

    Similar thread for reference: https://stackoverflow.com/questions/33091830/how-best-to-convert-from-azure-blob-csv-format-to-pandas-dataframe-while-running

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please “Accept the answer” and “up-vote” wherever the information provided helps you; this can be beneficial to other community members.
