How to unzip nested zip files stored in ADLS Gen2 using a Python script

manish verma 441 Reputation points
2024-08-26T06:17:35.6133333+00:00

Hi All,

We are new to the Python SDK for Azure Blob Storage. We are looking for a workaround to unzip nested zip files stored in a blob container: every zip file should be unzipped into its own folder.

Can someone help, or point us to a reference?

Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

1 answer

  1. Sumarigo-MSFT 45,781 Reputation points Microsoft Employee
    2024-08-26T08:30:49.45+00:00

    @manish verma Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    I understand that you're looking for help with unzipping nested zip files stored in a blob container using the Python SDK for Azure Blob Storage. Here are a few steps and references that might help you get started:

    1. Azure Blob Storage SDK for Python: You can refer to the official documentation for the Azure Blob Storage SDK for Python. It provides detailed information on how to interact with blob storage, including uploading, downloading, and managing blobs. (The original link to the documentation was removed from this post.)
    2. Unzipping Nested Zip Files: To unzip nested zip files, you can use the zipfile module in Python. Here's a basic example to get you started:
    import zipfile
    import os
    from azure.storage.blob import BlobServiceClient
    
    # Initialize the BlobServiceClient
    blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")
    container_name = "your_container_name"
    blob_name = "your_blob_name.zip"
    
    # Download the blob to a local file
    local_file_name = "downloaded_blob.zip"
    with open(local_file_name, "wb") as download_file:
        download_file.write(blob_service_client.get_blob_client(container_name, blob_name).download_blob().readall())
    
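    # Recursively extract an archive, then any .zip files found inside it
    def unzip_nested_zip(src_file, dest_dir):
        with zipfile.ZipFile(src_file, 'r') as zip_ref:
            zip_ref.extractall(dest_dir)
            names = zip_ref.namelist()
        for name in names:
            if name.lower().endswith('.zip'):
                nested_path = os.path.join(dest_dir, name)
                # os.path.splitext returns (root, ext); use only the root as the folder name
                unzip_nested_zip(nested_path, os.path.splitext(nested_path)[0])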
    
    # Unzip the downloaded blob
    unzip_nested_zip(local_file_name, "unzipped_folder")
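
If writing the archive to local disk is not desirable, the same recursion can be done in memory. The following is only a minimal sketch: `iter_zip_entries` and its `prefix` parameter are hypothetical names, not part of the Azure SDK or of the answer above, and it assumes each archive is small enough to hold in memory. It takes the bytes of a zip file (for example, what `download_blob().readall()` returns) and yields every file at any nesting depth, treating each nested zip as a folder named after it.

```python
import io
import zipfile


def iter_zip_entries(data, prefix=""):
    """Yield (path, content) for every file in a zip, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            content = archive.read(info)
            path = prefix + info.filename
            is_nested = info.filename.lower().endswith(".zip") and zipfile.is_zipfile(
                io.BytesIO(content)
            )
            if is_nested:
                # A nested archive becomes a folder named after it (extension dropped)
                yield from iter_zip_entries(content, path[:-4] + "/")
            else:
                yield path, content
```

For very large archives, the local-file approach shown above may be the safer choice, since this variant keeps each archive level in memory while it is being read.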
    
    
    

    Additional information:
    Unzip nested zip files in python

    Unzip file in blob storage with blob storage trigger

    Python's zipfile: Manipulate Your ZIP Files Efficiently

    from azure.storage.blob import BlobServiceClient
    import zipfile
    import io
    
    # Set the connection string and container name
    connection_string = "<your_connection_string>"
    container_name = "<your_container_name>"
    
    # Create a BlobServiceClient object
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    
    # Get a reference to the container
    container_client = blob_service_client.get_container_client(container_name)
    
    # List all blobs in the container
    blobs = container_client.list_blobs()
    
    # Loop through each blob in the container
    for blob in blobs:
        # Check if the blob is a zip file
        if blob.name.endswith('.zip'):
            # Download the zip file to memory
            blob_client = container_client.get_blob_client(blob.name)
            zip_data = blob_client.download_blob().readall()
    
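            # Open the outer archive in memory and extract each nested archive
            # (one level deep only) into a local folder named after it.
            # Note: non-zip members of the outer archive are not extracted here.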
            with zipfile.ZipFile(io.BytesIO(zip_data)) as zip_file:
                for nested_zip in zip_file.namelist():
                    if nested_zip.endswith('.zip'):
                        with zipfile.ZipFile(io.BytesIO(zip_file.read(nested_zip))) as nested_zip_file:
                            nested_zip_file.extractall(nested_zip[:-4])
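
The examples above extract to the local file system, while the question asks for folders created in the container itself. Here is a hedged sketch of writing extracted files back to blob storage. `upload_entries` and `dest_prefix` are hypothetical names introduced for illustration; `container_client` is assumed to be an `azure.storage.blob.ContainerClient` such as the one returned by `get_container_client` above, and `entries` is any iterable of `(relative_path, bytes)` pairs. On an account with hierarchical namespace enabled, the slashes in the blob name should show up as directories; on a flat account they act as virtual folders.

```python
import posixpath


def upload_entries(container_client, entries, dest_prefix):
    """Upload (relative_path, content) pairs under dest_prefix; return the blob names written."""
    uploaded = []
    for relative_path, content in entries:
        # Normalise separators and collapse "." / ".." segments
        clean = posixpath.normpath(relative_path.replace("\\", "/")).lstrip("/")
        if clean in (".", "..") or clean.startswith("../"):
            continue  # skip empty names and paths that would escape dest_prefix
        blob_name = posixpath.join(dest_prefix.strip("/"), clean)
        # overwrite=True replaces an existing blob with the same name
        container_client.upload_blob(name=blob_name, data=content, overwrite=True)
        uploaded.append(blob_name)
    return uploaded
```

A call might look like `upload_entries(container_client, pairs, "unzipped/" + blob.name[:-4])`, where `pairs` holds the extracted files. Choosing a destination prefix that does not end in `.zip` keeps the loop over `.zip` blobs from picking up its own output.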
    
    

    Please let us know if you have any further queries. I'm happy to assist you further.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

