Hello Chahat Malik,
Greetings! Welcome to Microsoft Q&A Platform.
To create a zip file from multiple files stored in Azure Blob Storage without writing them to local disk, you can use the azure-storage-blob library together with in-memory file handling. This also works from Azure Databricks: the blobs are read into memory, zipped there, and the resulting archive is uploaded back to Blob Storage.
- First, mount your Azure Blob Storage to Databricks.
- Utilize the Azure Storage SDK for Python to interact with the blob storage.
- Create a zip file in memory: use io.BytesIO together with the zipfile module to build the archive in memory.
- Upload the zip file: finally, upload the archive back to the blob storage.
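Note:
- Mounting: make sure your Azure Blob Storage is mounted to Databricks if you access it through DBFS paths. The SDK calls in the sample code connect to the storage account directly with the connection string.
- In-memory operations: io.BytesIO lets you build the zip file in memory, avoiding local storage. The whole archive is held in memory, so the combined size of the files should fit within the memory available to the machine running the code.
- The Azure Storage SDK for Python provides the methods needed to read and upload blobs.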
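To make the in-memory step concrete, here is a minimal sketch that uses only the Python standard library. The file names and contents are hypothetical placeholders standing in for bytes downloaded from blobs; no Azure calls are involved.

```python
import io
import zipfile

# Hypothetical names and contents standing in for downloaded blob bytes.
files = {"file1.txt": b"first file", "file2.txt": b"second file"}

# BytesIO behaves like a file object but lives entirely in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as zip_file:
    for name, data in files.items():
        zip_file.writestr(name, data)

# The finished archive is plain bytes; nothing was written to disk.
zip_bytes = zip_buffer.getvalue()

# Read the archive back from memory to confirm its contents.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as check:
    names = check.namelist()
    first = check.read("file1.txt")
```

The bytes in zip_bytes are what would be passed to the upload call in the last step.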
Sample code for reference:
```python
from azure.storage.blob import BlobServiceClient
import io
import zipfile

# Initialize the BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("your_connection_string")

# Specify the container and the list of blobs to be zipped
container_name = "your_container_name"
blob_names = ["file1.txt", "file2.txt", "file3.txt"]

# Create a BytesIO object to hold the zip file in memory
zip_buffer = io.BytesIO()

# Create a ZipFile object
with zipfile.ZipFile(zip_buffer, 'w') as zip_file:
    for blob_name in blob_names:
        # Get the blob client
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
        # Download the blob content as bytes
        blob_data = blob_client.download_blob().readall()
        # Write the blob content to the zip file
        zip_file.writestr(blob_name, blob_data)

# Seek to the beginning of the BytesIO object
zip_buffer.seek(0)

# Upload the zip file to blob storage
zip_blob_client = blob_service_client.get_blob_client(container=container_name, blob="zipped_files.zip")
zip_blob_client.upload_blob(zip_buffer.getvalue(), overwrite=True)
print("Zip file created and uploaded successfully.")
```
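If the blob names are not known in advance, a variation on the sample is to list them by prefix and stream each blob into its zip entry in chunks. The sketch below is one possible way to do that, not a definitive implementation. The function name zip_blobs_by_prefix and the prefix argument are hypothetical; the function assumes it is given an object that behaves like azure.storage.blob.ContainerClient, using its list_blobs(name_starts_with=...) and download_blob(...).chunks() methods. It also switches to ZIP_DEFLATED so the entries are compressed.

```python
import io
import zipfile


def zip_blobs_by_prefix(container_client, prefix):
    """Return an io.BytesIO holding a zip of every blob whose name starts with prefix.

    container_client is assumed to behave like azure.storage.blob.ContainerClient:
    list_blobs(name_starts_with=...) yields items with a .name attribute, and
    download_blob(name).chunks() yields the blob content as pieces of bytes.
    """
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zip_file:
        for blob in container_client.list_blobs(name_starts_with=prefix):
            downloader = container_client.download_blob(blob.name)
            # Copy the blob into its zip entry chunk by chunk, so the source
            # blob is never held in memory as a whole. The archive itself
            # still accumulates in zip_buffer.
            with zip_file.open(blob.name, "w") as entry:
                for chunk in downloader.chunks():
                    entry.write(chunk)
    # Rewind so the caller can read or upload the buffer from the start.
    zip_buffer.seek(0)
    return zip_buffer
```

With the real SDK, the client could be obtained with blob_service_client.get_container_client(container_name), and the returned buffer uploaded with container_client.upload_blob(name="zipped_files.zip", data=zip_buffer, overwrite=True). If the archive is uploaded under the same prefix, a later run would pick it up as an input, so a separate prefix or container for the output is safer.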
Similar threads for reference:
- https://stackoverflow.com/questions/18852389/generate-a-zip-file-from-azure-blob-storage-files
- https://stackoverflow.com/questions/59713184/how-to-zip-files-on-azure-blob-storage-with-shutil-in-databricks
Hope this information helps! Please let us know if you have any further queries. I’m happy to assist you further.
Please "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.