Download a blob with Python
This article shows how to download a blob using the Azure Storage client library for Python. You can download blob data to various destinations, including a local file path, stream, or text string. You can also open a blob stream and read from it.
To learn about downloading blobs using asynchronous APIs, see Download blobs asynchronously.
Prerequisites
- Azure subscription - create one for free
- Azure storage account - create a storage account
- Python 3.8+
Set up your environment
If you don't have an existing project, this section shows you how to set up a project to work with the Azure Blob Storage client library for Python. For more details, see Get started with Azure Blob Storage and Python.
To work with the code examples in this article, follow these steps to set up your project.
Install packages
Install the following packages using pip install:
pip install azure-storage-blob azure-identity
Add import statements
Add the following import statements:
import io
import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, BlobClient
Authorization
The authorization mechanism must have the necessary permissions to perform a download operation. For authorization with Microsoft Entra ID (recommended), you need Azure RBAC built-in role Storage Blob Data Reader or higher. To learn more, see the authorization guidance for Get Blob (REST API).
Create a client object
To connect an app to Blob Storage, create an instance of BlobServiceClient. The following example shows how to create a client object using DefaultAzureCredential for authorization:
# TODO: Replace <storage-account-name> with your actual storage account name
account_url = "https://<storage-account-name>.blob.core.windows.net"
credential = DefaultAzureCredential()
# Create the BlobServiceClient object
blob_service_client = BlobServiceClient(account_url, credential=credential)
You can also create client objects for specific containers or blobs, either directly or from the BlobServiceClient object. To learn more about creating and managing client objects, see Create and manage client objects that interact with data resources.
Download a blob
You can use the following method to download a blob:
- BlobClient.download_blob
The download_blob method returns a StorageStreamDownloader object. During a download, the client library splits the download request into chunks, and each chunk is downloaded with a separate Get Blob range request. Whether and how a download is split depends on the total size of the blob and on how the data transfer options are set.
Download to a file path
The following example downloads a blob to a file path:
def download_blob_to_file(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    with open(file=os.path.join(r'filepath', 'filename'), mode="wb") as sample_blob:
        download_stream = blob_client.download_blob()
        sample_blob.write(download_stream.readall())
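The example above reads the whole blob into memory with readall() before writing it to disk. For larger blobs, one alternative is to let the downloader write into the open file object with readinto(). The following is a minimal sketch of that idea, not a definitive implementation. The function name download_blob_to_path and the step that creates the parent directory are assumptions made for illustration, and blob_client is expected to be a BlobClient created as shown earlier:

```python
import os


def download_blob_to_path(blob_client, destination_path):
    """Write a blob to a local file and return the number of bytes written.

    blob_client is expected to behave like azure.storage.blob.BlobClient.
    The function name and the directory handling are illustrative assumptions.
    """
    # Create the parent directory if it doesn't exist yet (convenience step)
    parent = os.path.dirname(destination_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    with open(destination_path, mode="wb") as local_file:
        # readinto() writes the downloaded data into the file object
        # and returns the number of bytes read
        return blob_client.download_blob().readinto(local_file)
```

A call might look like download_blob_to_path(blob_client, os.path.join("downloads", "sample-blob.txt")).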
Download to a stream
The following example downloads a blob to a stream. In this example, StorageStreamDownloader.readinto downloads the blob contents to a stream and returns the number of bytes read:
def download_blob_to_stream(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # readinto() downloads the blob contents to a stream and returns the number of bytes read
    stream = io.BytesIO()
    num_bytes = blob_client.download_blob().readinto(stream)
    print(f"Number of bytes: {num_bytes}")
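If you plan to read from the in-memory stream afterward, keep in mind that writing to an io.BytesIO object leaves its position at the end of the data. The following sketch rewinds the stream before returning it. The helper name download_blob_to_memory is hypothetical, and blob_client is assumed to be a BlobClient like the one in the example above:

```python
import io


def download_blob_to_memory(blob_client):
    """Download a blob into a BytesIO object that is ready to be read.

    blob_client is expected to behave like azure.storage.blob.BlobClient.
    Returns a (stream, num_bytes) tuple. The helper name is an assumption.
    """
    stream = io.BytesIO()
    num_bytes = blob_client.download_blob().readinto(stream)
    # Writing moved the stream position to the end; rewind so that
    # a later stream.read() starts from the first byte
    stream.seek(0)
    return stream, num_bytes
```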
Download a blob in chunks
The following example downloads a blob and iterates over chunks in the download stream. In this example, StorageStreamDownloader.chunks returns an iterator, which allows you to read the blob content in chunks:
def download_blob_chunks(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # This returns a StorageStreamDownloader
    stream = blob_client.download_blob()
    chunk_list = []
    # Read data in chunks to avoid loading all into memory at once
    for chunk in stream.chunks():
        # Process your data (anything can be done here - 'chunk' is a byte array)
        chunk_list.append(chunk)
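Appending every chunk to a list still keeps the full blob in memory by the end of the loop. As one possible way to process chunks as they arrive, the sketch below writes each chunk to a file and updates a SHA-256 digest along the way. The helper save_chunks is a hypothetical name, and it accepts any iterable of bytes objects, such as the iterator returned by stream.chunks():

```python
import hashlib


def save_chunks(chunks, destination_path):
    """Write an iterable of bytes chunks to a file.

    Returns a (total_bytes, sha256_hex) tuple. The helper name is an
    assumption made for illustration.
    """
    digest = hashlib.sha256()
    total_bytes = 0
    with open(destination_path, mode="wb") as local_file:
        for chunk in chunks:
            # Each chunk is handled once and then released
            local_file.write(chunk)
            digest.update(chunk)
            total_bytes += len(chunk)
    return total_bytes, digest.hexdigest()
```

With a BlobClient from the earlier examples, a call might look like save_chunks(blob_client.download_blob().chunks(), "sample-blob.txt").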
Download to a string
The following example downloads blob contents as text. In this example, the encoding parameter is necessary for readall() to return a string, otherwise it returns bytes:
def download_blob_to_string(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    # encoding param is necessary for readall() to return str, otherwise it returns bytes
    downloader = blob_client.download_blob(max_concurrency=1, encoding='UTF-8')
    blob_text = downloader.readall()
    print(f"Blob contents: {blob_text}")
Download a block blob with configuration options
You can define client library configuration options when downloading a blob. These options can be tuned to improve performance and enhance reliability. The following code examples show how to define configuration options for a download both at the method level, and at the client level when instantiating BlobClient. These options can also be configured for a ContainerClient instance or a BlobServiceClient instance.
Specify data transfer options on download
You can set configuration options when instantiating a client to optimize performance for data transfer operations. You can pass the following keyword arguments when constructing a client object in Python:
- max_chunk_get_size - The maximum chunk size used for downloading a blob. Defaults to 4 MiB.
- max_single_get_size - The maximum size for a blob to be downloaded in a single call. If the total blob size exceeds max_single_get_size, the remainder of the blob data is downloaded in chunks. Defaults to 32 MiB.
For download operations, you can also pass the max_concurrency argument when calling download_blob. This argument defines the maximum number of parallel connections for the download operation.
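To get a feel for how the two size settings interact, the following sketch estimates the byte ranges a download would be split into. It is a simplified model based only on the description above, not the client library's actual implementation. The function plan_download_ranges is a hypothetical helper, and empty blobs aren't modeled:

```python
MIB = 1024 * 1024


def plan_download_ranges(blob_size, max_single_get_size=32 * MIB, max_chunk_get_size=4 * MIB):
    """Return a list of inclusive (start, end) byte ranges for a download.

    Simplified model: the first request covers up to max_single_get_size
    bytes, and any remainder is split into max_chunk_get_size pieces.
    """
    if blob_size <= 0:
        return []  # empty blobs are outside the scope of this model

    # The initial request covers the first max_single_get_size bytes
    first_end = min(blob_size, max_single_get_size)
    ranges = [(0, first_end - 1)]

    # The remainder is requested in chunks of at most max_chunk_get_size
    start = first_end
    while start < blob_size:
        end = min(start + max_chunk_get_size, blob_size)
        ranges.append((start, end - 1))
        start = end
    return ranges


# A 100 MiB blob with the default settings: one 32 MiB request,
# followed by 17 chunks of 4 MiB each
print(len(plan_download_ranges(100 * MIB)))  # prints 18
```

Under this model, raising max_chunk_get_size reduces the number of requests, while max_concurrency determines how many of the chunk requests can be in flight at once.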
The following code example shows how to specify data transfer options when creating a BlobClient object, and how to download data using that client object. The values provided in this sample aren't intended to be a recommendation. To properly tune these values, you need to consider the specific needs of your app.
def download_blob_transfer_options(self, account_url: str, container_name: str, blob_name: str):
    # Create a BlobClient object with data transfer options for download
    blob_client = BlobClient(
        account_url=account_url,
        container_name=container_name,
        blob_name=blob_name,
        credential=DefaultAzureCredential(),
        max_single_get_size=1024*1024*32, # 32 MiB
        max_chunk_get_size=1024*1024*4 # 4 MiB
    )
    with open(file=os.path.join(r'file_path', 'file_name'), mode="wb") as sample_blob:
        download_stream = blob_client.download_blob(max_concurrency=2)
        sample_blob.write(download_stream.readall())
Download blobs asynchronously
The Azure Blob Storage client library for Python supports downloading blobs asynchronously. To learn more about project setup requirements, see Asynchronous programming.
Follow these steps to download a blob using asynchronous APIs:
Add the following import statements:
import asyncio
from azure.identity.aio import DefaultAzureCredential
from azure.storage.blob.aio import BlobServiceClient, BlobClient
Add code to run the program using asyncio.run. This function runs the passed coroutine, main() in our example, and manages the asyncio event loop. Coroutines are declared with the async/await syntax. In this example, the main() coroutine first creates the top-level BlobServiceClient using async with, then calls the method that downloads the blob. Note that only the top-level client needs to use async with, as other clients created from it share the same connection pool.
async def main():
    sample = BlobSamples()
    # TODO: Replace <storage-account-name> with your actual storage account name
    account_url = "https://<storage-account-name>.blob.core.windows.net"
    credential = DefaultAzureCredential()
    async with BlobServiceClient(account_url, credential=credential) as blob_service_client:
        await sample.download_blob_to_file(blob_service_client, "sample-container")
if __name__ == '__main__':
    asyncio.run(main())
Add code to download the blob. The following example downloads a blob to a local file path using a BlobClient object. The code is the same as the synchronous example, except that the method is declared with the async keyword and the await keyword is used when calling the download_blob and readall methods.
async def download_blob_to_file(self, blob_service_client: BlobServiceClient, container_name):
    blob_client = blob_service_client.get_blob_client(container=container_name, blob="sample-blob.txt")
    with open(file=os.path.join(r'filepath', 'filename'), mode="wb") as sample_blob:
        download_stream = await blob_client.download_blob()
        data = await download_stream.readall()
        sample_blob.write(data)
With this basic setup in place, you can implement other examples in this article as coroutines using async/await syntax.
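As an illustration, the chunked download example might be adapted as follows. This is a sketch rather than code from the article's samples. The function name download_blob_chunks_async and the on_chunk callback are assumptions, and blob_client is expected to be a BlobClient from azure.storage.blob.aio, where download_blob is awaited and chunks() is consumed with async for:

```python
import asyncio


async def download_blob_chunks_async(blob_client, on_chunk):
    """Download a blob chunk by chunk and return the total bytes received.

    blob_client is expected to behave like azure.storage.blob.aio.BlobClient.
    on_chunk is a hypothetical callback that receives each bytes chunk.
    """
    # In the async client, download_blob() is a coroutine
    stream = await blob_client.download_blob()
    total_bytes = 0
    # chunks() yields data asynchronously, so iterate with async for
    async for chunk in stream.chunks():
        on_chunk(chunk)
        total_bytes += len(chunk)
    return total_bytes
```

Inside main(), this could be called as await download_blob_chunks_async(blob_client, chunk_list.append), where chunk_list is a list you create beforehand.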
Resources
To learn more about how to download blobs using the Azure Blob Storage client library for Python, see the following resources.
Code samples
- View synchronous or asynchronous code samples from this article (GitHub)
REST API operations
The Azure SDK for Python contains libraries that build on top of the Azure REST API, allowing you to interact with REST API operations through familiar Python paradigms. The client library methods for downloading blobs use the following REST API operation:
- Get Blob (REST API)
Client library resources
Related content
- This article is part of the Blob Storage developer guide for Python. To learn more, see the full list of developer guide articles at Build your Python app.