How to work with large pdf stored in azure blob storage

Question

Shaheda Ansari 0

We upload large PDF files, each with 6,000 to 10,000 pages, to Azure Blob Storage. After uploading, we have to work on the PDF and then re-upload the same file. Currently we download the whole file first because we need to work with it, but downloading and processing it causes memory issues. Is there a way to upload a PDF page by page so that the complete PDF is available at the end, and to download it in the same way?

2 answers

Answer 1

Nandamuri Pranay Teja 3,700 Microsoft External Staff Moderator

Hello Shaheda,

I understand that you're looking for a way to upload and download large PDF files page by page to avoid memory issues.

Please note that block blobs allow a large file to be uploaded in smaller segments called blocks. Blocks are byte ranges of the file rather than PDF pages, so the file is split by size, not by page. You stage each block separately and, once all blocks are staged, commit the block list to create the complete PDF blob. This way the entire file never has to be held in memory at once.
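
As a minimal sketch of the stage-then-commit pattern described above, assuming the Python azure-storage-blob v12 SDK: the helpers below read a local file in fixed-size byte chunks, stage each chunk with BlobClient.stage_block, and then commit the ordered list with BlobClient.commit_block_list. The names iter_chunks, stage_blocks, and commit_blocks, and the 4 MiB chunk size, are illustrative choices and not part of the SDK. The chunks here are byte ranges of the file, not individual PDF pages.

```python
def iter_chunks(path, chunk_size):
    """Yield the file at `path` as consecutive byte chunks of at most `chunk_size`."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def stage_blocks(blob_client, path, chunk_size=4 * 1024 * 1024):
    """Stage each chunk as an uncommitted block and return the block IDs in order.

    `blob_client` is assumed to be an azure.storage.blob.BlobClient.
    Only one chunk is held in memory at a time.
    """
    block_ids = []
    for index, chunk in enumerate(iter_chunks(path, chunk_size)):
        # Block IDs within one blob must all have the same length,
        # hence the zero-padded counter.
        block_id = f"{index:08d}"
        blob_client.stage_block(block_id=block_id, data=chunk)
        block_ids.append(block_id)
    return block_ids


def commit_blocks(blob_client, block_ids):
    """Commit the staged blocks, in order, so they become the blob's content."""
    # Imported here so the helpers above carry no SDK dependency.
    from azure.storage.blob import BlobBlock

    blob_client.commit_block_list([BlobBlock(block_id=b) for b in block_ids])
```

With a real client, for example one returned by blob_service_client.get_blob_client(container_name, blob_name), the sequence would be block_ids = stage_blocks(blob_client, "large.pdf") followed by commit_blocks(blob_client, block_ids), where "large.pdf" is a placeholder path. Until the commit call succeeds, the staged blocks are not visible as blob content.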

Memory management:

Similarly, when downloading, you can retrieve the blob in byte ranges and write each range to disk as it arrives, rather than loading the entire PDF into memory. This keeps memory usage bounded when handling large PDF files.
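
A ranged download can be sketched as follows, again assuming the Python azure-storage-blob v12 SDK, where BlobClient.download_blob accepts offset and length arguments and BlobClient.get_blob_properties reports the blob size. The helper names iter_ranges and download_in_ranges and the 4 MiB range size are hypothetical choices made for illustration.

```python
def iter_ranges(total_size, range_size):
    """Yield (offset, length) pairs that together cover `total_size` bytes."""
    offset = 0
    while offset < total_size:
        length = min(range_size, total_size - offset)
        yield offset, length
        offset += length


def download_in_ranges(blob_client, local_path, range_size=4 * 1024 * 1024):
    """Write the blob to `local_path` one byte range at a time.

    `blob_client` is assumed to be an azure.storage.blob.BlobClient.
    At most `range_size` bytes of blob data are held in memory at once.
    Returns the total number of bytes reported for the blob.
    """
    total_size = blob_client.get_blob_properties().size
    with open(local_path, "wb") as out:
        for offset, length in iter_ranges(total_size, range_size):
            downloader = blob_client.download_blob(offset=offset, length=length)
            out.write(downloader.readall())
    return total_size
```

Once the file is on disk, a PDF library can open it from its path instead of from an in-memory byte string.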

References:

https://stackoverflow.com/questions/49281802/pdfs-in-azure-blob-storage-better-block-or-page-blobs

Hope the above answer helps! Please let us know if you have any further queries.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

Nandamuri Pranay Teja 3,700 Reputation points Microsoft External Staff Moderator

2025-03-28T09:57:20.8166667+00:00

Hello Shaheda,

I wanted to follow up and see if the given answer was helpful. If this resolves your issue, please click "Accept the answer" for it, which could help other community members who read this thread. And, if you have any more questions, please let us know.

Nandamuri Pranay Teja 3,700 Reputation points Microsoft External Staff Moderator

2025-03-31T07:35:24.26+00:00

Hello Shaheda,

Just checking in to see if the provided answer helped. If it answers your query, please click "Accept the answer", which might be beneficial to other community members reading this thread. And if you have any further queries, do let us know.

Answer 2

Hi @shaheda Ansari

Is there a way to upload a PDF page by page so that the complete PDF is available at the end, and to download it in the same way?

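You can use the Python code below to upload a PDF file to Azure Blob Storage. Note that it uploads the whole file in a single upload_blob call; the per-page messages only simulate progress, and the final line reports how many pages the uploaded PDF contains.

Code (upload):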

import fitz  
from azure.storage.blob import BlobServiceClient

connection_string = "xxxxx"
container_name = "sxxx"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

def process_pdf_and_upload(pdf_path, blob_name):
    """Counts the pages, then uploads the whole PDF in a single upload_blob call."""
    # Open the PDF only to read its page count; no page is uploaded separately.
    doc = fitz.open(pdf_path)
    total_pages = len(doc)
    doc.close()

    # Simulated per-page progress only; nothing is sent to Azure in this loop.
    for i in range(total_pages):
        print(f"Uploading page {i+1} of {total_pages}...")

    blob_client = blob_service_client.get_blob_client(container_name, blob_name)

    # Pass the open file handle so the SDK streams the data in chunks
    # instead of first reading the entire PDF into memory.
    with open(pdf_path, "rb") as pdf_file:
        blob_client.upload_blob(pdf_file, overwrite=True)

    print(f"Uploaded entire PDF ({total_pages} pages): {blob_name}")

process_pdf_and_upload("<name of pdf>.pdf", "final_large_pdf1.pdf")

Output:

Uploading page 164 of 168...
Uploading page 165 of 168...
Uploading page 166 of 168...
Uploading page 167 of 168...
Uploading page 168 of 168...
Uploaded entire PDF (168 pages): final_large_pdf1.pdf

In my environment, I have a 168-page PDF file that was successfully uploaded to my Azure Blob Storage.

Code (download):

import fitz  
from azure.storage.blob import BlobServiceClient

connection_string = "xxxxx"
container_name = "xxxxx"
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

def download_and_process_pdf(blob_name, local_pdf_path):
    """Downloads a PDF from Azure Blob Storage and simulates page-wise processing."""
    blob_client = blob_service_client.get_blob_client(container_name, blob_name)

    # Stream the blob straight to disk; readall() would hold the whole PDF in memory.
    with open(local_pdf_path, "wb") as file:
        blob_client.download_blob().readinto(file)

    print(f"Downloaded PDF: {blob_name}")

    # Open from the local path so the PDF library reads from disk.
    doc = fitz.open(local_pdf_path)
    total_pages = len(doc)

    # Placeholder loop: real per-page work would go here.
    for i in range(total_pages):
        print(f"Processing page {i+1} of {total_pages}...")

    doc.close()

download_and_process_pdf("final_large_pdf1.pdf", "<pdf name>.pdf")
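
If the per-page work itself is what exhausts memory, one option is to process the downloaded file in page batches and write each batch out as its own smaller PDF. The following is only a sketch under that assumption, using PyMuPDF's insert_pdf with from_page and to_page; the helper names page_batches and split_pdf_into_parts, the batch size of 500, and the part file naming are hypothetical.

```python
def page_batches(total_pages, batch_size):
    """Yield inclusive, zero-based (first_page, last_page) index pairs."""
    for first in range(0, total_pages, batch_size):
        yield first, min(first + batch_size, total_pages) - 1


def split_pdf_into_parts(pdf_path, part_prefix, batch_size=500):
    """Write the PDF at `pdf_path` as several smaller PDFs and return their paths."""
    import fitz  # PyMuPDF; imported here so page_batches has no dependency

    part_paths = []
    src = fitz.open(pdf_path)
    try:
        for first, last in page_batches(len(src), batch_size):
            part = fitz.open()  # new, empty PDF
            part.insert_pdf(src, from_page=first, to_page=last)
            # File names use one-based page numbers for readability.
            part_path = f"{part_prefix}_{first + 1:05d}-{last + 1:05d}.pdf"
            part.save(part_path)
            part.close()
            part_paths.append(part_path)
    finally:
        src.close()
    return part_paths
```

Each part can then be processed and uploaded as a separate blob. Whether the parts are merged back into a single PDF afterwards depends on what the downstream consumers need.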

Hope this answer helps! Please let us know if you have any further queries. I'm happy to assist you further.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

Shaheda Ansari 0 Reputation points

2025-04-08T13:31:39.6366667+00:00

We are using C# and the Azure Client SDK for uploading and downloading files, following a range-based pattern for both operations. Our PDF file contains around 6000 pages, and we need to download the file, perform operations on each page, and then upload the entire PDF again. Is there a way to append pages to the blob during upload, instead of uploading the whole file at once?
