Submit document blob path to Azure Document Intelligence service?

Question

Dilip Jain 20

I'm using the Azure AI Document Intelligence Python SDK (azure-ai-documentintelligence) to analyze documents stored in Azure Blob Storage. My current workflow involves:

Downloading the document from Blob Storage to my application
For large documents (100+ pages), splitting the PDF into chunks in memory
Sending each chunk's bytes to the Document Intelligence service

Question:

Can I pass an Azure Blob URL directly to begin_analyze_document for single document analysis instead of downloading and uploading the file bytes? I'd like to provide a blob URL (with SAS token or Managed Identity) and have Document Intelligence fetch the document directly.
Does the service support page range parameters (e.g., pages="1-100") so I can analyze specific pages without splitting the PDF myself? This would allow me to process a long document in ranges without downloading/chunking.
Is there a single-document equivalent to AzureBlobContentSource used in batch processing? I know begin_analyze_batch_documents supports blob sources, but it seems like overkill for single-document analysis.

In short: is it possible to pass the Azure Blob Storage URL/path directly to the Document Intelligence service instead of downloading and uploading the file content? I want to avoid the intermediate step of fetching the blob content into my application before sending it to Document Intelligence.

2 answers

Answer 1

While the answer provided by the previous responder is technically correct regarding the Python SDK, I've discovered that the underlying REST API does support passing Azure Blob URLs directly for single-document analysis. Here's how you can achieve this:

Key Findings

The SDK limitation is real, but the REST API supports it: The begin_analyze_document method in the Python SDK (azure-ai-documentintelligence) only accepts file bytes/streams or base64-encoded content. However, the REST API accepts a urlSource parameter that allows you to pass a Blob URL (with SAS token) directly.

Page ranges are fully supported: You can use the pages query parameter (e.g., pages=1-100) to analyze specific page ranges, and the service will only process those pages.
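
To process a long document in ranges, as the question describes, the values for the pages parameter could be generated up front. This is only a minimal sketch, assuming the total page count is already known; page_ranges is a hypothetical helper name, not part of the SDK or the REST API.

```python
def page_ranges(total_pages: int, chunk_size: int = 100) -> list[str]:
    """Return page-range strings such as "1-100" that cover pages 1..total_pages.

    Hypothetical helper for illustration; not part of any Azure SDK.
    """
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    ranges = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        # A single trailing page is written as "N" rather than "N-N"
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


print(page_ranges(250))  # ['1-100', '101-200', '201-250']
```

Each string can then be passed as the pages value of one analyze request, so the document is analyzed in several calls without being split locally.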

Working Solution

Here's a complete example demonstrating how to pass an Azure Blob Storage URL directly to the Document Intelligence REST API:

import time

import requests
from retry import retry  # third-party package: pip install retry


class AzureFormRecognizer:
    def __init__(self, endpoint: str, key: str):
        self.endpoint = endpoint.rstrip('/')
        self.key = key

    @retry(tries=5, delay=5, jitter=2, backoff=2)
    def analyze_document_from_url(self, document_url: str, model_id: str = "prebuilt-document", 
                                   pages: str = None, output_format: str = None):
        """
        Analyze a document directly from an Azure Blob URL using the REST API.
        
        Args:
            document_url: Full URL to the document (including SAS token if required)
            model_id: The model to use (e.g., "prebuilt-document", "prebuilt-layout", "prebuilt-read")
            pages: Optional page range (e.g., "1-100", "1,3,5-10")
            output_format: Optional output format (e.g., "markdown")
        
        Returns:
            Operation location URL for polling results
        """
        headers = {
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": self.key
        }
        
        # Build the API URL; optional settings are sent as query parameters
        # so that requests URL-encodes values such as "1,3,5-10"
        url = f"{self.endpoint}/formrecognizer/documentModels/{model_id}:analyze"
        params = {"api-version": "2023-07-31"}
        
        if pages:
            params["pages"] = pages
        if output_format:
            # Assumption: this option may require a newer API version than 2023-07-31
            params["outputContentFormat"] = output_format
        
        # Pass the blob URL directly using urlSource
        data = {"urlSource": document_url}
        
        response = requests.post(url=url, params=params, headers=headers, json=data, timeout=600)
        
        if response.status_code == 202:
            return response.headers["Operation-Location"]
        else:
            raise ConnectionError(f"Document processing initiation failed: {response.status_code} - {response.text}")

    @retry(exceptions=TimeoutError, tries=3, delay=5)
    def get_analyze_result(self, operation_location: str, timeout: int = 300):
        """
        Poll for and retrieve the analysis result.
        
        Args:
            operation_location: The Operation-Location URL returned from analyze_document_from_url
            timeout: Maximum time to wait for results (in seconds)
        
        Returns:
            The analyzeResult dictionary containing the document analysis
        """
        headers = {"Ocp-Apim-Subscription-Key": self.key}
        elapsed = 0
        poll_interval = 5
        
        while elapsed < timeout:
            response = requests.get(operation_location, headers=headers, timeout=60)
            
            if response.status_code != 200:
                raise RuntimeError(f"Failed to get result: {response.status_code} - {response.text}")
            
            data = response.json()
            status = data.get("status")
            
            if status == "succeeded":
                return data.get("analyzeResult")
            elif status == "failed":
                error = data.get("error", {})
                raise RuntimeError(f"Document analysis failed: {error.get('message', 'Unknown error')}")
            
            # Status is "running" or "notStarted" - continue polling
            time.sleep(poll_interval)
            elapsed += poll_interval
        
        raise TimeoutError(f"Document analysis timed out after {timeout} seconds")


# Usage Example
if __name__ == "__main__":
    # Your Azure Document Intelligence endpoint and key
    ENDPOINT = "https://your-resource.cognitiveservices.azure.com/"
    KEY = "your-api-key"
    
    # Your blob URL with SAS token
    BLOB_URL = "https://yourstorageaccount.blob.core.windows.net/container/document.pdf"
    SAS_TOKEN = "?sv=2025-07-05&spr=https&..."  # Your SAS token
    
    document_url = BLOB_URL + SAS_TOKEN
    
    fr = AzureFormRecognizer(endpoint=ENDPOINT, key=KEY)
    
    # Analyze pages 1-50 directly from blob storage
    operation_location = fr.analyze_document_from_url(
        document_url=document_url,
        model_id="prebuilt-document",
        pages="1-50"  # Optional: specify page range
    )
    
    print(f"Processing started: {operation_location}")
    
    result = fr.get_analyze_result(operation_location)
    
    print(f"API Version: {result['apiVersion']}")
    print(f"Model ID: {result['modelId']}")
    print(f"Pages analyzed: {len(result['pages'])}")

API Request

endpoint:
POST {endpoint}/formrecognizer/documentModels/{modelId}:analyze?api-version=2023-07-31&pages={pageRange}
request body: {"urlSource":"https://yourstorageaccount.blob.core.windows.net/container/document.pdf?{SAS_TOKEN}"}
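
The request above can be assembled without any Azure-specific code. The sketch below only builds the URL and JSON body shown in this section and sends nothing; build_analyze_request is a hypothetical helper name, and the default API version is simply the one used in this answer.

```python
from urllib.parse import urlencode


def build_analyze_request(endpoint, model_id, blob_url, pages=None,
                          api_version="2023-07-31"):
    """Return (url, json_body) for an analyze call that uses urlSource.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    query = {"api-version": api_version}
    if pages:
        query["pages"] = pages  # urlencode escapes commas in lists like "1-10,15"
    url = (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
           f"{model_id}:analyze?{urlencode(query)}")
    return url, {"urlSource": blob_url}


url, body = build_analyze_request(
    "https://your-resource.cognitiveservices.azure.com/",
    "prebuilt-layout",
    "https://yourstorageaccount.blob.core.windows.net/container/document.pdf",
    pages="1-10,15",
)
print(url)
print(body)
```

The returned pair can be handed to any HTTP client, for example requests.post(url, json=body, headers=...), with the Ocp-Apim-Subscription-Key header set as in the class above.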

Benefits of This Approach

Feature	SDK (begin_analyze_document)	REST API (urlSource)
Pass Blob URL directly	❌ Not supported	✅ Supported
Page range support	✅ Supported	✅ Supported
Avoids download/upload	❌ Must download first	✅ Service fetches directly
Network efficiency	Lower (double transfer)	Higher (single transfer)

Note:

Access: The blob can be reached using a SAS token, public access, or Managed Identity.
Page Ranges: The pages parameter accepts various formats:
- Single page: pages=1
- Range: pages=1-100
- Multiple ranges: pages=1-10,15,20-30
Service Limits Still Apply: While this avoids the download/upload step, the document size limits of the service still apply. I was unable to get a 2,000-page PDF to work, even with a page range of a single page.
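
A pages value can be sanity-checked on the client before a request is sent. The sketch below accepts only the formats listed in this note and assumes 1-based page numbers; is_valid_pages is a hypothetical helper, and the service's own validation may be stricter or looser.

```python
import re

# One or more comma-separated items, each either "N" or "N-M"
_PAGES_RE = re.compile(r"\d+(-\d+)?(,\d+(-\d+)?)*")


def is_valid_pages(pages: str) -> bool:
    """Return True if pages looks like "1", "1-100" or "1-10,15,20-30"."""
    if not _PAGES_RE.fullmatch(pages):
        return False
    for part in pages.split(","):
        bounds = [int(n) for n in part.split("-")]
        # Assumption: pages are numbered from 1 and ranges must be ascending
        if bounds[0] < 1 or bounds[-1] < bounds[0]:
            return False
    return True


print(is_valid_pages("1-10,15,20-30"))  # True
print(is_valid_pages("10-1"))           # False
```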

SRILAKSHMI C 19,005 Reputation points Microsoft External Staff Moderator

2025-12-31T12:20:02.42+00:00

Hi Dilip Kumar Jain,

Thank you for sharing this clarification.

You’re absolutely right: the limitation exists in the current Python SDK, not in the Azure Document Intelligence service itself. Calling the REST API directly with the urlSource parameter is a valid and supported approach, and your explanation clearly highlights that distinction.

The working example you’ve provided is especially helpful, as it demonstrates:

Direct analysis from Azure Blob Storage using a SAS URL

Use of page ranges via the pages query parameter

Elimination of the download → re-upload overhead, which significantly improves efficiency

This is a great workaround for scenarios where documents already reside in Blob Storage and aligns well with best practices for network efficiency. Your comparison between SDK and REST capabilities is also very clear and will definitely help others facing the same limitation.

Thanks again for taking the time to document and share this solution with full code and references.

Answer 2

Hello Dilip Jain,

Welcome to Microsoft Q&A, and thank you for reaching out.

I understand that you're working on optimizing your workflow for analyzing documents with Azure AI Document Intelligence, and you have a few great questions. Let’s break down your queries:

1. Passing an Azure Blob URL directly for single-document analysis

For single document analysis, Azure Document Intelligence does not currently support passing an Azure Blob URL (with SAS or Managed Identity) directly to begin_analyze_document.

That API only accepts:

File bytes / file streams, or
Base64-encoded document content

So today, for single-document calls, the service cannot fetch the file directly from Blob Storage. The download-then-upload step is required.

Direct blob access is supported only for batch APIs, not for single-document analysis.

2. Analyzing specific page ranges (e.g., pages="1-100")

Page range parameters are supported in Azure Document Intelligence, but with an important limitation.

You can specify page ranges using the pages parameter (for example: "1-10", "11-20"), which controls which pages are analyzed by the service.

However:

The entire document must still be uploaded to Azure Document Intelligence
Page ranges do not allow partial uploads or bypass downloading/uploading the full file
If the document exceeds the service limits (for example, size limits), specifying a page range will not bypass those limits

Please refer to the Azure AI Document Intelligence documentation.

As a result, while using pages can help reduce processing cost and output size, it does not avoid the need to upload the full document. For very large files that exceed size limits, the document must still be split into smaller files before analysis.

3. Single-document equivalent of AzureBlobContentSource

Currently, Azure Document Intelligence does not provide a single-document equivalent of AzureBlobContentSource.

Blob-based inputs are supported only for batch or asynchronous workflows, such as:

begin_analyze_batch_documents
Other batch or blob-to-blob processing APIs

These APIs are designed for processing large document collections and support scenarios where the service reads documents directly from Azure Blob Storage.

For single-document analysis, the SDK does not expose a way to pass a Blob URL (with SAS or Managed Identity) directly. The supported options are:

Upload the document content (bytes/stream) directly to the service, or
Use a batch API even for a single document if you want a blob-to-blob workflow

As a result, for single-document scenarios, downloading and uploading the document content remains the required approach today.

While these limitations can be a bit challenging, they help ensure the service performs efficiently within its constraints. I would recommend staying up to date with changes in Azure Document Intelligence capabilities, as Microsoft frequently improves its services.

I hope this helps. Do let me know if you have any further queries.

If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".

Thank you!

SRILAKSHMI C 19,005 Reputation points Microsoft External Staff Moderator

2025-12-19T09:55:39.88+00:00

Hi Dilip Jain,

Following up to see if the above answer was helpful. If it answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful". If you have any further queries, do let us know.

Thank you!

SRILAKSHMI C 19,005 Reputation points Microsoft External Staff Moderator

2025-12-23T14:50:46.66+00:00

Hi Dilip Jain,

Just checking in to see whether you have had a chance to review my response to your question.

If you are still facing any further issues, please don't hesitate to reach out to us. We are happy to assist you.

Looking forward to your response and appreciate your time on this.

If you feel that your queries have been resolved, please accept the answer by clicking "Upvote" and "Accept Answer" on the post.

Thank you!

Dilip Kumar Jain 5 Reputation points

2025-12-29T12:01:44.35+00:00

Hi @SRILAKSHMI C ,

Thank you for the detailed response. While your answer is technically correct according to the SDK documentation, I've discovered that the underlying REST API does support passing Azure Blob URLs directly for single-document analysis using the urlSource parameter.

The limitation is specific to the Python SDK's begin_analyze_document method, not the Azure Document Intelligence service itself. By calling the REST API directly, you can:

Pass a Blob URL directly (with SAS token) without downloading the file first

Use page ranges (pages=1-100) to analyze specific pages

I've posted a complete working solution as a separate answer with full code examples. This approach eliminates the download/upload step entirely and is much more efficient for documents stored in Azure Blob Storage.
