
Parallel Processing with Azure AI Document Intelligence – Multiple Instances for Faster Throughput?

BEJJANKI SHREYAS 0 Reputation points
2026-01-05T14:57:28.99+00:00

I’m currently using Azure AI Document Intelligence (DI) with a single endpoint to process PDF documents. My approach involves looping through the pages of a PDF and sending them one by one to the DI endpoint. The reason is that if I find the required value on the first page, I can skip processing the remaining pages, reducing token usage.

However, I’ve noticed that when I try to access the same DI endpoint with multiple instances or parallel requests, the processing seems to be queued and handled sequentially rather than concurrently.

Questions:

  1. Is this expected behavior for a single DI endpoint?
  2. If yes, what is the recommended way to achieve true parallel processing?
    • Can we create multiple DI endpoints or instances?
    • Are there any best practices for scaling DI for high-throughput scenarios?

Any guidance or suggestions would be greatly appreciated!

Azure Document Intelligence in Foundry Tools

2 answers

  1. SRILAKSHMI C 18,390 Reputation points Microsoft External Staff Moderator
    2026-01-13T10:23:01.3+00:00

    Hello BEJJANKI SHREYAS,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    Yes, what you are observing is expected behavior when using a single Azure AI Document Intelligence (DI) resource. This is related to how concurrency, throttling, and scaling are handled by the service.

    1. Is this expected behavior for a single DI endpoint?

    Yes. This is expected.

    Azure AI Document Intelligence is a multi-tenant managed service and enforces service-side concurrency limits per resource. Even if your application sends multiple parallel requests:

    • Requests to the same DI resource may be queued
    • Processing can appear sequential once concurrency limits are reached
    • This behavior is by design to ensure service stability and fair usage

    This applies regardless of whether you send:

    • Page-by-page requests
    • Multiple PDFs
    • Requests from parallel threads or application instances

    Client-side parallelism does not override service-side throttling.

    2. Why parallel requests don’t run concurrently on one endpoint

    Each Document Intelligence resource has:

    • A maximum concurrent request limit
    • A throughput (TPS) cap per region
    • Internal queueing once those limits are exceeded

    When limits are reached, requests are accepted but processed sequentially or in small batches.

    This is why scaling your application alone does not increase throughput on a single DI endpoint.

    3. Recommended ways to achieve true parallel processing

    1: Use multiple DI resources

    You can create multiple Azure AI Document Intelligence resources, for example:

    • DI-Resource-01
    • DI-Resource-02
    • DI-Resource-03

    Then distribute documents across these resources. Each resource has its own endpoint, quota, and concurrency limits, so requests are processed independently.

    This approach is fully supported, is common in high-throughput architectures, and gives predictable, scalable performance.
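
As a rough illustration of distributing documents across several resources, here is a minimal round-robin sketch. The endpoint URLs and the `assign_round_robin` helper are hypothetical placeholders, not part of any Azure SDK, and the sketch only decides which resource each document goes to; it does not call the service.

```python
import itertools
from typing import Iterable, Iterator, Sequence, Tuple

# Hypothetical endpoints: replace with the endpoints of your own DI resources.
ENDPOINTS = [
    "https://di-resource-01.example.invalid/",
    "https://di-resource-02.example.invalid/",
    "https://di-resource-03.example.invalid/",
]


def assign_round_robin(
    documents: Iterable[str], endpoints: Sequence[str]
) -> Iterator[Tuple[str, str]]:
    """Pair each document with the next endpoint in rotation."""
    rotation = itertools.cycle(endpoints)
    for document in documents:
        yield document, next(rotation)


# Four documents over three endpoints: the fourth wraps back to the first.
assignments = list(
    assign_round_robin(["a.pdf", "b.pdf", "c.pdf", "d.pdf"], ENDPOINTS)
)
```

Round-robin is only one possible policy. Routing to the resource with the fewest in-flight requests may work better when documents vary a lot in size.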

    2: Use asynchronous Analyze APIs correctly

    Make sure your application:

    • Uses async analyze operations
    • Polls for results
    • Is fully non-blocking

    This improves efficiency and resource utilization, but does not bypass concurrency limits on its own.
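
A non-blocking client with a cap on in-flight requests could look roughly like the sketch below. `fake_analyze` is a stand-in for "submit the document, then poll until the operation finishes" and is not a real DI call. `MAX_IN_FLIGHT` is an assumed value that you would tune against the actual limits of your resource.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

MAX_IN_FLIGHT = 4  # assumed cap, not a documented DI limit


async def fake_analyze(document: str) -> str:
    """Stand-in for submitting a document and polling for its result."""
    await asyncio.sleep(0.01)  # simulates waiting on the service
    return f"result for {document}"


async def analyze_all(
    documents: Sequence[str],
    analyze: Callable[[str], Awaitable[str]] = fake_analyze,
    limit: int = MAX_IN_FLIGHT,
) -> List[str]:
    """Run analyze() over all documents with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(document: str) -> str:
        async with semaphore:
            return await analyze(document)

    # gather() returns results in the same order as the input documents.
    return await asyncio.gather(*(bounded(d) for d in documents))


results = asyncio.run(analyze_all(["a.pdf", "b.pdf", "c.pdf"]))
```

The semaphore keeps the client from flooding a single resource, which matches the point above: non-blocking code uses the available concurrency efficiently but does not raise the service-side limit.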

    3: Page-level optimization

    Your strategy of processing pages incrementally and stopping once the required value is found is valid and recommended for:

    • Cost optimization
    • Reducing unnecessary processing

    However:

    • It does not improve parallelism at the service level
    • It only reduces total work per document
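
The early-exit strategy described in the question can be sketched as follows. `extract` is a hypothetical callable that analyzes one page and returns the value if it is found. Here it is faked so the example runs without any service call.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_first_match(
    pages: Iterable[int], extract: Callable[[int], Optional[str]]
) -> Optional[Tuple[int, str]]:
    """Analyze pages in order and stop at the first page that yields a value."""
    for page in pages:
        value = extract(page)
        if value is not None:
            return page, value
    return None


analyzed_pages: List[int] = []


def fake_extract(page: int) -> Optional[str]:
    """Stand-in for a per-page analyze call; the value is on page 2."""
    analyzed_pages.append(page)
    return "INV-001" if page == 2 else None


match = find_first_match(range(1, 6), fake_extract)
```

In this run only pages 1 and 2 are analyzed and pages 3 to 5 are skipped, so the saving is in work per document, not in how many requests the service handles at once.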

    4. Can you create multiple DI endpoints or instances?

    You cannot create multiple endpoints within a single DI resource.

    However, you can:

    • Create multiple DI resources
    • Each resource provides its own endpoint and limits
    • Load-balance requests across them in your application

    This is the only supported way today to achieve horizontal scaling with DI.

    5. Best practices for high-throughput DI workloads

    Recommended architecture:

    • Use multiple DI resources (horizontal scaling)
    • Use async analyze APIs
    • Queue work using Azure Queue Storage or Azure Service Bus
    • Use a worker pool that pulls from the queue and routes requests to available DI resources

    Monitor:

    • Throttling responses (429)
    • Latency per resource
    • Error rates

    Avoid:

    • Sending excessive parallel requests to a single DI resource
    • Assuming client-side concurrency equals service-side concurrency
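
One common way to react to throttling responses (429) is to retry with exponential backoff and jitter. The sketch below assumes a hypothetical `send` callable that returns a `(status, body)` tuple. The delay values are illustrative and not taken from DI documentation, and the sketch ignores any retry hint the service might include in its response.

```python
import random
import time
from typing import Callable, List, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Exponential backoff plus jitter so workers do not retry in lockstep.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")


# Simulated service: throttled twice, then succeeds.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays: List[float] = []  # records the waits instead of actually sleeping
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Counting how often the retry path is taken per resource gives a simple throttling metric to monitor alongside latency and error rates.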

    In summary:

    • Sequential processing on a single DI endpoint is expected
    • DI enforces concurrency and throughput limits per resource
    • True parallelism requires multiple DI resources
    • Page-level optimization helps cost, not throughput
    • Horizontal scaling is the supported and recommended approach

    This behavior is by design, not a bug or misconfiguration.


    I hope this helps. Let me know if you have any further questions.

    If this answers your question, please click Accept Answer and mark the answer as helpful.

    Thank you!



  2. AI answer

    2026-01-05T14:57:41.0366667+00:00

    Yes, the behavior you are experiencing is expected for a single Azure AI Document Intelligence (DI) endpoint. Each DI resource is subject to service-side rate and concurrency limits, so when many requests are sent to the same endpoint, they can be throttled or queued and may appear to be processed sequentially.

    To achieve true parallel processing, you can consider the following recommendations:

    1. Create Multiple DI Resources: You can set up multiple DI resources, each with its own endpoint. Each resource processes requests independently, allowing you to send requests to them in parallel.
    2. Scaling for High-Throughput Scenarios:
      • Check the quotas and limits that apply to your DI resource, since they determine how many requests it can handle at a time.
      • Partition your workload and distribute documents across the resources you create, which can help in managing large volumes more efficiently.
      • Monitor your service limits and consider creating a support request to increase them if your workload requires a higher degree of concurrent processing.

    By following these practices, you should be able to improve the throughput of your document processing tasks significantly.



    AI-generated content may be incorrect. Read our transparency notes for more information.

