Parallel Processing with Azure AI Document Intelligence – Multiple Instances for Faster Throughput?


BEJJANKI SHREYAS 0

I’m currently using Azure AI Document Intelligence (DI) with a single endpoint to process PDF documents. My approach is to loop through the pages of a PDF and send them one by one to the DI endpoint. The reason is that if I find the required value on the first page, I can skip processing the remaining pages, reducing token usage.

However, I’ve noticed that when I try to access the same DI endpoint with multiple instances or parallel requests, the processing seems to be queued and handled sequentially rather than concurrently.

Questions:

- Is this expected behavior for a single DI endpoint?
- If yes, what is the recommended way to achieve true parallel processing?
- Can we create multiple DI endpoints or instances?
- Are there any best practices for scaling DI for high-throughput scenarios?

Any guidance or suggestions would be greatly appreciated!


Answer 1

Hello BEJJANKI SHREYAS,

Welcome to Microsoft Q&A, and thank you for reaching out.

Yes, what you are observing is expected behavior when using a single Azure AI Document Intelligence (DI) resource. This is related to how concurrency, throttling, and scaling are handled by the service.

1. Is this expected behavior for a single DI endpoint?

Yes. This is expected.

Azure AI Document Intelligence is a multi-tenant managed service and enforces service-side concurrency limits per resource. Even if your application sends multiple parallel requests:

Requests to the same DI resource may be queued

Processing can appear sequential once concurrency limits are reached

This behavior is by design to ensure service stability and fair usage

This applies regardless of whether you send:

Page-by-page requests

Multiple PDFs

Parallel threads or application instances

Client-side parallelism does not override service-side throttling.

2. Why parallel requests don’t run concurrently on one endpoint

Each Document Intelligence resource has:

A maximum concurrent request limit

A throughput (TPS) cap per region

Internal queueing once those limits are exceeded

When limits are reached:

Requests are accepted

But processed sequentially or in small batches

This is why scaling your application alone does not increase throughput on a single DI endpoint.
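When a resource is over its limits, the usual client-side response is to slow down and retry rather than add more parallel callers. The following is a minimal sketch of retrying with exponential backoff. It assumes the client surfaces throttling as an exception; `ThrottledError`, `call_with_backoff`, and `send` are hypothetical names used only for illustration and are not part of any DI SDK.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for an HTTP 429 (throttled) response."""

    def __init__(self, retry_after=None):
        super().__init__("throttled (429)")
        self.retry_after = retry_after  # seconds suggested by the service, if any


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it is throttled."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ThrottledError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling to the caller
            # Prefer the service's hint; otherwise double the delay each attempt.
            delay = err.retry_after or base_delay * (2 ** attempt)
            # A little random jitter keeps many workers from retrying in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
```

Here `send` would be whatever zero-argument callable submits one request. The `sleep` parameter is injectable so the retry logic can be exercised without real waiting.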

3. Recommended ways to achieve true parallel processing

1: Use multiple DI resources

You can create multiple Azure AI Document Intelligence resources, for example:

DI-Resource-01

DI-Resource-02

DI-Resource-03

Then:

Distribute documents across these resources

Each resource has its own endpoint, quota, and concurrency limits

Requests are processed independently

This approach is fully supported, is common in high-throughput architectures, and gives predictable, scalable performance.
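Distributing documents across resources can be as simple as rotating through the endpoints. The sketch below shows round-robin assignment only; the endpoint URLs are placeholders, and `assign_round_robin` is a hypothetical helper, not an SDK function.

```python
from itertools import cycle


def assign_round_robin(documents, endpoints):
    """Pair each document with an endpoint, rotating through endpoints in order."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    rotation = cycle(endpoints)
    return [(doc, next(rotation)) for doc in documents]


# Placeholder endpoints, one per DI resource (illustrative values only).
ENDPOINTS = [
    "https://di-resource-01.example.invalid/",
    "https://di-resource-02.example.invalid/",
    "https://di-resource-03.example.invalid/",
]

plan = assign_round_robin(["a.pdf", "b.pdf", "c.pdf", "d.pdf"], ENDPOINTS)
```

Round-robin ignores how busy each resource is. A real deployment might instead pick the resource with the fewest in-flight requests, at the cost of tracking that state.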

2: Use asynchronous Analyze APIs correctly

Make sure your application:

Uses async analyze operations

Polls for results

Is fully non-blocking

This improves efficiency and resource utilization, but does not bypass concurrency limits on its own.
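One way to keep the application non-blocking while still respecting a per-resource limit is to cap the number of requests in flight. The sketch below uses `asyncio` with a semaphore. The `analyze` argument is a hypothetical coroutine that would submit one document and await its result, and `max_in_flight` is an assumed value that should be tuned to the limits of your own resource.

```python
import asyncio


async def analyze_all(documents, analyze, max_in_flight=4):
    """Run analyze(doc) for every document with at most max_in_flight at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(doc):
        async with gate:  # waits here without blocking the event loop
            return await analyze(doc)

    # gather() returns results in the same order as the input documents.
    return await asyncio.gather(*(guarded(doc) for doc in documents))
```

Calling `asyncio.run(analyze_all(docs, analyze))` from synchronous code would drive the whole batch. The cap limits pressure on the service but does not raise the service-side limits.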

3: Page-level optimization

Your strategy of processing pages incrementally and stopping once the required value is found is valid and recommended for cost optimization and for reducing unnecessary processing.

However, it does not improve parallelism at the service level. It only reduces the total work per document.
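The early-exit strategy described in the question can be sketched as a simple loop. Both callables here are assumptions: `analyze_page` is a hypothetical wrapper that sends one page to DI (for example by restricting the request to that page), and `extract_value` is a hypothetical function that returns the required value or `None`.

```python
def find_first_match(page_numbers, analyze_page, extract_value):
    """Analyze pages in order and stop at the first page that yields a value.

    Returns (page_number, value), or (None, None) if no page matched.
    """
    for page in page_numbers:
        result = analyze_page(page)    # one service call per page
        value = extract_value(result)  # None means "not on this page"
        if value is not None:
            return page, value         # remaining pages are never sent
    return None, None
```

Because each page is a separate request, this trades fewer processed pages for more round trips when the value appears late in the document.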

4. Can you create multiple DI endpoints or instances?

You cannot create multiple endpoints within a single DI resource.

However, you can:

Create multiple DI resources

Each resource provides its own endpoint and limits

Load-balance requests across them in your application

This is the only supported way today to achieve horizontal scaling with DI.

5. Best practices for high-throughput DI workloads

Recommended architecture:

Use multiple DI resources (horizontal scaling)

Use async analyze APIs

Queue work using:

Azure Queue Storage or
Azure Service Bus

Use a worker pool that:

Pulls from the queue
Routes requests to available DI resources

Monitor:

Throttling responses (429)
Latency per resource
Error rates

Avoid:

Sending excessive parallel requests to a single DI resource
Assuming client-side concurrency equals service-side concurrency
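The queue and worker-pool pattern above can be sketched in-process as follows. This is an illustration under stated assumptions: `queue.Queue` stands in for Azure Queue Storage or Azure Service Bus, `analyze(endpoint, doc)` is a hypothetical callable that performs one request, and `workers_per_endpoint` is an assumed cap. Error handling and retries are omitted for brevity.

```python
import queue
import threading


def run_worker_pool(documents, endpoints, analyze, workers_per_endpoint=2):
    """Drain a shared queue with a fixed number of workers bound to each endpoint."""
    work = queue.Queue()
    for doc in documents:
        work.put(doc)

    results = {}
    lock = threading.Lock()

    def worker(endpoint):
        while True:
            try:
                doc = work.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            outcome = analyze(endpoint, doc)
            with lock:  # protect the shared results dict
                results[doc] = (endpoint, outcome)

    threads = [
        threading.Thread(target=worker, args=(endpoint,))
        for endpoint in endpoints
        for _ in range(workers_per_endpoint)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Binding a fixed number of workers to each endpoint caps the concurrency seen by every resource, so adding resources raises total throughput without overloading any single one.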

Summary:

Sequential processing on a single DI endpoint is expected

DI enforces concurrency and throughput limits per resource

True parallelism requires multiple DI resources

Page-level optimization helps cost, not throughput

Horizontal scaling is the supported and recommended approach

This behavior is by design, not a bug or misconfiguration.


I hope this helps. Do let me know if you have any further queries.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

Thank you!

SRILAKSHMI C 18,390 Reputation points Microsoft External Staff Moderator

2026-01-14T07:44:44.68+00:00

Hi BEJJANKI SHREYAS,

Following up to see if the above answer was helpful. If it answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.

Thank you!
SRILAKSHMI C 18,390 Reputation points Microsoft External Staff Moderator

2026-01-15T17:25:31+00:00

Hi BEJJANKI SHREYAS,

Just checking in to see if you have had a chance to review my response to your question.

If you are still facing any issues, please don't hesitate to reach out to us. We are happy to assist you.

Looking forward to your response, and we appreciate your time on this.

If you feel that your queries have been resolved, please accept the answer by clicking "Upvote" and "Accept Answer" on the post.

Thank you!

Answer 2

AI answer

Yes, the behavior you are experiencing is expected for a single Azure AI Document Intelligence (DI) endpoint. Each DI resource enforces its own rate and concurrency limits, so when more requests are sent to the same endpoint than those limits allow, the excess requests are throttled or queued and can appear to be processed sequentially.

To achieve true parallel processing, you can consider the following recommendations:

Create multiple DI resources: Each DI resource has its own endpoint and its own limits, so requests sent to different resources are processed independently and can run in parallel.
Scaling for high-throughput scenarios:
- Check the request limits that apply to your DI resource and pricing tier. These limits, not the number of client threads, determine how many requests are processed concurrently.
- Partition your documents into batches and distribute the batches across your DI resources, which can help in managing large workloads more efficiently.
- Monitor your service limits and consider creating a support request to increase them if your workload requires a higher degree of concurrent processing.

By following these practices, you should be able to improve the throughput of your document processing tasks significantly.

AI-generated content may be incorrect. Read our transparency notes for more information.
