Hello BEJJANKI SHREYAS,
Welcome to Microsoft Q&A, and thank you for reaching out.
Yes, what you are observing is expected behavior when using a single Azure AI Document Intelligence (DI) resource. This is related to how concurrency, throttling, and scaling are handled by the service.
1. Is this expected behavior for a single DI endpoint?
Yes. This is expected.
Azure AI Document Intelligence is a multi-tenant managed service and enforces service-side concurrency limits per resource. Even if your application sends multiple parallel requests:
Requests to the same DI resource may be queued
Processing can appear sequential once concurrency limits are reached
This behavior is by design to ensure service stability and fair usage
This applies regardless of whether you send:
Page-by-page requests
Multiple PDFs
Parallel threads or application instances
Client-side parallelism does not override service-side throttling.
2. Why parallel requests don’t run concurrently on one endpoint
Each Document Intelligence resource has:
A maximum concurrent request limit
A throughput (TPS) cap per region
Internal queueing once those limits are exceeded
When limits are reached:
Requests are accepted, but they are processed sequentially or in small batches
This is why scaling your application alone does not increase throughput on a single DI endpoint.
3. Recommended ways to achieve true parallel processing
1: Use multiple DI resources
You can create multiple Azure AI Document Intelligence resources, for example:
DI-Resource-01
DI-Resource-02
DI-Resource-03
Then:
Distribute documents across these resources
Each resource has its own endpoint, quota, and concurrency limits
Requests are processed independently
This approach is fully supported, is common in high-throughput architectures, and gives predictable, scalable performance.
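As a rough illustration of distributing documents across resources, here is a minimal round-robin sketch in Python. The endpoint URLs and the names `RoundRobinRouter` and `assign` are hypothetical and only for illustration; the sketch decides which resource gets which document and does not call the service itself.

```python
import itertools
import threading

# Hypothetical endpoints: replace with the endpoints of your own DI resources.
DI_ENDPOINTS = [
    "https://di-resource-01.cognitiveservices.azure.com/",
    "https://di-resource-02.cognitiveservices.azure.com/",
    "https://di-resource-03.cognitiveservices.azure.com/",
]


class RoundRobinRouter:
    """Hands out endpoints in rotation so load spreads evenly."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._cycle = itertools.cycle(endpoints)
        # Guard the shared iterator so worker threads can call next_endpoint() safely.
        self._lock = threading.Lock()

    def next_endpoint(self):
        with self._lock:
            return next(self._cycle)


def assign(documents, router):
    """Pair each document with the endpoint that should process it."""
    return [(doc, router.next_endpoint()) for doc in documents]


if __name__ == "__main__":
    router = RoundRobinRouter(DI_ENDPOINTS)
    for doc, endpoint in assign(["a.pdf", "b.pdf", "c.pdf", "d.pdf"], router):
        print(doc, "->", endpoint)
```

Round-robin is the simplest choice. If your documents vary a lot in size, routing to the resource with the fewest in-flight requests may balance load better.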
2: Use asynchronous Analyze APIs correctly
Make sure your application:
Uses async analyze operations
Polls for results
Is fully non-blocking
This improves efficiency and resource utilization, but does not bypass concurrency limits on its own.
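To show what "fully non-blocking" can look like, here is a minimal asyncio sketch of the submit-then-poll pattern. `FakeAnalyzeService` is a stand-in I made up so the example runs on its own; it is not the Document Intelligence API, and in a real application the submit and status calls would go to the service.

```python
import asyncio


class FakeAnalyzeService:
    """Stand-in for a long-running analyze operation (not the real DI API)."""

    def __init__(self, polls_until_done=3):
        self._polls_until_done = polls_until_done
        self._remaining = {}

    async def submit(self, document):
        op_id = f"op-{len(self._remaining) + 1}"
        self._remaining[op_id] = self._polls_until_done
        return op_id

    async def status(self, op_id):
        self._remaining[op_id] -= 1
        return "succeeded" if self._remaining[op_id] <= 0 else "running"


async def analyze(service, document, poll_interval=0.01):
    """Submit one document, then poll without blocking the event loop."""
    op_id = await service.submit(document)
    while await service.status(op_id) != "succeeded":
        await asyncio.sleep(poll_interval)  # yields control to other tasks
    return document, op_id


async def analyze_all(service, documents):
    # gather() runs the submit/poll loops concurrently on a single thread.
    return await asyncio.gather(*(analyze(service, d) for d in documents))


if __name__ == "__main__":
    results = asyncio.run(analyze_all(FakeAnalyzeService(), ["a.pdf", "b.pdf"]))
    print(results)
```

The point of the sketch is that waiting happens in `asyncio.sleep`, so one thread can keep many operations in flight. The service still decides how many of them it actually processes at once.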
3: Page-level optimization
Your strategy of:
Processing pages incrementally
Stopping once the required value is found
is valid and recommended for:
Cost optimization
Reducing unnecessary processing
However:
- It does not improve parallelism at the service level
- It only reduces total work per document
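The early-stop strategy can be sketched roughly as below. `find_first_value` and `extract_value` are hypothetical names; `extract_value` stands for whatever function sends a single page to DI and returns the field you need (or `None` if it is not on that page).

```python
from typing import Callable, Iterable, Optional, Tuple


def find_first_value(
    pages: Iterable[bytes],
    extract_value: Callable[[bytes], Optional[str]],
) -> Tuple[Optional[str], int]:
    """Analyze pages one at a time and stop at the first hit.

    Returns the value found (or None) and the number of pages analyzed.
    """
    analyzed = 0
    for page in pages:
        analyzed += 1
        value = extract_value(page)  # one service call per page
        if value is not None:
            return value, analyzed
    return None, analyzed


if __name__ == "__main__":
    # Toy data: pretend the invoice number appears on page 2 of 5.
    pages = [b"cover", b"invoice-no: 42", b"terms", b"terms", b"appendix"]

    def toy_extract(page: bytes) -> Optional[str]:
        text = page.decode()
        return text.split(": ")[1] if text.startswith("invoice-no") else None

    print(find_first_value(pages, toy_extract))  # ('42', 2)
```

In the toy run only two of the five pages are analyzed, which is where the cost saving comes from. The pages are still processed one after another, so throughput at the service level is unchanged.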
4. Can you create multiple DI endpoints or instances?
You cannot create multiple endpoints within a single DI resource.
However, you can:
Create multiple DI resources
Each resource provides its own endpoint and limits
Load-balance requests across them in your application
This is the only supported way today to achieve horizontal scaling with DI.
5. Best practices for high-throughput DI workloads
Recommended architecture:
Use multiple DI resources (horizontal scaling)
Use async analyze APIs
Queue work using:
- Azure Queue Storage or
- Azure Service Bus
Use a worker pool that:
- Pulls from the queue
- Routes requests to available DI resources
Monitor:
- Throttling responses (429)
- Latency per resource
- Error rates
Avoid:
- Sending excessive parallel requests to a single DI resource
- Assuming client-side concurrency equals service-side concurrency
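The architecture above can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions: an in-memory `asyncio.Queue` stands in for Azure Queue Storage or Service Bus, `senders` is a list of hypothetical async callables (one per DI resource) that you would implement with your own client code, and `Throttled` is a made-up exception representing a 429 response.

```python
import asyncio
import random


class Throttled(Exception):
    """Raised by a sender to signal an HTTP 429 (throttling) response."""


async def call_with_backoff(send, document, max_retries=5, base_delay=0.01):
    """Retry a throttled call with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await send(document)
        except Throttled:
            if attempt == max_retries:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt * (1 + random.random()))


async def worker(queue, send, results):
    """Pull documents from the queue until cancelled."""
    while True:
        document = await queue.get()
        try:
            results.append((document, await call_with_backoff(send, document)))
        except Throttled:
            # Gave up after max_retries; a real system would dead-letter the item.
            results.append((document, None))
        finally:
            queue.task_done()


async def run_pool(documents, senders, workers_per_resource=2):
    """Process documents with a fixed number of workers per DI resource."""
    queue = asyncio.Queue()
    for doc in documents:
        queue.put_nowait(doc)
    results = []
    # workers_per_resource caps the in-flight requests sent to each resource.
    tasks = [
        asyncio.create_task(worker(queue, send, results))
        for send in senders
        for _ in range(workers_per_resource)
    ]
    await queue.join()  # wait until every document has been handled
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```

Tying each worker to one resource keeps the number of parallel requests per resource bounded, and the backoff keeps retries from adding pressure while a resource is throttling. The retry count and delays here are arbitrary placeholders; tune them against the 429 rate and latency you observe.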
Summary:
Sequential processing on a single DI endpoint is expected
DI enforces concurrency and throughput limits per resource
True parallelism requires multiple DI resources
Page-level optimization helps cost, not throughput
Horizontal scaling is the supported and recommended approach
This behavior is by design, not a bug or misconfiguration.
Please refer to the following:
- Azure Document Intelligence Pricing
- Best Practices for Using Azure Document Intelligence
- Upgrade Guide for Azure Services
I hope this helps. Let me know if you have any further questions.
If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".
Thank you!