Hi Bidur Nepali,
The token_limit_exceeded error from the Azure OpenAI Batch API doesn't mean that an individual .jsonl file is too large. It means the total number of tokens across all enqueued and in-progress batch jobs has exceeded the configured limit, which is 30,000 tokens for the gpt-4.1-batch deployment. Azure enforces this limit at the queue level, so even if each of your files is well under 30K tokens, you will still get this error whenever the combined tokens across jobs exceed that threshold.

Currently, there is no dedicated endpoint to check the available batch token quota before submitting a job. The best practice is to track your batch jobs yourself: list their statuses, sum your estimated token usage for the jobs that are still active, and submit new jobs only when enough capacity is free. Alternatively, you can request a quota increase if your workload needs higher throughput.
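As a starting point, here is a minimal sketch of that tracking pattern using the openai Python SDK. The quota value, the set of statuses that count against the queue, the tokenizer choice, and the idea of recording your own per-batch token estimates at submit time are all assumptions you would adapt to your setup, since the Batch API does not report token usage per job:

```python
import json
import tiktoken
from openai import AzureOpenAI

# Assumed values -- substitute your own endpoint, key, and deployment quota.
ENQUEUED_TOKEN_QUOTA = 30_000  # enqueued-token limit reported in your error
ACTIVE_STATUSES = {"validating", "in_progress", "finalizing"}  # assumed set of states that hold quota

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-10-21",
)

def estimate_file_tokens(path: str) -> int:
    """Rough token estimate for a batch .jsonl file before submitting it."""
    enc = tiktoken.get_encoding("o200k_base")  # assumption: gpt-4.1-family tokenizer
    total = 0
    with open(path) as f:
        for line in f:
            body = json.loads(line)["body"]
            for msg in body.get("messages", []):
                total += len(enc.encode(msg.get("content", "")))
            # Conservative assumption: also reserve room for the requested completion.
            total += body.get("max_tokens", 0)
    return total

def tokens_in_flight(estimates: dict[str, int]) -> int:
    """Sum our own per-batch estimates for jobs Azure still counts as active.

    `estimates` maps batch IDs you created to the token estimate you recorded
    at submit time (the API does not expose this figure itself).
    """
    active = 0
    for batch in client.batches.list(limit=100):
        if batch.status in ACTIVE_STATUSES:
            active += estimates.get(batch.id, 0)
    return active
```

With this in place, before calling client.batches.create(...) you can check that tokens_in_flight(estimates) + estimate_file_tokens(new_file) stays at or below ENQUEUED_TOKEN_QUOTA, and delay or split the submission if it would overflow.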
For more information:
- Azure OpenAI in Azure AI Foundry Models quotas and limits
- Getting started with Azure OpenAI batch deployments
I hope this information helps. Thank you!