Azure OpenAI Batch Job (GPT-4o-mini) completes early, finishing only a few rows, without any error!!!

Nandan Thakur 30 Reputation points
2025-03-15T16:23:06.5633333+00:00

I uploaded a batch job on March 14th, 2025, with a unique custom_id for each row in my input file. The job gets validated but completes very quickly, within 10 minutes, and when I check it only 276/4096 requests (as shown in the example below) are completed.

I'm unsure what is going wrong here; there is no error. I thought it might be a duplicate custom_id issue, but after resolving that I still face the same problem.

Below is an example of the batch object. The status shows as completed; however,

request_counts=BatchRequestCounts(completed=276, failed=0, total=4096)

shows that only 276 of the 4096 requests completed.

Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))

I was expecting the batch output to contain all responses in a single file with 4096 rows, but only 276 came back in the output file.
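
For reference, this is roughly how I check the results (a sketch, assuming the client and batch_response objects from my upload script further down):

import json

# Download the output and error files attached to the finished job
output_text = client.files.content(batch_response.output_file_id).text
error_text = client.files.content(batch_response.error_file_id).text

output_rows = [json.loads(line) for line in output_text.splitlines() if line.strip()]
error_rows = [json.loads(line) for line in error_text.splitlines() if line.strip()]

print(len(output_rows))  # only 276 instead of the expected 4096
print(len(error_rows))   # an error file exists even though failed=0 is reported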

My rate limits are also high, so I don't think that's an issue.

I tried running GPT-4o-mini (Global Batch) on a batch of ~5K training samples. This is an example of a single instance taken from the file containing 4096 samples (~139 MB in size); the model value gpt-4o-mini-batch is my deployment name:

{
   "custom_id": "0_18_v2_fever_958611a10d8432c7ca51a59fca384dbc", 
   "method": "POST", 
   "url": "/chat/completions", 
   "body": {
     "model": "gpt-4o-mini-batch", # my deployment name
     "messages": [
                    {"role": "user", "content": "<my prompt here>"}
                 ], 
     "max_completion_tokens": 4096, 
     "temperature": 0.1
    }
}
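
This is the kind of check I ran to rule out the duplicate custom_id theory (a rough sketch; the input path is a placeholder for my actual file):

import json
from collections import Counter

input_path = "batch_input.jsonl"  # placeholder for my ~139 MB input file

custom_ids = []
with open(input_path, "r", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        row = json.loads(line)  # raises if a line is not valid JSON
        custom_ids.append(row["custom_id"])

duplicates = {cid: n for cid, n in Counter(custom_ids).items() if n > 1}
print(f"{len(custom_ids)} rows, {len(duplicates)} duplicated custom_id values")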

I also tried deploying a new model and uploading the files again, but I got back the same result: roughly the same number of rows, only ~200, were completed. I'm also listing the jobs run so far (retrieved with client.batches.list(); a sketch follows the list). Only the first run completed fully, returning all the required rows. Here are all the jobs:

Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))
Batch(id='batch_15b67c2a-d941-43fa-94dd-d433ebdc94c4', completion_window='24h', created_at=1742003010, endpoint='/chat/completions', input_file_id='file-9dc4bf26713147aa98e19621fa0f907e', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003745, error_file_id='file-2be4eb9a-8abc-4312-801c-8e258fac607d', errors=None, expired_at=None, expires_at=1742089410, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003279, metadata=None, output_file_id='file-417f3e4f-f299-4b3a-a195-827c8f4db1ca', request_counts=BatchRequestCounts(completed=265, failed=0, total=5000))
Batch(id='batch_9dccaf7c-1394-4778-aa07-aa02b280c772', completion_window='24h', created_at=1742002991, endpoint='/chat/completions', input_file_id='file-d7aaa72ebff440b19722cb9c9b8d205f', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003646, error_file_id='file-c2634ad4-99fc-4575-a6b3-d7b68492b97a', errors=None, expired_at=None, expires_at=1742089391, failed_at=None, finalizing_at=1742003527, in_progress_at=1742003293, metadata=None, output_file_id='file-6b9c6de8-4514-4b94-a42c-b1a1cbb1c635', request_counts=BatchRequestCounts(completed=275, failed=0, total=5000))
Batch(id='batch_e945eb8c-31a9-4a07-a415-356d1a064fe2', completion_window='24h', created_at=1742002970, endpoint='/chat/completions', input_file_id='file-e337e5ac8c06408ba67cae7b624f0c28', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003757, error_file_id='file-414ee824-b070-4bb1-9d80-14dd65a4aea2', errors=None, expired_at=None, expires_at=1742089370, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003281, metadata=None, output_file_id='file-07ff6f4e-ab4b-4d15-9932-b84058f70736', request_counts=BatchRequestCounts(completed=272, failed=0, total=5000))
Batch(id='batch_cf622edb-2670-4ff2-9ee9-bd25bb35a3a0', completion_window='24h', created_at=1742002952, endpoint='/chat/completions', input_file_id='file-c88451a73f434a3b85bc0f95c21be385', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003753, error_file_id='file-f6d2900c-fb52-4c13-9bba-4cce1ffaec14', errors=None, expired_at=None, expires_at=1742089352, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003289, metadata=None, output_file_id='file-9af3869e-fff3-45a4-b7a4-c38c9f2f9bac', request_counts=BatchRequestCounts(completed=270, failed=0, total=5000))
Batch(id='batch_40d32e10-a5a9-4c8c-93fc-03179afcfe78', completion_window='24h', created_at=1742002931, endpoint='/chat/completions', input_file_id='file-f1f64a0a175044e180afc9e9396d67d9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003640, error_file_id='file-6d878431-9fca-44f8-a849-6972505d4d4a', errors=None, expired_at=None, expires_at=1742089331, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003278, metadata=None, output_file_id='file-6f8cd787-c3b2-41bf-b7ef-0eba738899e8', request_counts=BatchRequestCounts(completed=277, failed=0, total=5000))
...
Batch(id='batch_23204b19-99e7-4ac3-8dd1-a70929280323', completion_window='24h', created_at=1741826257, endpoint='/chat/completions', input_file_id='file-3e06cf290fb24b0592154510e54b8809', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1741828981, error_file_id='file-da5bbdd7-6872-4379-b0fb-4a0376b6e6cc', errors=None, expired_at=None, expires_at=1741912657, failed_at=None, finalizing_at=1741828830, in_progress_at=1741828482, metadata=None, output_file_id='file-95378806-a0bc-4d80-a27a-eb3171ad9f70', request_counts=BatchRequestCounts(completed=5243, failed=0, total=5243))
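
The list above was pulled with something like the following sketch, using the same client as in my script:

# List the most recent batch jobs on this resource and print their request counts
for job in client.batches.list(limit=20):
    print(job.id, job.status, job.request_counts)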

I uploaded the batch with a Python script; here is a simplified version of it:

import os
from openai import AzureOpenAI

# Credentials and API version are read from environment variables
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

# Upload the JSONL input file for batch processing
final_filepath = "batch_input.jsonl"  # placeholder path to my input file
with open(final_filepath, "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
file_id = batch_file.id

# Create the batch job against the chat completions endpoint
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)
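
After submitting, I poll the job until it reaches a terminal state, roughly like this (a sketch, not the exact script):

import time

# Poll the batch job until it reaches a terminal state
terminal_states = {"completed", "failed", "expired", "cancelled"}
while True:
    batch_response = client.batches.retrieve(batch_response.id)
    if batch_response.status in terminal_states:
        break
    time.sleep(60)

print(batch_response.status, batch_response.request_counts)
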
1 answer

  1. Chakaravarthi Rangarajan Bhargavi 1,115 Reputation points MVP
    2025-03-16T08:12:20.6333333+00:00

    Hi Nandan Thakur,

    Welcome to the Microsoft Q&A Forum! Thanks for your question.

    The issue you're facing, where your Azure OpenAI batch job completes prematurely without processing all rows and without reporting any errors, can stem from multiple factors. Below are a few possible causes and solutions.

    Structured Outputs Compatibility

    • Ensure that structured outputs are supported by your deployment model.
    • If you are using Global Batch, structured outputs are only supported with API version 2024-08-01-preview or later.
    • Using an unsupported API version may cause incomplete processing without explicit errors.

    Solution: Update your API version to 2024-08-01-preview or later.

    Reference: How to use Global Batch Processing with Azure OpenAI
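
    For example, you can pin a batch-capable API version explicitly when constructing the client rather than relying on an environment default (a minimal sketch; the version string follows the guidance above):

    import os
    from openai import AzureOpenAI

    # Pin an API version that supports the Batch API instead of reading it from the environment
    client = AzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-08-01-preview",  # or any later supported version
    )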

    Batch Size and Rate Limits

    • Even though your rate limits are high, Azure OpenAI may still enforce hidden limits on batch processing.
    • If a batch job exceeds an optimal processing threshold, Azure OpenAI may truncate processing without failure messages.
    • Large batch jobs can also trigger internal timeouts before all rows are processed.

    Solution: Reduce the batch size and split large datasets into smaller batches. Ensure no individual batch exceeds system limitations.

    Reference: Azure Batch - Getting Started with JavaScript
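
    As a rough illustration (the path and chunk size are placeholders, and client is the AzureOpenAI client from your script), you could split the JSONL input into smaller files and submit one batch job per chunk:

    def split_jsonl(path, chunk_size=1000):
        # Split a JSONL batch input file into smaller files of chunk_size lines each
        with open(path, "r", encoding="utf-8") as f:
            lines = [line for line in f if line.strip()]
        chunk_paths = []
        for i in range(0, len(lines), chunk_size):
            chunk_path = f"{path}.part{i // chunk_size}.jsonl"
            with open(chunk_path, "w", encoding="utf-8") as out:
                out.writelines(lines[i:i + chunk_size])
            chunk_paths.append(chunk_path)
        return chunk_paths

    # Upload each chunk and create a separate batch job for it
    for chunk_path in split_jsonl("batch_input.jsonl", chunk_size=1000):
        with open(chunk_path, "rb") as f:
            chunk_file = client.files.create(file=f, purpose="batch")
        client.batches.create(
            input_file_id=chunk_file.id,
            endpoint="/chat/completions",
            completion_window="24h",
        )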

    API Version Compatibility

    • Using an outdated or unsupported API version can cause unexpected behavior.
    • Certain API versions might lack full support for batch processing, causing the issue you're facing.

    Solution: Ensure you’re using the latest API version supported by your Azure OpenAI deployment. If using Azure Machine Learning, verify compatibility with AzureML API v2.

    Reference: Azure Machine Learning - Batch Model for OpenAI Embeddings

    Monitoring Batch Job Execution

    • Job monitoring is crucial to ensure every row is processed.
    • Azure does not always provide real-time error logs for batch jobs.
    • If rows are skipped or not processed, logging can help identify patterns.

    Solution: Enable logging for batch jobs. Use Azure Monitor and Application Insights to track execution. Run smaller test batches before scaling up.

    Reference: Azure Batch Job Task Error Checking
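
    Since your jobs do report an error_file_id even though failed=0, downloading that file and tallying its error codes is a quick way to see why rows were dropped. A minimal sketch, assuming the client and batch_response objects from your script and the documented batch error-file layout:

    import json
    from collections import Counter

    # Download the error file attached to the batch job and tally error codes per row
    error_text = client.files.content(batch_response.error_file_id).text

    codes = Counter()
    for line in error_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        err = row.get("error") or {}  # field name assumed from the batch error-file layout
        codes[err.get("code", "unknown")] += 1

    print(codes)  # distribution of error codes across the skipped rows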

    Next Steps

    Please review these recommendations and adjust your batch job accordingly. If the issue persists, provide additional details on your deployment setup, and I'd be happy to assist further!

    Regards,

    Chakravarthi Rangarajan Bhargavi

    If this answer was helpful, kindly upvote and accept it to support the community. Thanks!

