Azure OpenAI Batch Job (GPT-4o-mini) completes early, finishing only a few rows, without any error!!!

Nandan Thakur 30 Reputation points
2025-03-15T16:23:06.5633333+00:00

I uploaded a batch job on March 14th, 2025, with a unique custom_id for each row in my input file. The job gets validated but completes very quickly, within 10 minutes, and when I check it only 276/4096 requests (as shown in the example below) are completed.

I'm unsure what is going wrong here; there is no error. I thought it might be a duplicate custom_id issue, but after resolving that I still face the same problem.

Below is an example of the batch object. The status shows as completed; however,

request_counts=BatchRequestCounts(completed=276, failed=0, total=4096)

shows that only 276 of the 4096 requests completed.

Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))

I was expecting the batch output to contain all responses in a single file with 4096 rows, but only 276 came back in the output file.
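
For reference, this is roughly how I check the results (a sketch, assuming the client and batch_response objects from my upload script further down):

import json

# Download the output and error files attached to the finished job
output_text = client.files.content(batch_response.output_file_id).text
error_text = client.files.content(batch_response.error_file_id).text

output_rows = [json.loads(line) for line in output_text.splitlines() if line.strip()]
error_rows = [json.loads(line) for line in error_text.splitlines() if line.strip()]

print(len(output_rows))  # only 276 instead of the expected 4096
print(len(error_rows))   # an error file exists even though failed=0 is reported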

My rate limits are also high, so I don't think that's an issue.

I tried running GPT-4o-mini (Global Batch) on a batch of ~5K training samples. This is an example of a single instance taken from the file containing 4096 samples (~139 MB in size); the model value gpt-4o-mini-batch is my deployment name:

{
   "custom_id": "0_18_v2_fever_958611a10d8432c7ca51a59fca384dbc", 
   "method": "POST", 
   "url": "/chat/completions", 
   "body": {
     "model": "gpt-4o-mini-batch", # my deployment name
     "messages": [
                    {"role": "user", "content": "<my prompt here>"}
                 ], 
     "max_completion_tokens": 4096, 
     "temperature": 0.1
    }
}
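
This is the kind of check I ran to rule out the duplicate custom_id theory (a rough sketch; the input path is a placeholder for my actual file):

import json
from collections import Counter

input_path = "batch_input.jsonl"  # placeholder for my ~139 MB input file

custom_ids = []
with open(input_path, "r", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        row = json.loads(line)  # raises if a line is not valid JSON
        custom_ids.append(row["custom_id"])

duplicates = {cid: n for cid, n in Counter(custom_ids).items() if n > 1}
print(f"{len(custom_ids)} rows, {len(duplicates)} duplicated custom_id values")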

I also tried deploying a new model and uploading the files again, but I got back the same result: roughly the same number of rows, only ~200, were completed. I'm also listing the jobs run so far (retrieved with client.batches.list(); a sketch follows the list). Only the first run completed fully, returning all the required rows. Here are all the jobs:

Batch(id='batch_02d3c78a-ba97-40e3-8646-e83099ba5dbb', completion_window='24h', created_at=1742003026, endpoint='/chat/completions', input_file_id='file-083b8e3f2f024dc9a34a06d6014679c9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003648, error_file_id='file-c93301a5-0199-479e-8660-71426f22ce2d', errors=None, expired_at=None, expires_at=1742089426, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003279, metadata=None, output_file_id='file-34f7b43e-956c-4bc7-9d0a-6699db9333c6', request_counts=BatchRequestCounts(completed=276, failed=0, total=4096))
Batch(id='batch_15b67c2a-d941-43fa-94dd-d433ebdc94c4', completion_window='24h', created_at=1742003010, endpoint='/chat/completions', input_file_id='file-9dc4bf26713147aa98e19621fa0f907e', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003745, error_file_id='file-2be4eb9a-8abc-4312-801c-8e258fac607d', errors=None, expired_at=None, expires_at=1742089410, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003279, metadata=None, output_file_id='file-417f3e4f-f299-4b3a-a195-827c8f4db1ca', request_counts=BatchRequestCounts(completed=265, failed=0, total=5000))
Batch(id='batch_9dccaf7c-1394-4778-aa07-aa02b280c772', completion_window='24h', created_at=1742002991, endpoint='/chat/completions', input_file_id='file-d7aaa72ebff440b19722cb9c9b8d205f', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003646, error_file_id='file-c2634ad4-99fc-4575-a6b3-d7b68492b97a', errors=None, expired_at=None, expires_at=1742089391, failed_at=None, finalizing_at=1742003527, in_progress_at=1742003293, metadata=None, output_file_id='file-6b9c6de8-4514-4b94-a42c-b1a1cbb1c635', request_counts=BatchRequestCounts(completed=275, failed=0, total=5000))
Batch(id='batch_e945eb8c-31a9-4a07-a415-356d1a064fe2', completion_window='24h', created_at=1742002970, endpoint='/chat/completions', input_file_id='file-e337e5ac8c06408ba67cae7b624f0c28', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003757, error_file_id='file-414ee824-b070-4bb1-9d80-14dd65a4aea2', errors=None, expired_at=None, expires_at=1742089370, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003281, metadata=None, output_file_id='file-07ff6f4e-ab4b-4d15-9932-b84058f70736', request_counts=BatchRequestCounts(completed=272, failed=0, total=5000))
Batch(id='batch_cf622edb-2670-4ff2-9ee9-bd25bb35a3a0', completion_window='24h', created_at=1742002952, endpoint='/chat/completions', input_file_id='file-c88451a73f434a3b85bc0f95c21be385', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003753, error_file_id='file-f6d2900c-fb52-4c13-9bba-4cce1ffaec14', errors=None, expired_at=None, expires_at=1742089352, failed_at=None, finalizing_at=1742003631, in_progress_at=1742003289, metadata=None, output_file_id='file-9af3869e-fff3-45a4-b7a4-c38c9f2f9bac', request_counts=BatchRequestCounts(completed=270, failed=0, total=5000))
Batch(id='batch_40d32e10-a5a9-4c8c-93fc-03179afcfe78', completion_window='24h', created_at=1742002931, endpoint='/chat/completions', input_file_id='file-f1f64a0a175044e180afc9e9396d67d9', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1742003640, error_file_id='file-6d878431-9fca-44f8-a849-6972505d4d4a', errors=None, expired_at=None, expires_at=1742089331, failed_at=None, finalizing_at=1742003528, in_progress_at=1742003278, metadata=None, output_file_id='file-6f8cd787-c3b2-41bf-b7ef-0eba738899e8', request_counts=BatchRequestCounts(completed=277, failed=0, total=5000))
...
Batch(id='batch_23204b19-99e7-4ac3-8dd1-a70929280323', completion_window='24h', created_at=1741826257, endpoint='/chat/completions', input_file_id='file-3e06cf290fb24b0592154510e54b8809', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1741828981, error_file_id='file-da5bbdd7-6872-4379-b0fb-4a0376b6e6cc', errors=None, expired_at=None, expires_at=1741912657, failed_at=None, finalizing_at=1741828830, in_progress_at=1741828482, metadata=None, output_file_id='file-95378806-a0bc-4d80-a27a-eb3171ad9f70', request_counts=BatchRequestCounts(completed=5243, failed=0, total=5243))
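
The list above was pulled with something like the following sketch, using the same client as in my script:

# List the most recent batch jobs on this resource and print their request counts
for job in client.batches.list(limit=20):
    print(job.id, job.status, job.request_counts)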

I uploaded the batch with a Python script; here is a simplified version of it:

import os
from openai import AzureOpenAI

# Credentials and API version are read from environment variables
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
)

# Upload the JSONL input file for batch processing
final_filepath = "batch_input.jsonl"  # placeholder path to my input file
with open(final_filepath, "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
file_id = batch_file.id

# Create the batch job against the chat completions endpoint
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)
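
After submitting, I poll the job until it reaches a terminal state, roughly like this (a sketch, not the exact script):

import time

# Poll the batch job until it reaches a terminal state
terminal_states = {"completed", "failed", "expired", "cancelled"}
while True:
    batch_response = client.batches.retrieve(batch_response.id)
    if batch_response.status in terminal_states:
        break
    time.sleep(60)

print(batch_response.status, batch_response.request_counts)
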
1 answer

  1. Chakaravarthi Rangarajan Bhargavi 1,115 Reputation points MVP
    2025-03-16T08:12:20.6333333+00:00

    Hi Nandan Thakur,

    Welcome to the Microsoft Q&A Forum! Thanks for your question.

    The issue you're facing, where your Azure OpenAI batch job completes prematurely without processing all rows and without reporting any errors, can stem from multiple factors. Below are a few possible causes and solutions.

    Structured Outputs Compatibility

    • Ensure that structured outputs are supported by your deployment model.
    • If you are using Global Batch, structured outputs are only supported with API version 2024-08-01-preview or later.
    • Using an unsupported API version may cause incomplete processing without explicit errors.

    Solution: Update your API version to 2024-08-01-preview or later.

    Reference: How to use Global Batch Processing with Azure OpenAI
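
    For example, you can pin a batch-capable API version explicitly when constructing the client rather than relying on an environment default (a minimal sketch; the version string follows the guidance above):

    import os
    from openai import AzureOpenAI

    # Pin an API version that supports the Batch API instead of reading it from the environment
    client = AzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-08-01-preview",  # or any later supported version
    )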

    Batch Size and Rate Limits

    • Even though your rate limits are high, Azure OpenAI may still enforce hidden limits on batch processing.
    • If a batch job exceeds an optimal processing threshold, Azure OpenAI may truncate processing without failure messages.
    • Large batch jobs can also trigger internal timeouts before all rows are processed.

    Solution: Reduce the batch size and split large datasets into smaller batches. Ensure no individual batch exceeds system limitations.

    Reference: Azure Batch - Getting Started with JavaScript
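
    As a rough illustration (the path and chunk size are placeholders, and client is the AzureOpenAI client from your script), you could split the JSONL input into smaller files and submit one batch job per chunk:

    def split_jsonl(path, chunk_size=1000):
        # Split a JSONL batch input file into smaller files of chunk_size lines each
        with open(path, "r", encoding="utf-8") as f:
            lines = [line for line in f if line.strip()]
        chunk_paths = []
        for i in range(0, len(lines), chunk_size):
            chunk_path = f"{path}.part{i // chunk_size}.jsonl"
            with open(chunk_path, "w", encoding="utf-8") as out:
                out.writelines(lines[i:i + chunk_size])
            chunk_paths.append(chunk_path)
        return chunk_paths

    # Upload each chunk and create a separate batch job for it
    for chunk_path in split_jsonl("batch_input.jsonl", chunk_size=1000):
        with open(chunk_path, "rb") as f:
            chunk_file = client.files.create(file=f, purpose="batch")
        client.batches.create(
            input_file_id=chunk_file.id,
            endpoint="/chat/completions",
            completion_window="24h",
        )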

    API Version Compatibility

    • Using an outdated or unsupported API version can cause unexpected behavior.
    • Certain API versions might lack full support for batch processing, causing the issue you're facing.

    Solution: Ensure you’re using the latest API version supported by your Azure OpenAI deployment. If using Azure Machine Learning, verify compatibility with AzureML API v2.

    Reference: Azure Machine Learning - Batch Model for OpenAI Embeddings

    Monitoring Batch Job Execution

    • Job monitoring is crucial to ensure every row is processed.
    • Azure does not always provide real-time error logs for batch jobs.
    • If rows are skipped or not processed, logging can help identify patterns.

    Solution: Enable logging for batch jobs. Use Azure Monitor and Application Insights to track execution. Run smaller test batches before scaling up.

    Reference: Azure Batch Job Task Error Checking
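
    Since your jobs do report an error_file_id even though failed=0, downloading that file and tallying its error codes is a quick way to see why rows were dropped. A minimal sketch, assuming the client and batch_response objects from your script and the documented batch error-file layout:

    import json
    from collections import Counter

    # Download the error file attached to the batch job and tally error codes per row
    error_text = client.files.content(batch_response.error_file_id).text

    codes = Counter()
    for line in error_text.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        err = row.get("error") or {}  # field name assumed from the batch error-file layout
        codes[err.get("code", "unknown")] += 1

    print(codes)  # distribution of error codes across the skipped rows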

    Next Steps

    Please review these recommendations and adjust your batch job accordingly. If the issue persists, provide additional details on your deployment setup, and I'd be happy to assist further!

    Regards,

    Chakravarthi Rangarajan Bhargavi

    If this answer was helpful, kindly upvote and accept it to support the community. Thanks!

