
OpenAI batch Job Error

Marvin Garcia 25 Reputation points
2025-09-02T05:27:54.56+00:00

I submitted a batch job to Azure OpenAI.

  • I set the max_tokens parameter to 10000 tokens for each output.
  • The batch consisted of 175 rows (requests in JSONL format).

When I downloaded the batch output, I noticed that some outputs had garbled or empty-looking responses, such as:

"rrrr /n \r\n\n\n        \r\n\n\n        \r\n\n\n        \r\n\n\n"

Here is the response metadata for one of the requests:

"prompt_tokens": 309,
"total_tokens": 33077,
"status_code": 200,
"request_id": "883707c4-419d-497e-b5b1-84e3ee354c58"

The total token usage across all requests ended up being around 1.75 million tokens, which is under the 2.03 million limit I configured.


❓ My Questions:

Why is the model returning malformed or blank-looking responses like rrrr /n \r\n\n\n when the request succeeded (status_code: 200)?

Will I be charged for these broken responses?

  • Is there a cost for the model "drifting" or failing silently like this?
  • Are broken responses still billable if they used tokens?

For example, the request above reports total_tokens: 33077, which seems high for a single request, especially when only 309 tokens were in the prompt. The batch system does not throw an error for this row; it returns status_code: 200, which implies a successful completion.

Azure OpenAI in Foundry Models

1 answer

  1. Anshika Varshney 13,320 Reputation points Microsoft External Staff Moderator
    2025-09-03T09:26:19.8233333+00:00

    Hello Marvin Garcia,

    Thank you for sharing the details. Let me clarify what is happening with your batch job.

    The garbled or blank-looking output you observed (for example, "rrrr /n \r\n\n\n") typically occurs when the model is asked to generate a very large number of tokens (max_tokens=10000). In such cases, the model may “drift” or produce repetitive filler text when it reaches the upper bound of generation. This behavior is not treated as an error at the API level, which is why you see status_code: 200.

    In your example, prompt_tokens was 309 but total_tokens was 33,077. total_tokens is the sum of the prompt tokens and all generated output tokens, so this request generated 33,077 - 309 = 32,768 completion tokens, even though that output is repetitive or appears blank. Thousands of newline or whitespace tokens still count toward billing, which explains why the token usage seems unexpectedly high.
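As a small illustration of that arithmetic, the sketch below derives the number of generated tokens from the two usage fields quoted in the question. The helper name `completion_tokens` and the `configured_max_tokens` variable are hypothetical and only used for this example. Note that the derived value is larger than the 10,000 max_tokens described in the question, so it may be worth confirming that the limit was actually present in each request body.

```python
def completion_tokens(usage: dict) -> int:
    """Generated tokens = total tokens minus prompt tokens."""
    return usage["total_tokens"] - usage["prompt_tokens"]


# Values copied from the response metadata in the question.
usage = {"prompt_tokens": 309, "total_tokens": 33077}
configured_max_tokens = 10000  # the limit the question says was set

generated = completion_tokens(usage)
print(generated)                          # 32768
print(generated > configured_max_tokens)  # True
```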

    Regarding billing: yes, you are charged for all tokens that the model processes or generates, even if the output is not useful. Azure OpenAI billing is based solely on token usage (prompt + completion), with no distinction between “good” and “bad” tokens. So, if a request consumes 33k tokens, those tokens are billable.

    The batch system returns a 200 response as long as the request was successfully processed by the model and produced a response payload. Errors are only thrown if the input exceeds limits, the job fails internally, or the service is unavailable. A malformed or repetitive response is still considered a valid completion.

    To reduce this issue, set a more realistic max_tokens limit rather than always using 10,000: start with a smaller value such as 1,000-2,000 and increase it only if needed. You can also add post-processing validation to detect malformed or empty outputs and retry those rows with adjusted parameters. Prompt engineering techniques, such as instructing the model to “stop when the answer is complete”, or using stop sequences, may help prevent drifting. Finally, monitor token usage in the Azure portal and configure cost alerts to avoid unexpected charges.
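The validation and retry steps suggested above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes each line of the batch output file is a JSON object with a `custom_id` and a chat-completions style `response.body.choices[0]`, and that retry rows use the `/chat/completions` batch URL, so please check both against the batch documentation for your API version. The helper names, the 0.2 visibility threshold, and the default of 1,500 tokens are all hypothetical choices for illustration.

```python
import json


def looks_degenerate(text: str, min_visible_ratio: float = 0.2) -> bool:
    """Heuristic: True if the text is empty or is mostly whitespace."""
    if not text or not text.strip():
        return True
    visible = sum(1 for ch in text if not ch.isspace())
    return visible / len(text) < min_visible_ratio


def rows_to_retry(output_lines) -> list:
    """Return custom_ids of rows whose output looks truncated or degenerate.

    Assumes the row layout described in the lead-in paragraph.
    """
    retry = []
    for line in output_lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        body = (row.get("response") or {}).get("body") or {}
        choices = body.get("choices") or []
        if not choices:
            retry.append(row.get("custom_id"))
            continue
        choice = choices[0]
        content = (choice.get("message") or {}).get("content") or ""
        # finish_reason == "length" means generation stopped at the token limit.
        if choice.get("finish_reason") == "length" or looks_degenerate(content):
            retry.append(row.get("custom_id"))
    return retry


def build_retry_request(custom_id, messages, deployment,
                        max_tokens=1500, stop=None) -> str:
    """Build one JSONL line for a retry batch with a tighter token limit."""
    body = {"model": deployment, "messages": messages, "max_tokens": max_tokens}
    if stop:
        body["stop"] = stop
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/chat/completions",  # assumed batch endpoint path
        "body": body,
    })
```

In practice you would read the downloaded output file line by line, pass the lines to `rows_to_retry`, and write one `build_retry_request` line per returned `custom_id` into a new input file. The whitespace heuristic is deliberately simple; a stricter check (for example, a minimum answer length or a schema check) may suit structured outputs better.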

    Please find the attached document for your reference:

    I hope this helps. Do let me know if you have any further queries.

    Thank you!
