Hi Pedro
Batch jobs are asynchronous and unordered. The service runs each JSONL line independently and completes results in whatever order workers finish; the final output JSONL is not guaranteed to match your input order. You should treat the output as an unordered set and join back on custom_id.
- When you plot `output_tokens` against the line number in the returned file, you’ll often see trends (e.g., increasing tokens then a reset) because you’re effectively plotting worker completion order, not your original sequence.
- As you already observed, reordering results back to your original input order (via `custom_id`) makes the pattern disappear; that’s the correct approach. A minimal sketch of the join is below.
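For reference, here is a small sketch of that join. It assumes the batch output was saved locally as `batch_output.jsonl` and that your `custom_id` values were of the form `task-<input index>`; both are assumptions, so adapt them to your own naming:

```python
import json

# Assumption: output downloaded to "batch_output.jsonl"; custom_id = "task-<input index>".
with open("batch_output.jsonl") as f:
    results = [json.loads(line) for line in f]

# Index by custom_id; the order in which lines appear in the output file is not meaningful.
by_id = {r["custom_id"]: r for r in results}

# Re-emit in the original input order (task-0 .. task-N-1).
ordered = [by_id[f"task-{i}"] for i in range(len(by_id))]

for r in ordered:
    usage = r["response"]["body"]["usage"]
    # The usage key depends on the endpoint:
    # Responses API reports "output_tokens", Chat Completions reports "completion_tokens".
    tokens = usage.get("output_tokens", usage.get("completion_tokens"))
    print(r["custom_id"], tokens)
```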
Why `temperature`, `top_p`, and `seed` error out on GPT‑5 / o‑series:
- Reasoning models (GPT‑5 family, o1/o3/o4‑mini) enforce a restricted parameter set. They do not accept typical sampling controls like `temperature`/`top_p` (and `seed` in most SDKs), which is why you see: `400 BadRequestError: Unsupported parameter: 'temperature' is not supported with this model.`
  - This restriction applies both to synchronous calls (Responses / Chat Completions) and to Batch. It’s not a batch-only limitation; it’s a model capability rule.
- Use the knobs these models support:
  - Chat Completions: `max_completion_tokens`
  - Responses API: `max_output_tokens`
  - Reasoning effort: `reasoning_effort` (where applicable)
  - Leave `temperature`, `top_p`, and `seed` out for o‑series / GPT‑5 (a sample request body follows this list).
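To make that concrete, here is a sketch of building an input JSONL line for a reasoning model. The file name, `custom_id` scheme, and model name (`gpt-5-mini`) are placeholders, and the URL path differs slightly between OpenAI (`/v1/chat/completions`) and Azure OpenAI (`/chat/completions`):

```python
import json

prompts = ["First prompt ...", "Second prompt ..."]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        line = {
            "custom_id": f"task-{i}",            # join key used to reorder results later
            "method": "POST",
            "url": "/v1/chat/completions",       # "/chat/completions" on Azure OpenAI
            "body": {
                "model": "gpt-5-mini",           # placeholder; same model on every line
                "messages": [{"role": "user", "content": prompt}],
                "max_completion_tokens": 2048,   # the supported cap for reasoning models
                # "reasoning_effort": "medium",  # optional, where the model supports it
                # no temperature / top_p / seed here
            },
        }
        f.write(json.dumps(line) + "\n")
```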
“Shards” & batches of ~100: what you can (and can’t) control:
- Internally, the service parallelizes work in chunks; you may notice output arriving in blocks of dozens or hundreds. There is no public parameter to “set shard size” or “change the number of shards”.
- What is configurable:
  - Delivery window (e.g., `24h`); see the submission sketch at the end of this reply.
  - Quota/backoff (exponential backoff for very large jobs)
  - Input/output storage (Azure Blob integration)
- The blocky arrival of results can visually mimic periodic “resets”. That’s a processing artifact, not content leakage across requests.
- Each JSONL line is an independent request to the same model/deployment.
- No shared context across lines. One row’s prompt/completion doesn’t affect another row’s limits or content.
- Requirement: All lines target the same model/deployment and endpoint in the batch file.
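For completeness, a minimal submission-and-polling sketch with the OpenAI Python SDK (on Azure OpenAI you would construct `AzureOpenAI(...)` with your endpoint and credentials instead; the file names and the 60-second polling interval are arbitrary choices):

```python
import time
from openai import OpenAI

client = OpenAI()

# Upload the JSONL built above and create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",   # must match the url field on every JSONL line
    completion_window="24h",           # the delivery window you can configure
)

# Poll until the job reaches a terminal state; results arrive as one output file,
# in completion order, not input order.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    with open("batch_output.jsonl", "w") as f:
        f.write(output.text)
```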