Strange cyclical pattern in output tokens with Python OpenAI batch submissions: why do my output_tokens increase with the index of the submission when there is no such pattern in input size?

Pedro 40 Reputation points
2025-12-02T18:24:56.19+00:00

Hi everyone,

I am encountering a strange issue when using Python for batch submissions through the Azure OpenAI API. I’ve noticed a cyclical pattern where the number of output tokens correlates with the index of the text rather than with the input size.

In Python, I create a list of dictionaries, convert it to JSONL, and submit the file to the client. My intention is that each note is completely independent of the rest; roughly as sketched below.
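A simplified sketch of my submission code (the deployment name, prompts, and custom_id scheme are illustrative placeholders):

```python
import json

# Each note becomes one independent JSONL line with its own custom_id.
notes = ["note text 0", "note text 1"]  # in reality, thousands of notes

requests = [
    {
        "custom_id": f"note-{i}",
        "method": "POST",
        "url": "/chat/completions",
        "body": {
            "model": "my-deployment",  # Azure OpenAI deployment name
            "messages": [
                {"role": "system", "content": "Summarize the note."},
                {"role": "user", "content": note},
            ],
        },
    }
    for i, note in enumerate(notes)
]

# Write one JSON object per line (JSONL) for the batch upload.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```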

The issue: specifically, the number of output tokens steadily increases as the index of the text increases. This continues for around 100 notes, at which point the token count sharply drops and the pattern repeats; when plotted, the result looks like a sawtooth (the blade of a saw).

Troubleshooting: I initially suspected this was due to how my data was sorted, but the pattern persists even after randomly shuffling the list of texts before submission. I have also confirmed that there is no bug in my construction of the list of dictionaries, and that the input size per text does not itself follow a sawtooth pattern (checked roughly as below).
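The kind of quick check I used to rule out an input-size sawtooth (a rough sketch; character length is a crude proxy for input tokens, and matplotlib is assumed available):

```python
import matplotlib.pyplot as plt

notes = ["example note A", "a somewhat longer example note B"]  # my real list
input_lens = [len(n) for n in notes]  # crude proxy for input tokens

# If input size drove the pattern, a sawtooth would be visible here; it isn't.
plt.plot(range(len(input_lens)), input_lens, marker=".")
plt.xlabel("note index")
plt.ylabel("input length (chars)")
plt.show()
```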

Has anyone seen this behavior before or have any ideas on what might be causing this “reset” every 100 requests?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Answer accepted by question author
  1. Sridhar M 2,525 Reputation points Microsoft External Staff Moderator
    2025-12-03T15:52:58.7433333+00:00

    Hi Pedro

    Batch jobs are asynchronous and unordered. The service runs each JSONL line independently and completes results in whatever order workers finish; the final output JSONL is not guaranteed to match your input order. You should treat the output as an unordered set and join back on custom_id.

    • When you plot output_tokens against the line number in the returned file, you’ll often see trends (e.g., increasing tokens then a reset) because you’re effectively plotting worker completion order, not your original sequence.
    • As you already observed, reordering results back to your original input order (via custom_id) makes the pattern disappear; that’s the correct approach. A minimal sketch follows.
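    A minimal sketch of that join, assuming custom_ids of the form note-{i} as in the submission sketch above (field names follow the Batch output schema; adjust if your rows differ):

```python
import json

# Re-associate batch output lines with the original inputs via custom_id,
# since the output file order is not guaranteed to match input order.
results_by_id = {}
with open("batch_output.jsonl", encoding="utf-8") as f:
    for line in f:
        row = json.loads(line)
        results_by_id[row["custom_id"]] = row

# Restore the original submission order.
ordered = [results_by_id[f"note-{i}"] for i in range(len(results_by_id))]

for row in ordered:
    # usage.completion_tokens is the "output tokens" figure being plotted.
    usage = row["response"]["body"]["usage"]
    print(row["custom_id"], usage["completion_tokens"])
```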

    Why temperature, top_p, and seed error out on GPT‑5 / o‑series:

    • Reasoning models (GPT‑5 family, o1/o3/o4‑mini) enforce a restricted parameter set. They do not accept the usual sampling controls like temperature/top_p (and seed in most SDKs), which is why you see: 400 BadRequestError: "Unsupported parameter: 'temperature' is not supported with this model."

    This restriction applies both to synchronous calls (Responses / Chat Completions) and to Batch. It’s not a batch-only limitation; it’s a model capability rule.
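    For illustration, a minimal sketch of reproducing the error with the openai Python SDK (endpoint, key, and deployment name are placeholders, not values from the question):

```python
from openai import AzureOpenAI, BadRequestError

# Placeholder credentials/deployment; substitute your own values.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-10-21",
)

try:
    client.chat.completions.create(
        model="my-o4-mini-deployment",  # a reasoning-model deployment
        messages=[{"role": "user", "content": "Hello"}],
        temperature=0.2,  # rejected: reasoning models don't accept sampling knobs
    )
except BadRequestError as e:
    # Expect: Unsupported parameter: 'temperature' is not supported with this model.
    print(e)
```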

    • Use the knobs these models support:
      • Chat Completions: max_completion_tokens
      • Responses API: max_output_tokens
      • Reasoning effort: reasoning_effort (where applicable)
      • Leave temperature, top_p, and seed out for o‑series / GPT‑5 (illustrative request line below).
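    An illustrative batch request line for a reasoning-model deployment (names are placeholders); note the absence of temperature/top_p/seed:

```python
import json

# One JSONL line using only the knobs reasoning models support:
# max_completion_tokens (Chat Completions) and reasoning_effort.
line = {
    "custom_id": "note-0",
    "method": "POST",
    "url": "/chat/completions",
    "body": {
        "model": "my-o4-mini-deployment",  # illustrative deployment name
        "messages": [{"role": "user", "content": "Summarize the note."}],
        "max_completion_tokens": 1024,
        "reasoning_effort": "low",
    },
}
print(json.dumps(line))
```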
    “Shards” & batches of ~100: what you can (and can’t) control:

    • Internally, the service parallelizes work in chunks; you may notice output arriving in blocks (dozens to hundreds). There is no public parameter to “set shard size” or “change the number of shards”.
    • What you can configure:
    • Delivery window (e.g., 24h)
    • Quota/backoff (exponential backoff for very large jobs)
    • Input/output storage (Azure Blob integration)
    • The blocky arrival of results can visually mimic periodic “resets”. That’s a processing artifact, not content leakage across requests.
    To recap how batch lines are handled:
    1. Each JSONL line is an independent request to the same model/deployment.
    2. No shared context across lines. One row’s prompt/completion doesn’t affect another row’s limits or content.
    3. Requirement: All lines target the same model/deployment and endpoint in the batch file.
    1 person found this answer helpful.

1 additional answer

  1. Pedro 40 Reputation points
    2025-12-03T17:54:12.9466667+00:00

    This answers my question, thank you!

