Hi Pedro
Batch jobs are asynchronous and unordered. The service runs each JSONL line independently and completes results in whatever order workers finish; the final output JSONL is not guaranteed to match your input order. You should treat the output as an unordered set and join back on custom_id.
- When you plot `output_tokens` against the line number in the returned file, you’ll often see trends (e.g., increasing tokens then a reset) because you’re effectively plotting worker completion order, not your original sequence.
- As you already observed, reordering results back to your original input order (via `custom_id`) makes the pattern disappear; that’s the correct approach. A minimal sketch of the join is below.
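For reference, here is a small sketch of that join. It assumes the batch output was saved locally as `batch_output.jsonl` and that your `custom_id` values were of the form `task-<input index>`; both are assumptions, so adapt them to your own naming:

```python
import json

# Assumption: output downloaded to "batch_output.jsonl"; custom_id = "task-<input index>".
with open("batch_output.jsonl") as f:
    results = [json.loads(line) for line in f]

# Index by custom_id; the order in which lines appear in the output file is not meaningful.
by_id = {r["custom_id"]: r for r in results}

# Re-emit in the original input order (task-0 .. task-N-1).
ordered = [by_id[f"task-{i}"] for i in range(len(by_id))]

for r in ordered:
    usage = r["response"]["body"]["usage"]
    # The usage key depends on the endpoint:
    # Responses API reports "output_tokens", Chat Completions reports "completion_tokens".
    tokens = usage.get("output_tokens", usage.get("completion_tokens"))
    print(r["custom_id"], tokens)
```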
Why `temperature`, `top_p`, and `seed` error out on GPT‑5 / o‑series:
- Reasoning models (GPT‑5 family, o1/o3/o4‑mini) enforce a restricted parameter set. They do not accept typical sampling controls like `temperature`/`top_p` (and `seed` in most SDKs), which is why you see: `400 BadRequestError: Unsupported parameter: 'temperature' is not supported with this model.`
  - This restriction applies both to synchronous calls (Responses / Chat Completions) and to Batch. It’s not a batch-only limitation; it’s a model capability rule.
- Use the knobs these models support:
  - Chat Completions: `max_completion_tokens`
  - Responses API: `max_output_tokens`
  - Reasoning effort: `reasoning_effort` (where applicable)
  - Leave `temperature`, `top_p`, and `seed` out for o‑series / GPT‑5 (a sample request body follows this list).
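To make that concrete, here is a sketch of building an input JSONL line for a reasoning model. The file name, `custom_id` scheme, and model name (`gpt-5-mini`) are placeholders, and the URL path differs slightly between OpenAI (`/v1/chat/completions`) and Azure OpenAI (`/chat/completions`):

```python
import json

prompts = ["First prompt ...", "Second prompt ..."]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        line = {
            "custom_id": f"task-{i}",            # join key used to reorder results later
            "method": "POST",
            "url": "/v1/chat/completions",       # "/chat/completions" on Azure OpenAI
            "body": {
                "model": "gpt-5-mini",           # placeholder; same model on every line
                "messages": [{"role": "user", "content": prompt}],
                "max_completion_tokens": 2048,   # the supported cap for reasoning models
                # "reasoning_effort": "medium",  # optional, where the model supports it
                # no temperature / top_p / seed here
            },
        }
        f.write(json.dumps(line) + "\n")
```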
“Shards” & batches of ~100: what you can (and can’t) control:
- Internally, the service parallelizes work in chunks; you may notice output arriving in blocks of dozens or hundreds. There is no public parameter to “set shard size” or “change the number of shards”.
- What is configurable:
  - Delivery window (e.g., `24h`); see the submission sketch at the end of this reply.
  - Quota/backoff (exponential backoff for very large jobs)
  - Input/output storage (Azure Blob integration)
- The blocky arrival of results can visually mimic periodic “resets”. That’s a processing artifact, not content leakage across requests.
- Each JSONL line is an independent request to the same model/deployment.
- No shared context across lines. One row’s prompt/completion doesn’t affect another row’s limits or content.
- Requirement: All lines target the same model/deployment and endpoint in the batch file.
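For completeness, a minimal submission-and-polling sketch with the OpenAI Python SDK (on Azure OpenAI you would construct `AzureOpenAI(...)` with your endpoint and credentials instead; the file names and the 60-second polling interval are arbitrary choices):

```python
import time
from openai import OpenAI

client = OpenAI()

# Upload the JSONL built above and create the batch job.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",   # must match the url field on every JSONL line
    completion_window="24h",           # the delivery window you can configure
)

# Poll until the job reaches a terminal state; results arrive as one output file,
# in completion order, not input order.
while batch.status not in ("completed", "failed", "expired", "cancelled"):
    time.sleep(60)
    batch = client.batches.retrieve(batch.id)

if batch.status == "completed":
    output = client.files.content(batch.output_file_id)
    with open("batch_output.jsonl", "w") as f:
        f.write(output.text)
```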