Hi Job Nijenhuis,
Welcome to Microsoft Q&A. Thank you for reaching out.
When auto‑upgrade (or an equivalent model update policy) is enabled on a deployment, the underlying model version can change while the deployment name stays the same. During or after that change, requests may be processed by a newer model runtime that enforces stricter request validation rules.
This behavior has been observed when a deployment that previously accepted the legacy parameter max_tokens starts serving a newer runtime that rejects max_tokens and requires max_completion_tokens instead. During the transition, some requests are processed by the newer runtime while others are still handled by the older one, so requests that include the legacy parameter fail only when they reach the newer runtime. This explains why only a portion of requests fail.
In addition, responses showing a different underlying model version indicate that the deployment is currently serving more than one backend model version during a transition phase.
The error message indicates a strict parameter validation change in newer runtimes:
- Older runtimes tolerate or ignore max_tokens
- Newer runtimes reject max_tokens and require max_completion_tokens
- Requests routed to the newer runtime fail when legacy parameters are present
This behavior results in partial failures until request payloads are made compatible with the newer runtime.
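As a rough illustration of the failure mode, the Python sketch below sends a request that still carries the legacy parameter and surfaces the validation error a newer runtime returns. It is a minimal sketch only: the endpoint and key environment variables, the deployment name my-gpt-4o-deployment, and the api_version value are placeholders or assumptions to adapt to your environment.

```python
import os
from openai import AzureOpenAI, BadRequestError

# Placeholder configuration - replace with your own endpoint, key, API version
# and deployment name.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # assumed GA API version; use one your resource supports
)

try:
    # Legacy parameter: tolerated by older runtimes, rejected by newer ones.
    response = client.chat.completions.create(
        model="my-gpt-4o-deployment",  # deployment name (placeholder)
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=100,
    )
    print(response.choices[0].message.content)
except BadRequestError as err:
    # A 400 response that mentions max_tokens indicates the request reached
    # the newer runtime with the stricter validation rules.
    print("Request rejected:", err)
```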
Please check whether the following actions help in resolving the error:
- Aligning the token parameters with the newer runtime
Ensure that requests sent to GPT‑4o deployments do not include the legacy max_tokens parameter.
- Replace max_tokens with max_completion_tokens for GPT‑4o
- Avoid sending both parameters in the same request
- If output length control is not strictly required, removing the token‑limit parameter entirely is also an option
This approach prevents request rejection by newer runtimes while remaining compatible with current behavior; a minimal example of the corrected call follows below.
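A minimal sketch of the corrected call, reusing the placeholder client and deployment name from the earlier sketch and assuming an openai-python release recent enough to expose max_completion_tokens:

```python
# Reuses the placeholder client and deployment name from the earlier sketch.
response = client.chat.completions.create(
    model="my-gpt-4o-deployment",
    messages=[{"role": "user", "content": "Summarize this support ticket."}],
    max_completion_tokens=100,  # newer parameter; do not also send max_tokens
)
print(response.choices[0].message.content)
```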
- Verifying the deployment configuration
Kindly review the deployment details in Azure AI Foundry to confirm which model version is currently backing the deployment.
- Check whether the deployment has recently changed model versions
- Confirm whether an upgrade policy or backend update is in effect
Deployment behavior and supported parameters depend on the active model version; a programmatic check is sketched below.
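If you prefer to check this programmatically rather than in the Azure AI Foundry portal, the sketch below uses the azure-identity and azure-mgmt-cognitiveservices packages. The subscription, resource group, account, and deployment names are placeholders, and the exact property names may differ slightly between SDK versions, so treat this as a starting point.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Placeholder identifiers - replace with your own values.
client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<subscription-id>")

deployment = client.deployments.get(
    resource_group_name="my-resource-group",
    account_name="my-azure-openai-resource",
    deployment_name="my-gpt-4o-deployment",
)

# Model name/version currently backing the deployment, plus the upgrade policy.
print("Model:", deployment.properties.model.name, deployment.properties.model.version)
print("Upgrade policy:", deployment.properties.version_upgrade_option)
```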
- Using a consistent and supported API surface
Use a single, consistent API surface for GPT‑4o workloads.
- Prefer chat completions or the newer Responses API (see the sketch after this list)
- Avoid mixing Completions and Chat endpoints for the same deployment
- Use a supported, modern API version rather than older preview versions
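For the Responses API path, a hedged sketch is below; it assumes a recent openai-python release (which exposes client.responses) and an API version on the resource that is new enough to route /responses, which the client from the earlier sketch may not be using.

```python
# Assumes the client was created with an API version that supports the
# Responses API; the deployment name is a placeholder.
response = client.responses.create(
    model="my-gpt-4o-deployment",
    input="Summarize this support ticket.",
    max_output_tokens=100,  # the Responses API uses max_output_tokens
)
print(response.output_text)
```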
- Checking client libraries and SDK behavior
- Some older or mismatched SDK versions may automatically inject max_tokens even when not explicitly set.
- Confirm the exact request payload sent over the wire (a logging snippet follows this list)
- Pin SDK versions to avoid unplanned behavior changes
- Ensure the SDK and API version are aligned with GPT‑4o support
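One way to confirm exactly what the SDK puts on the wire is to enable debug logging before making a request. The sketch below uses the standard logging module; recent openai-python releases log the outgoing request options (including any injected token parameters) at DEBUG level, which is an assumption worth verifying against your SDK version.

```python
import logging

# Surface the request options the openai client builds, plus the HTTP calls
# made by its underlying httpx transport.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("openai").setLevel(logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.DEBUG)
```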
- Monitoring request behavior after changes
After aligning parameters and API usage:
- Monitor error rates to confirm stabilization (a simple client-side check is sketched after this list)
- Watch for changes in the reported model version in responses
- Verify that all requests follow the same request pattern
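As a simple client-side check, the sketch below (reusing the placeholder client and deployment name from above) counts the backend model versions reported in responses and the number of rejected requests; production monitoring would normally rely on Azure Monitor metrics instead.

```python
from collections import Counter
from openai import BadRequestError

model_versions = Counter()
failed = 0

for prompt in ["ping"] * 20:  # sample traffic; substitute real requests
    try:
        r = client.chat.completions.create(
            model="my-gpt-4o-deployment",
            messages=[{"role": "user", "content": prompt}],
            max_completion_tokens=50,
        )
        model_versions[r.model] += 1  # model version reported by the backend
    except BadRequestError:
        failed += 1

print("Model versions seen:", dict(model_versions))
print("Failed requests:", failed)
```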
Once request payloads are aligned with the newer runtime requirements and legacy parameters are removed from GPT‑4o paths, the intermittent failures should stop. This approach avoids disruption to successful requests while ensuring compatibility during backend transitions.
References:
Azure OpenAI in Microsoft Foundry Models working with models - Microsoft Foundry | Microsoft Learn
Work with chat completion models - Microsoft Foundry | Microsoft Learn
Azure OpenAI in Microsoft Foundry Models REST API reference - Microsoft Foundry | Microsoft Learn
Use the Azure OpenAI Responses API - Microsoft Foundry | Microsoft Learn
https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/model-retirements?tabs=text
Thank you!
Please 'Upvote' (Thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.