Sudden OpenAI errors on gpt-4o

Job Nijenhuis 20 Reputation points
2026-03-27T12:17:31.2+00:00

About 20% of our Azure OpenAI requests have failed with:

{"error": {"message": "Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.", "type": "invalid_request_error", "param": "max_tokens", "code": "unsupported_parameter"}}

since 2 AM CET today. We use gpt-4o, which shouldn't even have this parameter, and we didn't do any recent deployments that could influence this. How can we resolve it? I'm afraid that switching from max_tokens to max_completion_tokens could cause issues for the 80% of requests that still succeed...

Azure OpenAI Service

An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.


Answer accepted by question author
  1. Karnam Venkata Rajeswari 1,655 Reputation points Microsoft External Staff Moderator
    2026-03-27T12:49:27.3033333+00:00

    Hi Job Nijenhuis,

    Welcome to Microsoft Q&A. Thank you for reaching out.

    When auto‑upgrade (or an equivalent model update policy) is enabled on a deployment, the underlying model version can change while the deployment name stays the same. During or after that change, requests may be processed by a newer model runtime that enforces stricter request validation rules.

    This behavior has been observed when a deployment that previously accepted the legacy parameter max_tokens begins serving a newer runtime that rejects max_tokens and requires max_completion_tokens instead. During the transition, some requests are processed by the newer runtime while others are still handled by the older one: any request containing max_tokens that reaches the newer runtime is rejected, while the rest succeed. This is why only a portion of requests fail.

    In addition, responses showing a different underlying model version indicate that the deployment is currently serving more than one backend model version during a transition phase.

    The error message indicates a strict parameter validation change in newer runtimes:

    • Older runtimes tolerate or ignore max_tokens
    • Newer runtimes reject max_tokens and require max_completion_tokens
    • Requests routed to the newer runtime fail when legacy parameters are present

    This behavior results in partial failures until request payloads are made compatible with the newer runtime.
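    To make the difference concrete, here is a minimal sketch of the two request bodies (messages and limit values are placeholders; only the name of the token-limit field differs):

```python
# Sketch of two chat-completions request bodies. Only the name of the
# token-limit field differs; the messages and values are placeholders.
legacy_body = {
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 256,  # rejected by the newer runtime
}

updated_body = {
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 256,  # required by the newer runtime
}
```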

    Please check whether the following actions help resolve the error:

    1. Aligning the token parameters with the newer runtime

    Ensure that requests sent to GPT‑4o deployments do not include the legacy max_tokens parameter.

    • Replace max_tokens with max_completion_tokens for GPT‑4o
    • Avoid sending both parameters in the same request
    • If output length control is not strictly required, removing the token‑limit parameter entirely is also supported

     This approach prevents request rejection by newer runtimes while remaining compatible with current behavior.
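    If there is concern that an older runtime might still reject max_completion_tokens during the transition (the asker's worry about the 80% of succeeding requests), a one-time retry fallback can bridge both runtimes. A minimal sketch, where create_fn stands in for an SDK call such as client.chat.completions.create (an assumption; any callable accepting these keyword arguments works), and the error-string check is based on the message format shown above:

```python
def create_with_token_fallback(create_fn, limit, **kwargs):
    """Try the newer token-limit parameter first; if the serving runtime
    rejects it as unsupported, retry once with the legacy name.

    create_fn stands in for an SDK call such as
    client.chat.completions.create (an assumption, not a specific API).
    """
    try:
        return create_fn(max_completion_tokens=limit, **kwargs)
    except Exception as exc:
        # Match the 'unsupported_parameter' wording seen in the error above;
        # any other failure is real and should propagate unchanged.
        message = str(exc)
        if ("unsupported_parameter" not in message
                and "max_completion_tokens" not in message):
            raise
        return create_fn(max_tokens=limit, **kwargs)
```

    Once the transition completes and all traffic lands on the newer runtime, the fallback branch stops firing and can be removed.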

    2. Verifying the deployment configuration

    Kindly review the deployment details in Azure AI Foundry to confirm which model version is currently backing the deployment.

    • Check whether the deployment has recently changed model versions
    • Confirm whether an upgrade policy or backend update is in effect

    Deployment behavior and supported parameters depend on the active model version.

    3. Using a consistent and supported API surface

    Use a single, consistent API surface for GPT‑4o workloads.

    • Prefer chat completions or the newer Responses API
    • Avoid mixing Completions and Chat endpoints for the same deployment
    • Use a supported, modern API version rather than older preview versions

    4. Checking client libraries and SDK behavior

    • Some older or mismatched SDK versions may automatically inject max_tokens even when it is not explicitly set
    • Confirm the exact request payload sent over the wire
    • Pin SDK versions to avoid unplanned behavior changes
    • Ensure the SDK and API version are aligned with GPT‑4o support
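
    As a sketch, a pre-send check can catch legacy parameters before they reach the wire (this inspects a plain-dict view of the body; an SDK may still inject fields after this point):

```python
def audit_request_body(body):
    """Return human-readable findings about token-limit parameters in an
    outbound chat-completions body (plain-dict sketch).

    An empty list means the body is clean for the newer runtime.
    """
    findings = []
    if "max_tokens" in body:
        findings.append("legacy 'max_tokens' present; newer runtimes reject it")
    if "max_tokens" in body and "max_completion_tokens" in body:
        findings.append("both token-limit parameters set; send at most one")
    return findings
```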

    5. Monitoring request behavior after changes

    After aligning parameters and API usage:

    • Monitor error rates to confirm stabilization
    • Watch for changes in the reported model version in responses
    • Verify that all requests follow the same request pattern
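
    The check on reported model versions can be automated with a small counter over the model field that chat-completions responses return. A sketch (the model-name strings in any real run would come from the response object):

```python
from collections import Counter

class ModelVersionMonitor:
    """Count the 'model' values reported in responses.

    More than one distinct value suggests the deployment is serving
    mixed backend versions during a transition.
    """

    def __init__(self):
        self.seen = Counter()

    def record(self, response_model):
        # response_model would be the 'model' field of a response object.
        self.seen[response_model] += 1

    def is_mixed(self):
        return len(self.seen) > 1
```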

    Once request payloads are aligned with the newer runtime requirements and legacy parameters are removed from GPT‑4o paths, the intermittent failures stop. This approach avoids disruption to successful requests while ensuring compatibility during backend transitions.


    References:

    Azure OpenAI in Microsoft Foundry Models working with models - Microsoft Foundry | Microsoft Learn

    Azure OpenAI in Microsoft Foundry Models REST API v1 preview reference - Microsoft Foundry | Microsoft Learn

    Work with chat completion models - Microsoft Foundry | Microsoft Learn

    Azure OpenAI in Microsoft Foundry Models REST API reference - Microsoft Foundry | Microsoft Learn

    Use the Azure OpenAI Responses API - Microsoft Foundry | Microsoft Learn

    https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/model-retirements?tabs=text

    Thank you!

     

    Please 'Upvote' (Thumbs-up) and 'Accept as answer' if the reply was helpful. This will benefit other community members who face the same issue.

     


1 additional answer

  1. Job Nijenhuis 20 Reputation points
    2026-03-27T13:21:58.8133333+00:00

    Apparently the model was auto-updated to 5.1 but we didn't notice. Solved

