Could not finish the message because max_tokens or model output limit was reached.

Alex Cavanaugh 0 Reputation points
2026-02-26T22:13:52.0266667+00:00

[Chart: 400_Count]

Starting around midnight Central Time on February 23, 2025, we began experiencing intermittent 400 errors for requests to GPT-5 across all US regions where we are deployed. The error rate increased throughout the day, peaked mid-afternoon, and then subsided around 6:00 PM CT without any changes on our end. We have seen a few brief spikes since then, but none as large or sustained as the initial incident. Here's the error message we saw:

Could not finish the message because max_tokens or model output limit was reached. Please try again with a higher max_tokens.

This was unexpected because:

  • Our prompts are relatively small. Total token consumption for affected requests was approximately 12k tokens, well within the model's context window limit.
  • We do not explicitly set the max_tokens parameter in our requests, so we are relying on the model default.
  • Retries on failed requests would typically succeed after a few attempts, suggesting this was not a consistent input or configuration issue.
  • The errors resolved on their own without any changes to our code or configuration.
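For context, our retry behavior is roughly the following (a minimal sketch; `call_model` is a hypothetical placeholder for our actual Azure OpenAI request, and the error type is a stand-in for the SDK's 400 error):

```python
import time

def retry_request(call_model, max_attempts=4, base_delay=0.5):
    """Retry a request callable with exponential backoff.

    `call_model` stands in for our real Azure OpenAI call; RuntimeError
    stands in for the SDK exception raised on the intermittent 400s.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Backoff doubles each attempt: 0.5s, 1s, 2s, ...
            time.sleep(base_delay * (2 ** (attempt - 1)))
```

With this wrapper, a request that failed two or three times would usually succeed on a later attempt, which is what made the errors look transient rather than input-related.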

Our questions for the Microsoft team:

  • What caused this behavior, and was there a known incident or platform issue on February 23?
  • Should we expect similar occurrences in the future?

Note that we also tried explicitly setting max_completion_tokens on our requests to the maximum value for GPT-5 (128,000), and this had no apparent impact on the error rate.
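For reference, this is roughly the shape of the request body we send with the explicit cap (a sketch only; endpoint, deployment name, and auth details are omitted, and `build_payload` is just a hypothetical helper to show the payload shape):

```python
def build_payload(messages, max_completion_tokens=128_000):
    """Build a chat-completions request body with an explicit token cap.

    128,000 is the cap we tried; it made no difference to the 400 rate.
    """
    return {
        "messages": messages,
        "max_completion_tokens": max_completion_tokens,
    }

payload = build_payload([{"role": "user", "content": "ping"}])
```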

Azure OpenAI Service
