Could not finish the message because max_tokens or model output limit was reached.
Starting around midnight Central Time on February 23, 2025, we began experiencing intermittent 400 errors for requests to GPT-5 across all US regions where we are deployed. The error rate increased throughout the day, peaked mid-afternoon, and then subsided around 6:00 PM CT without any changes on our end. We have seen a few brief spikes since then, but none as large or sustained as the initial incident. Here's the error message we saw:
```
Could not finish the message because max_tokens or model output limit was reached. Please try again with a higher max_tokens.
```
This was unexpected because:
- Our prompts are relatively small. Total token consumption for affected requests was approximately 12k tokens, well within the model's context window limit.
- We do not explicitly set the `max_tokens` parameter in our requests, so we rely on the model default.
- Retries on failed requests would typically succeed after a few attempts, suggesting this was not a consistent input or configuration issue.
- The errors resolved on their own without any changes to our code or configuration.
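For context, our retry behavior is roughly equivalent to the sketch below (a simplified illustration, not our production code; the backoff values and the simulated failure are illustrative). It shows the pattern we observed: the same request fails a couple of times with the error above, then succeeds unchanged.

```python
import time

def call_with_retries(request_fn, max_attempts=4, base_delay=0.5):
    """Retry a zero-argument request callable with exponential backoff.

    request_fn should raise an exception on a 4xx/5xx response and
    return the response object on success.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # exhausted retries; surface the last error
            time.sleep(base_delay * (2 ** attempt))

# Simulated flaky request: fails twice with the 400 we saw, then succeeds.
attempts = {"n": 0}

def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("400: max_tokens or model output limit was reached")
    return "ok"

result = call_with_retries(flaky_request, base_delay=0.1)
```

The identical payload succeeding on a later attempt is what makes us suspect a platform-side issue rather than anything in our requests.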
Our questions for the Microsoft team:
- What caused this behavior, and was there a known incident or platform issue on February 23?
- Should we expect similar occurrences in the future?
Also note that we tried setting `max_completion_tokens` on our requests to the largest value GPT-5 supports (128,000), and this had no observable impact on the error rate.
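For reference, the mitigation we attempted looked roughly like the following sketch (the deployment name and helper are placeholders; `max_completion_tokens` is the output-cap parameter newer models accept in place of the older `max_tokens`):

```python
def build_chat_params(messages, deployment="gpt-5", output_cap=128_000):
    """Build chat-completion kwargs with an explicit output-token cap.

    `deployment` stands in for our Azure OpenAI deployment name. These
    kwargs would be passed to client.chat.completions.create(**params).
    """
    return {
        "model": deployment,
        "messages": messages,
        "max_completion_tokens": output_cap,  # explicit cap instead of the default
    }

params = build_chat_params([{"role": "user", "content": "ping"}])
```

Even with this explicit cap in place, the 400 errors continued at the same rate, which is why we suspect the limit being hit was not one we control.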