Azure OpenAI Model: gpt-4.1 context window exceeded with way less than 1M tokens

Thiago Almeida 15 Reputation points
2025-06-03T19:50:37.7033333+00:00

Hello!

I'm having trouble using a large context window with gpt-4.1.

gpt-4.1 is known for having a 1M token context window.

It is described as such in the Azure docs:
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#capabilities

When I try to send something like 300k tokens, I get the following error:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.', 'type': 'invalid_request_error', 'param': 'input', 'code': 'context_length_exceeded'}}

Below is a sample code I'm using to send the message (python sdk):

response: Response = client.responses.create(
    model=AZURE_AI_OPENAI_DEPLOYMENT_NAME,
    max_output_tokens=32768,
    instructions="system prompt with around 2k tokens",
    input="large message with around 300k tokens",
    store=False,
)

api_version of my client is 2025-04-01-preview.

I'd like to know what is the actual maximum context window of gpt-4.1 within Azure.

My model is deployed in West US 3 with the Global Standard deployment type and a rate limit of around 2M tokens per minute. When that rate is exceeded, I get a different error message, so I don't think rate limiting is what's preventing me from using the full gpt-4.1 context window.

Thanks a lot for any clarification on this matter.

Azure OpenAI Service

1 answer

  1. Prashanth Veeragoni 4,930 Reputation points Microsoft External Staff Moderator
    2025-06-04T09:27:57.7666667+00:00

    Hi Thiago Almeida,

    Even though GPT-4.1 supports up to 1 million tokens, in Azure OpenAI the full 1M-token context window is not guaranteed on every deployment: availability depends on the deployment type (SKU), the region, and the API version, and must be supported by the specific model version you deployed.

    Your West US 3 deployment and model configuration may not currently expose the full 1M-token window, even though the deployed model is gpt-4.1. In many regions and configurations, the effective maximum context is still 128k tokens (and some older GPT-4 models are limited to 32k).
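    One way to confirm which model version and capabilities sit behind your deployment is to inspect its properties with the Azure CLI (the resource group, account, and deployment names below are placeholders for your own):

```shell
# Show the model name/version/format behind an Azure OpenAI deployment
# (resource group, account, and deployment names are hypothetical)
az cognitiveservices account deployment show \
  --resource-group my-rg \
  --name my-aoai-account \
  --deployment-name my-gpt41-deployment \
  --query "properties.model"
```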

    Understanding GPT-4.1's Context Window in Azure OpenAI

    While GPT-4.1 is advertised to support a context window of up to 1 million tokens, this capability is not universally available across all Azure OpenAI deployments. The actual context window limit can vary based on several factors:

    1. Model variant: note that gpt-4-1106-preview and gpt-4-32k are earlier GPT-4 models, not GPT-4.1 variants; they support 128k and 32,768 tokens respectively. Only the gpt-4.1 series itself is documented with the 1M-token window.

    2. Deployment region: certain regions may not yet support the full 1 million token context window, so it's essential to verify the capabilities available in your specific deployment region.

    3. API version: the API version used can influence the features and limits accessible to your deployment.

    In your case, deploying GPT-4.1 in the West US 3 region with the 2025-04-01-preview API version may not currently support the full 1 million token context window, which could explain the errors encountered when sending inputs around 300k tokens.
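    Until the full window is available for your deployment, a practical workaround is to split the input into chunks that fit a 128k-token window and process them in separate requests (for example, summarizing each chunk and then combining the summaries). A minimal sketch, using a hypothetical chunk_text helper and a rough 4-characters-per-token estimate (real counts should be measured with a tokenizer such as tiktoken):

```python
def chunk_text(text: str, max_tokens: int = 100_000, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that should fit within max_tokens, using a
    rough chars-per-token heuristic (~4 characters/token for English)."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk can then be sent in its own responses.create call.
chunks = chunk_text("x" * 1_000_000, max_tokens=100_000)
print(len(chunks))  # 3 chunks of at most 400,000 characters each
```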

    You can refer to the following official Azure documentation:

    - Azure OpenAI Service Models: provides an overview of available models and their capabilities.

    - Azure OpenAI Quotas and Limits: details the quotas and limitations associated with Azure OpenAI services.

    - Azure OpenAI Transparency Note: offers insights into the limitations and considerations when using GPT-4.1.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    Thank you! 

