Azure OpenAI Model: gpt-4.1 context window exceeded with way less than 1M tokens

Thiago Almeida 15 Reputation points
2025-06-03T19:50:37.7033333+00:00

Hello!

I'm having trouble using a large context window with gpt-4.1.

gpt-4.1 is known for having a 1M token context window.

It is described as such in the Azure docs:
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=global-standard%2Cstandard-chat-completions#capabilities

When I try to send something like 300k tokens, I get the following error:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'Your input exceeds the context window of this model. Please adjust your input and try again.', 'type': 'invalid_request_error', 'param': 'input', 'code': 'context_length_exceeded'}}

Below is a sample code I'm using to send the message (python sdk):

response: Response = client.responses.create(
    model=AZURE_AI_OPENAI_DEPLOYMENT_NAME,
    max_output_tokens=32768,
    instructions="system prompt with around 2k tokens",
    input="large message with around 300k tokens",
    store=False,
)

api_version of my client is 2025-04-01-preview.

I'd like to know what is the actual maximum context window of gpt-4.1 within Azure.

My model is deployed in West US 3 with the Global Standard deployment type and a rate limit of around 2M tokens per minute. When that rate is exceeded, I get a different error message, so I don't think rate limiting is what's preventing me from using the full gpt-4.1 context window.

Thanks a lot for any clarification on this matter.

Azure OpenAI Service

1 answer

  1. Prashanth Veeragoni 4,930 Reputation points Microsoft External Staff Moderator
    2025-06-04T09:27:57.7666667+00:00

    Hi Thiago Almeida,

    Even though GPT-4.1 supports up to 1 million tokens, in Azure OpenAI the full 1M-token context window is not guaranteed on every deployment: availability depends on the deployment type (SKU), the region, and the API version, and must be supported by the specific model version you deployed.

    Your West US 3 deployment and model configuration may not currently expose the full 1M-token window, even though the deployed model is gpt-4.1. In many regions and configurations, the effective maximum context is still 128k tokens (and some older GPT-4 models are limited to 32k).
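    One way to confirm which model version and capabilities sit behind your deployment is to inspect its properties with the Azure CLI (the resource group, account, and deployment names below are placeholders for your own):

```shell
# Show the model name/version/format behind an Azure OpenAI deployment
# (resource group, account, and deployment names are hypothetical)
az cognitiveservices account deployment show \
  --resource-group my-rg \
  --name my-aoai-account \
  --deployment-name my-gpt41-deployment \
  --query "properties.model"
```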

    Understanding GPT-4.1's Context Window in Azure OpenAI

    While GPT-4.1 is advertised to support a context window of up to 1 million tokens, this capability is not universally available across all Azure OpenAI deployments. The actual context window limit can vary based on several factors:

    1. Model variant: note that gpt-4-1106-preview and gpt-4-32k are earlier GPT-4 models, not GPT-4.1 variants; they support 128k and 32,768 tokens respectively. Only the gpt-4.1 series itself is documented with the 1M-token window.

    2. Deployment region: certain regions may not yet support the full 1 million token context window, so it's essential to verify the capabilities available in your specific deployment region.

    3. API version: the API version used can influence the features and limits accessible to your deployment.

    In your case, deploying GPT-4.1 in the West US 3 region with the 2025-04-01-preview API version may not currently support the full 1 million token context window, which could explain the errors encountered when sending inputs around 300k tokens.
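    Until the full window is available for your deployment, a practical workaround is to split the input into chunks that fit a 128k-token window and process them in separate requests (for example, summarizing each chunk and then combining the summaries). A minimal sketch, using a hypothetical chunk_text helper and a rough 4-characters-per-token estimate (real counts should be measured with a tokenizer such as tiktoken):

```python
def chunk_text(text: str, max_tokens: int = 100_000, chars_per_token: int = 4) -> list[str]:
    """Split text into pieces that should fit within max_tokens, using a
    rough chars-per-token heuristic (~4 characters/token for English)."""
    max_chars = max_tokens * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk can then be sent in its own responses.create call.
chunks = chunk_text("x" * 1_000_000, max_tokens=100_000)
print(len(chunks))  # 3 chunks of at most 400,000 characters each
```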

    You can refer to the following official Azure documentation:

    - Azure OpenAI Service Models: provides an overview of available models and their capabilities.

    - Azure OpenAI Quotas and Limits: details the quotas and limitations associated with Azure OpenAI services.

    - Azure OpenAI Transparency Note: offers insights into the limitations and considerations when using GPT-4.1.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

    Thank you! 

