Claude Sonnet 4 6 API Response latency - Azure App Service

Question


Rohit Shetty 5

If Claude Sonnet takes too long to respond, App Service times out and closes the connection. From what I have read, the 240-second timeout is a hard limit on App Service. Is there a workaround, given that Claude's response time is unpredictable for newer models?

Karnam Venkata Rajeswari 3,920 Reputation points Microsoft External Staff Moderator

2026-06-15T11:10:42.7033333+00:00

Hello @Rohit Shetty ,

Following up to see if the response was helpful

Thank you
Karnam Venkata Rajeswari 3,920 Reputation points Microsoft External Staff Moderator

2026-06-16T09:57:02.4066667+00:00

Hello @Rohit Shetty ,

Checking in to see if you had any chance to review the above response

Please let us know if there are any further queries

Thank you
Karnam Venkata Rajeswari 3,920 Reputation points Microsoft External Staff Moderator

2026-06-17T17:46:42.1266667+00:00

Hello @Rohit Shetty ,

Hope the response was helpful

Since I’ve converted my earlier comment into an answer, could you please take a moment to mark it as Accepted with an upvote? This helps others in the community with the same question find the solution more easily.

Please let us know if you have any further queries.

Thank you!

2 answers

Answer 1

Yes. The recommended approach is not to keep a single HTTP request open for long-running AI generations.

Common solutions are:

Use streaming responses

Stream tokens back to the client as they are generated.

This keeps the connection active and improves user experience.
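As a rough illustration of the streaming idea, here is a minimal sketch that wraps text chunks in Server-Sent Events frames as they arrive. `fake_model_stream` is a hypothetical stand-in for a streaming model client, and the `done` event name is an assumption. The web framework serving these frames would also need to flush each one immediately rather than buffer the whole response.

```python
from typing import Iterable, Iterator


def fake_model_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming model client; yields text chunks."""
    yield from ["Hello", ", ", "world"]


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each text chunk in a Server-Sent Events frame as it arrives."""
    for chunk in chunks:
        # One "data:" field per line of the chunk; a blank line ends the event.
        yield "".join(f"data: {line}\n" for line in chunk.split("\n")) + "\n"
    # Signal completion to the client (the event name is an assumption).
    yield "event: done\ndata: end\n\n"


if __name__ == "__main__":
    for frame in sse_events(fake_model_stream()):
        print(frame, end="")
```

Because each frame is produced as soon as a chunk is available, the client sees output early instead of waiting for the full generation.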

Use asynchronous processing

Submit the request.

Return a job ID immediately.

Process the model call in the background.

Let the client poll for results or receive a callback when completed.

Use Azure Functions / Durable Functions

Durable Functions are designed for long-running workflows that exceed normal HTTP timeouts.

Queue-based architecture

Put requests into a queue (e.g., Azure Storage Queue or Service Bus).

A worker processes the Claude request and stores the result.

The client retrieves the completed result later.
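A minimal sketch of the queue-and-worker shape, using Python's standard `queue.Queue` as a stand-in for Azure Storage Queue or Service Bus and a dictionary as a stand-in for the result store. All names here are hypothetical, and `fake_model` only imitates a model call.

```python
import queue
import threading
from typing import Callable, Dict

work_queue: queue.Queue = queue.Queue()  # stand-in for a real message queue
results: Dict[str, dict] = {}            # stand-in for a durable result store


def worker(model_call: Callable[[str], str]) -> None:
    """Process (request_id, prompt) messages until a None sentinel arrives."""
    while True:
        message = work_queue.get()
        if message is None:
            work_queue.task_done()
            return
        request_id, prompt = message
        try:
            results[request_id] = {"ok": True, "output": model_call(prompt)}
        except Exception as exc:
            # Record the failure so the client can see it instead of waiting forever.
            results[request_id] = {"ok": False, "error": str(exc)}
        finally:
            work_queue.task_done()


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        return f"echo: {prompt}"

    threading.Thread(target=worker, args=(fake_model,), daemon=True).start()
    work_queue.put(("req-1", "hello"))
    work_queue.put(None)
    work_queue.join()  # wait until every queued message has been handled
    print(results["req-1"])
```

The worker runs outside any HTTP request, so a slow model call only delays when the result appears in the store.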

Reduce latency

Lower max_tokens.

Use streaming.

Break very large prompts into smaller tasks.
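One possible way to break a large input into smaller tasks is to group paragraphs into size-limited chunks and send each chunk as its own, shorter request. This is a sketch under assumptions: the function name and the character-based limit are hypothetical, and whether a task can be split this way depends on the prompt.

```python
from typing import List


def split_into_tasks(text: str, max_chars: int) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be processed with a smaller output budget, which tends to shorten the individual calls.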

Key Point

The Azure App Service request timeout is effectively a hard limit for a single HTTP request, so the usual solution is streaming or asynchronous/background processing, not increasing the timeout indefinitely. ✔️

Answer 2

Hello @Rohit Shetty ,

Welcome to Microsoft Q&A. Thank you for reaching out to us.

The observed behavior aligns with how synchronous HTTP requests are handled in Azure App Service, particularly when combined with variable latency from external AI model calls such as Claude Sonnet.

In Azure App Service, HTTP requests are subject to a platform-enforced timeout of approximately 230–240 seconds for synchronous requests.

Key points:

This behavior is caused by the Azure Load Balancer idle timeout
If a request does not return a response within this duration, the connection is terminated
This applies to the entire request-response lifecycle at the App Service front end
This limit is a platform-level constraint and cannot be increased or overridden through configuration

This effectively acts as a hard upper bound for synchronous request-response operations.

Claude Sonnet response times can vary due to:

Prompt size and complexity
Output token length
Model processing characteristics
Backend execution variability

Because of this variability, certain requests may exceed the allowed request duration when executed synchronously, leading to timeout failures.

The issue arises from a mismatch between platform constraints and execution pattern:

HTTP request layer - fixed maximum duration
LLM inference - variable, sometimes long-running

When the Claude API call is executed within the same synchronous request, the request duration may exceed the allowed limit.

To confirm and measure the behavior:

Enable Application Insights:
1. Track dependency duration for Claude API calls
2. Correlate failed requests with latency spikes
Validate execution flow:
1. Confirm whether the model call is synchronous within the request path
Monitor key metrics:
1. Request duration
2. Dependency latency
3. Failure patterns
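To make the dependency-duration measurement concrete, here is a small sketch that times a model call and logs a warning when the duration approaches the platform limit. It uses only standard-library logging; forwarding these records to Application Insights is left out. The names, the 230-second constant, and the 80% warning threshold are assumptions for illustration.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple

logger = logging.getLogger("model_latency")
PLATFORM_LIMIT_SECONDS = 230.0  # lower end of the range quoted in this answer
WARN_FRACTION = 0.8             # assumption: warn at 80% of the limit


@contextmanager
def track_dependency(name: str, samples: List[Tuple[str, float]]) -> Iterator[None]:
    """Time the wrapped block, record the duration, and flag slow calls."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped call raises, so failures are measured too.
        elapsed = time.perf_counter() - start
        samples.append((name, elapsed))
        slow = elapsed >= PLATFORM_LIMIT_SECONDS * WARN_FRACTION
        logger.log(logging.WARNING if slow else logging.INFO,
                   "dependency=%s duration_s=%.3f", name, elapsed)
```

Wrapping the model call in `with track_dependency("claude", samples):` gives one duration per call, which can then be compared against failed requests.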

The reliable resolution is to decouple model execution from the HTTP request lifecycle and adopt an asynchronous processing pattern.

Async request‑reply pattern

Request is received by the App Service endpoint
Immediate acknowledgment is returned (e.g., job ID)
Task is placed into a queue
Background worker processes the model request
Result is stored and retrieved later

This ensures the HTTP request is no longer dependent on model execution time.
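On the client side, retrieving the result later usually means polling a status endpoint. The sketch below polls with exponential backoff. `fetch_status` is a hypothetical callable that wraps the HTTP call to the status endpoint, and the `succeeded` / `failed` status strings and the default timings are assumptions.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    fetch_status: Callable[[], dict],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll fetch_status with exponential backoff; None means we gave up."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status.get("status") in ("succeeded", "failed"):
            return status
        if time.monotonic() + delay > deadline:
            return None  # the next wait would pass the overall deadline
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, up to a ceiling
```

Each poll is a short request, so none of them comes near the platform timeout even when the model call itself takes several minutes.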

In summary,

The ~230–240 second timeout is a platform-enforced constraint in Azure App Service
This limit cannot be increased for synchronous HTTP requests
The issue is caused by an architecture mismatch between synchronous HTTP handling and unpredictable LLM execution time

It is recommended to:

Move from synchronous execution → asynchronous processing
Decouple model inference from HTTP request lifecycle
Use queue-based or orchestration-based architecture

This approach ensures:

Stable request handling
No timeout failures caused by model execution time
Scalable and resilient handling of variable-latency AI workloads

The following references might be helpful; please check them out.

Please let us know if the response was helpful

Thank you
