Claude Sonnet 4 6 API Response latency - Azure App Service

Rohit Shetty 5 Reputation points
2026-06-12T11:06:54.02+00:00

When Claude Sonnet takes a long time to respond, App Service times out and closes the connection. From what I have read, the 240-second timeout is a hard limit on App Service. Is there a workaround, given that response times for the newer Claude models are unpredictable?

Microsoft Foundry

A unified Azure platform for creating and managing AI models, agents, and applications with built‑in enterprise security, monitoring, and governance


2 answers

  1. Pradhuman singh 0 Reputation points
    2026-06-15T11:12:12.91+00:00

    Yes. The recommended approach is not to keep a single HTTP request open for long-running AI generations.

    Common solutions are:

    Use streaming responses

    • Stream tokens back to the client as they are generated.
    • This keeps the connection active and improves user experience.
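As a rough illustration of the streaming idea, the sketch below wraps text chunks in Server-Sent Events frames as they arrive, so the client receives data continuously instead of waiting for the full response. The names `to_sse` and `fake_model_stream` and the `done` event are hypothetical; a real application would feed chunks from its model client's streaming interface and write the frames to the HTTP response.

```python
from typing import Iterable, Iterator


def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap each text chunk in a Server-Sent Events frame as soon as it arrives."""
    for chunk in chunks:
        if not chunk:
            continue  # skip empty deltas so no blank events are sent
        # An SSE data field cannot contain a raw newline, so emit one
        # "data:" line per line of the chunk.
        lines = chunk.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the client knows generation finished.
    yield "event: done\ndata: [DONE]\n\n"


def fake_model_stream() -> Iterator[str]:
    """Stand-in for a streaming model client (hypothetical, not a real SDK call)."""
    yield from ["Hello", ", ", "world"]


if __name__ == "__main__":
    for frame in to_sse(fake_model_stream()):
        print(frame, end="")
```

Because `to_sse` is a generator, each frame can be flushed to the client as soon as the corresponding chunk is produced.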

    Use asynchronous processing

    1. Submit the request.
    2. Return a job ID immediately.
    3. Process the model call in the background.
    4. Let the client poll for results or receive a callback when completed.
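The submit-then-poll flow could look roughly like the following sketch. It keeps jobs in an in-memory dictionary and uses a thread pool, both of which are simplifying assumptions: state held this way is lost on restart and is not shared across App Service instances, so a real deployment would need a durable store. `slow_model_call`, `submit_job`, and `get_job` are hypothetical names.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict

_executor = ThreadPoolExecutor(max_workers=4)
_jobs: Dict[str, Future] = {}  # in-memory only; an assumption for the sketch


def slow_model_call(prompt: str) -> str:
    """Hypothetical stand-in for the real model request."""
    time.sleep(0.05)
    return f"response to: {prompt}"


def submit_job(prompt: str, call: Callable[[str], str] = slow_model_call) -> str:
    """Start the model call in the background and return a job ID immediately."""
    job_id = uuid.uuid4().hex
    _jobs[job_id] = _executor.submit(call, prompt)
    return job_id


def get_job(job_id: str) -> dict:
    """Report the job state; this is what a polling endpoint would return."""
    future = _jobs.get(job_id)
    if future is None:
        return {"status": "unknown"}
    if not future.done():
        return {"status": "pending"}
    if future.exception() is not None:
        return {"status": "failed", "error": str(future.exception())}
    return {"status": "done", "result": future.result()}


if __name__ == "__main__":
    job = submit_job("Summarize this document")
    while get_job(job)["status"] == "pending":
        time.sleep(0.01)  # client-side polling interval
    print(get_job(job))
```

The HTTP handler that calls `submit_job` returns in milliseconds, so its duration no longer depends on how long the model takes.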

    Use Azure Functions / Durable Functions

    Durable Functions are designed for long-running workflows that exceed normal HTTP timeouts.

    Queue-based architecture

    1. Put requests into a queue (e.g., Azure Storage Queue or Service Bus).
    2. A worker processes the Claude request and stores the result.
    3. The client retrieves the completed result later.

    Reduce latency

    • Lower max_tokens.
    • Use streaming, so the first tokens arrive sooner even though total generation time is unchanged.
    • Break very large prompts into smaller tasks.
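One possible way to break a large input into smaller tasks is to group paragraphs under a size budget, as in the sketch below. The function name `split_into_tasks` and the character-based budget are assumptions for illustration; a real application might budget by tokens instead, and whether chunks can be processed independently depends on the task.

```python
from typing import List


def split_into_tasks(text: str, max_chars: int) -> List[str]:
    """Group paragraphs into chunks of at most max_chars characters each.

    A single paragraph longer than max_chars is kept whole rather than
    being cut mid-sentence, so that one chunk may exceed the budget.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate  # still fits, or first paragraph of a new chunk
        else:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own, shorter model request.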

    Key Point

    The Azure App Service request timeout is effectively a hard limit for a single HTTP request, so the usual solution is streaming or asynchronous/background processing, not increasing the timeout indefinitely.

  2. Karnam Venkata Rajeswari 3,920 Reputation points Microsoft External Staff Moderator
    2026-06-12T11:45:19.1866667+00:00

    Hello @Rohit Shetty,

    Welcome to Microsoft Q&A. Thank you for reaching out to us.

    The observed behavior aligns with how synchronous HTTP requests are handled in Azure App Service, particularly when combined with variable latency from external AI model calls such as Claude Sonnet.

    In Azure App Service, HTTP requests are subject to a platform-enforced timeout of approximately 230–240 seconds for synchronous requests.

    Key points:

    • This behavior is caused by the Azure Load Balancer idle timeout
    • If a request sends no response data within this duration, the connection is terminated
    • The limit is enforced at the App Service front end; because it is an idle timeout, it bounds how long a request can go without returning any data
    • This limit is a platform-level constraint and cannot be increased or overridden through configuration

    This effectively acts as a hard upper bound for synchronous request-response operations.

    Claude Sonnet response times can vary due to:

    • Prompt size and complexity
    • Output token length
    • Model processing characteristics
    • Backend execution variability

    Because of this variability, certain requests may exceed the allowed request duration when executed synchronously, leading to timeout failures.

    The issue arises from a mismatch between platform constraints and execution pattern:

    • HTTP request layer - fixed maximum duration
    • LLM inference - variable, sometimes long-running

    When the Claude API call is executed within the same synchronous request, the request duration may exceed the allowed limit.

    To confirm and measure the behavior:

    1. Enable Application Insights
      1. Track dependency duration for Claude API calls
      2. Correlate failed requests with latency spikes
    2. Validate the execution flow: confirm whether the model call is synchronous within the request path
    3. Monitor key metrics:
      1. Request duration
      2. Dependency latency
      3. Failure patterns
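Before wiring up full telemetry, a small timing wrapper can give a first picture of how close model calls come to the limit. The sketch below is plain Python and does not use the Application Insights SDK; `timed_call`, `near_timeout`, `CallRecord`, and the 200-second warning threshold are all assumptions chosen for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Assumed warning threshold, set below the ~230 second limit discussed above.
WARN_AFTER_SECONDS = 200.0


@dataclass
class CallRecord:
    name: str
    seconds: float
    succeeded: bool


records: List[CallRecord] = []


def timed_call(name: str, fn: Callable[[], T]) -> T:
    """Run fn, record how long it took, and let any error it raises propagate."""
    start = time.perf_counter()
    succeeded = False
    try:
        result = fn()
        succeeded = True
        return result
    finally:
        # Recorded on both success and failure so failures can be correlated
        # with long durations.
        records.append(CallRecord(name, time.perf_counter() - start, succeeded))


def near_timeout(threshold: float = WARN_AFTER_SECONDS) -> List[CallRecord]:
    """Return recorded calls whose duration reached the threshold."""
    return [r for r in records if r.seconds >= threshold]
```

Wrapping the model request as `timed_call("claude", lambda: client_call(prompt))`, where `client_call` is whatever function the application already uses, would populate `records` for later inspection.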

    The reliable resolution is to decouple model execution from the HTTP request lifecycle and adopt an asynchronous processing pattern.

    Async request‑reply pattern

    1. Request is received by the App Service endpoint
    2. Immediate acknowledgment is returned (e.g., job ID)
    3. Task is placed into a queue
    4. Background worker processes the model request
    5. Result is stored and retrieved later

    This ensures the HTTP request is no longer dependent on model execution time.
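The five steps above can be sketched with an in-process queue and a worker thread. This is only a stand-in: `queue.Queue` plays the role that a durable queue such as Azure Storage Queue or Service Bus would play in practice, and the `results` dictionary stands in for a real result store. The names `enqueue`, `worker`, and `results` are hypothetical.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional, Tuple

task_queue: "queue.Queue[Optional[Tuple[str, str]]]" = queue.Queue()
results: Dict[str, dict] = {}  # stand-in for a durable result store


def enqueue(prompt: str) -> str:
    """Steps 1-3: accept the request, queue the task, return a job ID at once."""
    job_id = uuid.uuid4().hex
    task_queue.put((job_id, prompt))
    return job_id


def worker(call_model: Callable[[str], str]) -> None:
    """Steps 4-5: process queued tasks and store each outcome by job ID."""
    while True:
        item = task_queue.get()
        try:
            if item is None:  # sentinel: stop the worker
                return
            job_id, prompt = item
            try:
                results[job_id] = {"status": "done", "result": call_model(prompt)}
            except Exception as exc:  # keep the worker alive on a failed call
                results[job_id] = {"status": "failed", "error": str(exc)}
        finally:
            task_queue.task_done()


if __name__ == "__main__":
    # str.upper is a placeholder for the real model call.
    t = threading.Thread(target=worker, args=(str.upper,), daemon=True)
    t.start()
    job_id = enqueue("hello")
    task_queue.join()        # wait until the worker has processed the task
    print(results[job_id])
    task_queue.put(None)     # ask the worker to exit
    t.join()
```

In a deployed version the worker would run outside the web request path (for example as a separate process), which is what removes the dependency on the HTTP timeout.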

    In summary,

    1. The ~230–240 second timeout is a platform-enforced constraint in Azure App Service
    2. This limit cannot be increased for synchronous HTTP requests
    3. The issue is caused by an architecture mismatch between synchronous HTTP handling and unpredictable LLM execution time

    It is recommended to:

    1. Move from synchronous execution to asynchronous processing
    2. Decouple model inference from HTTP request lifecycle
    3. Use queue-based or orchestration-based architecture

    This approach ensures:

    • Stable request handling
    • No timeout failures caused by model latency on the request path
    • Scalable and resilient handling of variable-latency AI workloads

    Please let us know if the response was helpful

     

    Thank you
