Hello @Rohit Shetty,
Welcome to Microsoft Q&A. Thank you for reaching out to us.
The observed behavior aligns with how synchronous HTTP requests are handled in Azure App Service, particularly when combined with variable latency from external AI model calls such as Claude Sonnet.
In Azure App Service, synchronous HTTP requests are subject to a platform-enforced timeout of approximately 230–240 seconds.
Key points:
- This behavior is caused by the Azure Load Balancer idle timeout
- If a request does not return a response within this duration, the connection is terminated
- This applies to the entire request-response lifecycle at the App Service front end
- This limit is a platform-level constraint and cannot be increased or overridden through configuration
This effectively acts as a hard upper bound for synchronous request-response operations.
Claude Sonnet response times can vary due to:
- Prompt size and complexity
- Output token length
- Model processing characteristics
- Backend execution variability
Because of this variability, certain requests may exceed the allowed request duration when executed synchronously, leading to timeout failures.
The issue arises from a mismatch between platform constraints and execution pattern:
- HTTP request layer - fixed maximum duration
- LLM inference - variable, sometimes long-running
When the Claude API call is executed within the same synchronous request, the request duration may exceed the allowed limit.
To confirm and measure the behavior:
- Enable Application Insights
- Track dependency duration for Claude API calls
- Correlate failed requests with latency spikes
- Validate the execution flow: confirm whether the model call is made synchronously within the request path
- Monitor key metrics:
  - Request duration
  - Dependency latency
  - Failure patterns
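As a rough illustration of the dependency-duration tracking described above, here is a minimal Python sketch that times a model call and flags durations approaching the platform limit. It uses only the standard library rather than the Application Insights SDK. The names `track_dependency` and `call_model`, and the 80% warning threshold, are assumptions made for illustration and are not part of any Azure or Claude API.

```python
import time
from contextlib import contextmanager

PLATFORM_TIMEOUT_S = 230.0  # approximate front-end limit mentioned above
WARN_FRACTION = 0.8         # assumed warning threshold, not a platform setting


@contextmanager
def track_dependency(name, sink):
    """Time the wrapped block and append one record to `sink` (a list)."""
    start = time.perf_counter()
    success = True
    try:
        yield
    except Exception:
        success = False
        raise
    finally:
        duration = time.perf_counter() - start
        sink.append({
            "name": name,
            "duration_s": duration,
            "success": success,
            # True when the call used most of the request budget
            "near_timeout": duration >= PLATFORM_TIMEOUT_S * WARN_FRACTION,
        })


def call_model(prompt):
    """Hypothetical stand-in for the real Claude API call."""
    time.sleep(0.01)
    return "ok"


records = []
with track_dependency("claude_call", records):
    call_model("example prompt")
```

In a real deployment, the records would go to your telemetry pipeline instead of a list, so that failed requests can be correlated with slow model calls.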
The reliable resolution is to decouple model execution from the HTTP request lifecycle and adopt an asynchronous processing pattern.
Async request‑reply pattern
- Request is received by the App Service endpoint
- Immediate acknowledgment is returned (e.g., job ID)
- Task is placed into a queue
- Background worker processes the model request
- Result is stored and retrieved later
This ensures the HTTP request is no longer dependent on model execution time.
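The steps above can be sketched in Python as follows. This is only an in-process illustration under stated assumptions: it uses an in-memory queue and a background thread, whereas a production setup would typically use a durable queue and a separate worker so that jobs survive restarts. The names `submit`, `get_status`, and `run_model` are hypothetical, and `run_model` is a placeholder for the actual Claude call.

```python
import queue
import threading
import uuid

jobs = {}                     # job_id -> {"status": ..., "result": ...}
jobs_lock = threading.Lock()  # protects `jobs` across threads
work_queue = queue.Queue()    # in-memory stand-in for a durable queue


def run_model(prompt):
    """Hypothetical placeholder for the long-running model call."""
    return f"echo: {prompt}"


def submit(prompt):
    """Called by the HTTP handler: enqueue the job and return its ID at once."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, prompt))
    return job_id  # the endpoint would return this, e.g. with HTTP 202


def get_status(job_id):
    """Called by a status endpoint that the client polls."""
    with jobs_lock:
        job = jobs.get(job_id)
        return dict(job) if job else None


def worker():
    """Background loop: run each queued job and store its outcome."""
    while True:
        job_id, prompt = work_queue.get()
        try:
            update = {"status": "done", "result": run_model(prompt)}
        except Exception as exc:
            update = {"status": "failed", "result": str(exc)}
        with jobs_lock:
            jobs[job_id] = update
        work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

With this shape, the client calls the submit endpoint, receives a job ID immediately, and then polls the status endpoint until the status is `done` or `failed`, so no single HTTP request has to wait for the model.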
In summary,
- The ~230–240 second timeout is a platform-enforced constraint in Azure App Service
- This limit cannot be increased for synchronous HTTP requests
- The issue is caused by an architecture mismatch between synchronous HTTP handling and unpredictable LLM execution time
It is recommended to:
- Move from synchronous execution to asynchronous processing
- Decouple model inference from HTTP request lifecycle
- Use queue-based or orchestration-based architecture
This approach ensures:
- Stable request handling
- No timeout failures caused by long-running model calls
- Scalable and resilient handling of variable-latency AI workloads
The following references might be helpful; please check them out.
Please let us know if the response was helpful.
Thank you