Hi Dan Hastings,
Thank you for the detailed description; it is very helpful. Based on what you’ve shared, you’re seeing two overlapping behaviors:
- Intermittent timeouts / 503 errors across multiple models, and
- Unexpected content filtering responses (422)
I’ve reviewed your observations against documented Azure behavior, and your concern is valid. Below is a consolidated explanation and guidance.
Timeouts, 503 Errors, and Model Instability
The error:
“upstream connect error or disconnect/reset before headers (503)”
typically indicates transient backend failures or capacity constraints, especially when:
- It occurs across multiple models
- It affects even simple prompts
- Responses hang for long durations (e.g., 600 seconds)
What you can do
- Check Azure Service Health for any regional incidents or degradation
- Monitor metrics (latency, failures, throttling) in your Azure resource
- Validate quotas and request rates to ensure you’re within limits
- Implement retries with exponential backoff for 5xx/429 errors
- Test with different regions or GA models (preview models often have lower stability)
- Reduce payload size (prompt + max_tokens) to minimize processing time
These steps align with Microsoft best practices for handling transient faults and improving reliability.
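As a rough illustration of the retry recommendation above, here is a minimal Python sketch of exponential backoff with jitter. It is not taken from any Azure SDK: `call_with_backoff`, `send`, and the default values are hypothetical names and numbers chosen for the example, and `send` stands in for whatever function issues your request and returns a status code and body.

```python
import random
import time

# Status codes treated as transient in this sketch (an assumption; adjust to your needs).
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    sleep is injectable so the wait can be replaced in tests.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last failure to the caller
        # Full jitter: wait a random time between 0 and base_delay * 2^attempt,
        # capped at max_delay, so parallel clients do not retry in lockstep.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(random.uniform(0, delay))
    return status, body
```

If the service returns a `Retry-After` header on 429 responses, it is usually better to wait at least that long instead of the computed delay. It also helps to set a client-side request timeout well below 600 seconds so that a hung call fails fast and enters the retry path.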
Content Filtering (422 – content_filter)
The error:
“incomplete response (reason: content_filter)”
means the response was blocked or truncated by Azure AI Content Safety, which evaluates both prompts and outputs.
This can occasionally happen even with benign inputs (e.g., source code), because the filter's classifiers can misread certain patterns and produce false positives.
What you can do
- Inspect the content filter result in the API response to identify the triggered category
- Review your content filtering configuration (if using Foundry / Model Catalog)
- Adjust severity thresholds or request modified filtering if your use case requires it
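To make the first step more concrete, here is a small Python sketch that lists which filter categories were triggered in a parsed JSON response. It assumes the response follows the Azure OpenAI chat completions annotation layout (`prompt_filter_results` and `choices[].content_filter_results`, each category carrying `filtered` and `severity`). Your endpoint or model may return a different shape, so please check the field names against the raw response you actually receive. `triggered_filters` is a hypothetical helper name, and the sample payload is invented for illustration.

```python
def triggered_filters(response):
    """Return (source, category, severity) for every category marked as filtered.

    response is the parsed JSON body as a dict. The key names used here are
    assumptions based on the Azure OpenAI chat completions format.
    """
    hits = []
    # Annotations on the prompt side.
    for item in response.get("prompt_filter_results", []):
        for category, result in item.get("content_filter_results", {}).items():
            if isinstance(result, dict) and result.get("filtered"):
                hits.append(("prompt", category, result.get("severity")))
    # Annotations on the generated output.
    for choice in response.get("choices", []):
        for category, result in choice.get("content_filter_results", {}).items():
            if isinstance(result, dict) and result.get("filtered"):
                hits.append(("completion", category, result.get("severity")))
    return hits


# Invented sample payload, only to show how the helper is called.
sample = {
    "choices": [
        {
            "finish_reason": "content_filter",
            "content_filter_results": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        }
    ]
}
print(triggered_filters(sample))
```

Logging this output alongside the prompt that triggered it makes it easier to see whether one category is responsible for most of the false positives before you change any thresholds.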
Important Clarification
Even though the above steps will help improve reliability:
- The pattern you’re seeing (timeouts + 503 across models) strongly suggests backend capacity or regional service instability, particularly for preview models
- This is not always fully resolvable from the client side
The behavior you’re experiencing is consistent with transient service-side instability and content safety enforcement, rather than a single configuration issue.
The recommendations above will help:
- Improve resiliency (via retries and optimization)
- Reduce latency and failures
- Provide better visibility into filtering behavior
Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.
Please do not forget to "Accept Answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.