Welcome to Microsoft Q&A, and thank you for reaching out.
The error Status(StatusCode="ResourceExhausted", Detail="no tokens available for passthrough") combined with WebSocket Error 1013 indicates that the Azure service could not allocate sufficient resources to process your request. This typically happens when the backend token pool or compute slots are temporarily exhausted. The WebSocket 1013 code is a standard signal meaning “Try Again Later.”
Resource exhaustion usually occurs due to:
- Traffic spikes or sudden bursts of concurrent requests.
- Regional capacity constraints, especially during peak usage.
- Requests involving large models or complex operations that consume more tokens.
- Occasionally, service-side incidents can reduce available capacity even if your traffic is normal.
To reduce the likelihood of this error:
- Smooth traffic flow: Avoid sending large batches at once; spread requests over time.
- Connection reuse: Keep WebSocket connections alive for multiple requests instead of creating new ones repeatedly.
- Quota and scaling: Request higher quotas in the Azure portal or distribute workloads across multiple regions.
- Batch processing: For heavy workloads, use batch APIs instead of real-time endpoints.
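As a rough illustration of the "smooth traffic flow" point, here is a minimal Python sketch that spaces requests evenly instead of sending them in one burst. The function name paced and its max_per_second parameter are hypothetical and not part of any Azure SDK; the right rate depends on your quota, so treat the number as a placeholder.

```python
import time


def paced(items, max_per_second, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than max_per_second, spacing them evenly.

    Hypothetical helper: clock and sleep are injectable so the pacing
    can be tested without real waiting.
    """
    interval = 1.0 / max_per_second
    next_slot = clock()
    for item in items:
        now = clock()
        if now < next_slot:
            # Too early for the next request: wait until its slot opens.
            sleep(next_slot - now)
        # Schedule the following slot one interval after this one.
        next_slot = max(now, next_slot) + interval
        yield item
```

You would wrap your list of pending requests, for example `for request in paced(pending, max_per_second=2): send(request)`, where send stands for whatever call your client actually makes.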
Implement robust error handling:
- Use retry logic with exponential backoff for ResourceExhausted responses.
- Log request IDs and timestamps for troubleshooting.
- Monitor Azure Service Health to check for regional incidents before retrying aggressively.
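The retry advice above could look roughly like the following Python sketch. It is only an outline under stated assumptions: ResourceExhaustedError is a hypothetical stand-in for whatever exception your client raises for ResourceExhausted or WebSocket 1013, and the attempt count, base delay, and cap are illustrative values, not Azure recommendations. The random jitter is an addition of mine so that many clients do not retry at the same instant.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the client's ResourceExhausted / 1013 error."""


def backoff_delays(max_attempts, base=1.0, cap=30.0):
    """Upper bound (seconds) for the wait before each retry: base * 2**n, capped."""
    return [min(cap, base * (2 ** n)) for n in range(max_attempts - 1)]


def call_with_backoff(operation, max_attempts=5, base=1.0, cap=30.0,
                      sleep=time.sleep):
    """Call operation(), retrying on ResourceExhaustedError with backoff."""
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return operation()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait a random time between 0 and the exponential bound.
            sleep(random.uniform(0, delays[attempt]))
```

With the defaults, the wait before each retry is bounded by 1, 2, 4, and 8 seconds. Pass your real request as a zero-argument callable, and log the request ID and timestamp inside the except branch if your client exposes them.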
References:
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-quotas-and-limits
https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota?tabs=rest
I hope this helps. Let me know if you have any further questions.
Thank you!