ResourceExhausted: No tokens available for passthrough (WebSocket Error 1013)

Shrawani Pampatwar 0 Reputation points
2025-11-17T20:36:59.5333333+00:00

We are seeing intermittent API request failures with Status(StatusCode="ResourceExhausted", Detail="no tokens available for passthrough") and WebSocket error code 1013. What causes this resource exhaustion, and what steps can we take to prevent it? Alternatively, please confirm whether this indicates an issue on the Azure service side.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Sridhar M 2,995 Reputation points Microsoft External Staff Moderator
    2025-11-17T21:54:39.0333333+00:00

    Hi Shrawani Pampatwar,

    Welcome to Microsoft Q&A, and thank you for reaching out.

    The error Status(StatusCode="ResourceExhausted", Detail="no tokens available for passthrough") combined with WebSocket Error 1013 indicates that the Azure service could not allocate sufficient resources to process your request. This typically happens when the backend token pool or compute slots are temporarily exhausted. The WebSocket 1013 code is a standard signal meaning “Try Again Later.”

    Resource exhaustion usually occurs due to:

    • Traffic spikes or sudden bursts of concurrent requests.
    • Regional capacity constraints, especially during peak usage.
    • Requests involving large models or complex operations that consume more tokens.
    • Occasionally, service-side incidents can reduce available capacity even if your traffic is normal.

    To reduce the likelihood of this error:

    • Smooth traffic flow: Avoid sending large batches at once; spread requests over time.
    • Connection reuse: Keep WebSocket connections alive for multiple requests instead of creating new ones repeatedly.
    • Quota and scaling: Request higher quotas in the Azure portal or distribute workloads across multiple regions.
    • Batch processing: For heavy workloads, use batch APIs instead of real-time endpoints.
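    As a rough illustration of the "smooth traffic flow" point above, the sketch below caps how many requests are in flight at once using an asyncio semaphore. The helper name run_limited and the default of 4 are assumptions for illustration only; the right cap depends on the concurrency limit of your own Speech resource, and each job would wrap your actual client call.

```python
import asyncio


async def run_limited(jobs, max_concurrent=4):
    """Run zero-argument coroutine functions with at most max_concurrent in flight.

    run_limited and max_concurrent are hypothetical names, not part of any Azure SDK.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

    Calling asyncio.run(run_limited(jobs, max_concurrent=3)) with a list of coroutine functions spreads a burst over time instead of opening every request simultaneously.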

    Implement robust error handling:

    • Use retry logic with exponential backoff for ResourceExhausted responses.
    • Log request IDs and timestamps for troubleshooting.
    • Monitor Azure Service Health to check for regional incidents before retrying aggressively.
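    The retry advice above could look something like the following minimal sketch, which uses exponential backoff with full jitter. ResourceExhaustedError is a hypothetical stand-in for however your client surfaces the ResourceExhausted status or WebSocket 1013 close; the retry count, base delay, and cap are assumed values that you would tune for your workload.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for a ResourceExhausted / WebSocket 1013 failure."""


def backoff_delays(max_retries, base=0.5, cap=30.0):
    """Upper bound in seconds for each retry: base * 2**attempt, limited to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(request_fn, max_retries=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call request_fn, retrying only on ResourceExhaustedError."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except ResourceExhaustedError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Full jitter: sleep a random time up to the exponential bound,
            # so that many clients do not retry in lockstep.
            sleep(random.uniform(0, delays[attempt]))
```

    Other exceptions propagate immediately, since retrying is only appropriate for the transient "try again later" condition.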

    References:

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-quotas-and-limits

    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/how-to/quota?tabs=rest

    I hope this helps. Let me know if you have any further questions.

    Thank you!
