This article contains a quick reference and a detailed description of the quotas and limits for the Speech service in Azure AI services. The information applies to all pricing tiers of the service. It also contains some best practices to avoid request throttling.
For the free (F0) pricing tier, see also the monthly allowances at the pricing page.
The following sections provide you with a quick guide to the quotas and limits that apply to the Speech service.
For information about adjustable quotas for Standard (S0) Speech resources, see more explanations, best practices, and adjustment instructions. The quotas and limits for Free (F0) Speech resources aren't adjustable.
Important
If you switch an AI Services resource for Speech from the Free (F0) pricing tier to the Standard (S0) pricing tier, it can take up to several hours for the corresponding quota changes to apply.
This section describes speech to text quotas and limits per Speech resource. Unless otherwise specified, the limits aren't adjustable.
You can use real-time speech to text with the Speech SDK or the Speech to text REST API for short audio.
Important
These limits apply to concurrent real-time speech to text requests and speech translation requests combined. For example, if you have 60 concurrent speech to text requests and 40 concurrent speech translation requests, you'll reach the limit of 100 concurrent requests.
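The following is a minimal real-time recognition sketch with the Speech SDK for Python, meant only as a reference point for the table below: each active recognition session counts toward the combined concurrent request limit. It assumes the azure-cognitiveservices-speech package and SPEECH_KEY and SPEECH_REGION environment variables for your resource.

```python
import os
import azure.cognitiveservices.speech as speechsdk

# Assumes SPEECH_KEY and SPEECH_REGION point to your Speech resource.
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)

# Each active recognition session counts toward the combined concurrent
# request limit for real-time speech to text and speech translation.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()  # recognizes one utterance from the default microphone

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
```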
Quota | Free (F0) | Standard (S0) |
---|---|---|
Concurrent request limit - base model endpoint | 1 (this limit isn't adjustable) | 100 (default value). The rate is adjustable for Standard (S0) resources. See more explanations, best practices, and adjustment instructions. |
Concurrent request limit - custom endpoint | 1 (this limit isn't adjustable) | 100 (default value). The rate is adjustable for Standard (S0) resources. See more explanations, best practices, and adjustment instructions. |
Max audio length for real-time diarization | N/A | 240 minutes per file |
Quota | Free (F0) | Standard (S0) |
---|---|---|
Maximum audio input file size | N/A | 200 MB |
Maximum audio length | N/A | 120 minutes per file |
Maximum requests per minute | N/A | 600 |
Quota | Free (F0) | Standard (S0) |
---|---|---|
Speech to text REST API limit | Not available for F0 | 100 requests per 10 seconds (600 requests per minute) |
Max audio input file size | N/A | 1 GB |
Max number of blobs per container | N/A | 10000 |
Max number of files per transcription request (when you're using multiple content URLs as input). | N/A | 1000 |
Max audio length for transcriptions with diarization enabled. | N/A | 240 minutes per file |
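To illustrate the multiple-content-URL limit above, here's a hedged sketch of creating a batch transcription job with the Speech to text REST API. It assumes the v3.2 transcriptions endpoint, the requests package, and placeholder blob URLs; adjust these for your own resource.

```python
import os
import requests

# Assumption: Speech to text REST API v3.2 batch transcription endpoint.
region = os.environ["SPEECH_REGION"]
endpoint = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"

body = {
    "displayName": "quota-example",
    "locale": "en-US",
    # Up to 1,000 content URLs are allowed per transcription request (see the table above).
    "contentUrls": [
        "https://example.blob.core.windows.net/audio/file1.wav",  # placeholder URLs
        "https://example.blob.core.windows.net/audio/file2.wav",
    ],
}

response = requests.post(
    endpoint,
    json=body,
    headers={"Ocp-Apim-Subscription-Key": os.environ["SPEECH_KEY"]},
)
response.raise_for_status()
print(response.json()["self"])  # URL for polling the status of the transcription job
```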
The limits in this table apply per Speech resource when you create a custom speech model.
Quota | Free (F0) | Standard (S0) |
---|---|---|
REST API limit | 100 requests per 10 seconds (600 requests per minute) | 100 requests per 10 seconds (600 requests per minute) |
Max number of custom model deployments per Speech resource | 1 | 50 |
Max number of speech datasets | 2 | 500 |
Max acoustic dataset file size for data import | 2 GB | 2 GB |
Max language dataset file size for data import | 200 MB | 1.5 GB |
Max pronunciation dataset file size for data import | 1 KB | 1 MB |
Max text size when you're using the text parameter in the Models_Create API request | 200 KB | 500 KB |
This section describes text to speech quotas and limits per Speech resource.
You can use real-time text to speech with the Speech SDK or the Text to speech REST API. Unless otherwise specified, the limits aren't adjustable.
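As a reference point for the transaction limits below, here's a minimal synthesis sketch with the Speech SDK for Python; each synthesis request is one transaction against the TPS limit. It assumes the azure-cognitiveservices-speech package and SPEECH_KEY and SPEECH_REGION environment variables.

```python
import os
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"],
    region=os.environ["SPEECH_REGION"],
)

# Each call to speak_text_async is one transaction against the TPS limit.
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)
result = synthesizer.speak_text_async("Quota and limit check").get()

if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
    print(f"Synthesized {len(result.audio_data)} bytes of audio")
```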
Quota | Free (F0) | Standard (S0) |
---|---|---|
Maximum number of transactions per time period for prebuilt neural voices and custom neural voices | 20 transactions per 60 seconds (this limit isn't adjustable) | 200 transactions per second (TPS) (default value). The rate is adjustable up to 1000 TPS for Standard (S0) resources. See more explanations, best practices, and adjustment instructions. |
Max audio length produced per request | 10 min | 10 min |
Max total number of distinct `<voice>` and `<audio>` tags in SSML | 50 | 50 |
Max SSML message size per turn for websocket | 64 KB | 64 KB |
These limits aren't adjustable. For more information about batch synthesis latency, see batch synthesis latency and best practices.
Quota | Free (F0) | Standard (S0) |
---|---|---|
REST API limit | Not available for F0 | 100 requests per 10 seconds |
Max JSON payload size to create a synthesis job | N/A | 2 megabytes |
Concurrent active synthesis jobs | N/A | No limit |
Max number of text inputs per synthesis job | N/A | 10000 |
Max time to live for a synthesis job after it reaches the final state | N/A | Up to 31 days (specified using properties) |
The limits in this table apply per Speech resource when you create a professional custom neural voice model.
Quota | Free (F0) | Standard (S0) |
---|---|---|
Max number of transactions per second (TPS) | Not available for F0 | 200 transactions per second (TPS) (default value) |
Max number of datasets | N/A | 500 |
Max number of simultaneous dataset uploads | N/A | 5 |
Max data file size for data import per dataset | N/A | 2 GB |
Upload of long audio or audio without script | N/A | Yes |
Max number of simultaneous model trainings | N/A | 4 |
Max number of custom endpoints | N/A | 50 |
The limits in this table apply per Speech resource when you create a personal voice.
Quota | Free (F0) | Standard (S0) |
---|---|---|
REST API limit (not including speech synthesis) | Not available for F0 | 50 requests per 10 seconds |
Max number of transactions per second (TPS) for speech synthesis | Not available for F0 | 200 transactions per second (TPS) (default value) |
Quota | Free (F0) | Standard (S0) |
---|---|---|
REST API limit | Not available for F0 | 2 requests per 1 minute |
Quota | Free (F0) | Standard (S0) |
---|---|---|
New connections per minute | Not available for F0 | 2 new connections per minute |
Max connection duration with speaking | Not available for F0 | 30 minutes ¹ |
Max connection duration with idle state | Not available for F0 | 5 minutes |
¹ To ensure continuous operation of the real-time avatar for more than 30 minutes, you can enable auto-reconnect. For information about how to set up auto-reconnect, refer to this sample code (search "auto reconnect").
Quota | Free (F0) | Standard (S0) |
---|---|---|
File size (plain text in SSML) ¹ | 3,000 characters per file | 20,000 characters per file |
File size (lexicon file) ² | 30 KB per file | 100 KB per file |
Billable characters in SSML | 15,000 characters per file | 100,000 characters per file |
Export to audio library | 1 concurrent task | N/A |
¹ The limit applies only to plain text in SSML and doesn't include tags.
² The characters of the lexicon file aren't charged. Only the lexicon elements in SSML are counted as billable characters. Refer to billable characters to learn more.
Speaker recognition is limited to 20 transactions per second (TPS).
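If your application can approach this rate, simple client-side pacing helps keep requests under the cap. The following sketch is illustrative only; the TpsLimiter class and its usage aren't part of any SDK.

```python
import threading
import time

class TpsLimiter:
    """Client-side pacing to stay under a transactions-per-second cap."""

    def __init__(self, max_tps: int):
        self.interval = 1.0 / max_tps
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        """Block until the next request is allowed to start."""
        with self.lock:
            now = time.monotonic()
            self.next_slot = max(self.next_slot, now) + self.interval
            delay = self.next_slot - self.interval - now
        if delay > 0:
            time.sleep(delay)

# Example: keep speaker recognition requests under 20 TPS.
limiter = TpsLimiter(20)
limiter.wait()  # call before each speaker recognition request
```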
Some of the Speech service quotas are adjustable. This section provides more explanations, best practices, and adjustment instructions.
The following quotas are adjustable for Standard (S0) resources. The Free (F0) request limits aren't adjustable.
Before requesting a quota increase (where applicable), check your current TPS (transactions per second) and make sure that an increase is actually necessary. The Speech service uses autoscaling to bring up the required computational resources on demand, while keeping your costs low by not maintaining an excessive amount of hardware capacity.
Let's look at an example. Suppose that your application receives response code 429, which indicates too many requests. Your application receives this response even though your workload is within the limits defined by the Quotas and limits reference. The most likely explanation is that the Speech service is scaling up to your demand and hasn't reached the required scale yet, so it doesn't immediately have enough resources to serve the request. In such cases, increasing the quota won't help. In most cases, the Speech service scales up soon, and the issue causing response code 429 is resolved.
To minimize issues related to throttling, it's a good idea to implement retry logic for response code 429, avoid sharp spikes by increasing your workload gradually, and test your workload patterns before going to production. A minimal retry sketch follows this paragraph.
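Here's a minimal retry sketch, assuming an HTTP call to a Speech REST endpoint with the requests package. It honors the Retry-After header if the service returns one, and otherwise backs off exponentially.

```python
import time
import requests

def post_with_retry(url, headers, data, max_retries=5):
    """POST a request and back off when the service returns response code 429."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, data=data)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        # Honor Retry-After if present; otherwise back off exponentially.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay = min(delay * 2, 60)
    raise RuntimeError("Request was still throttled after all retries")
```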
The next sections describe specific cases of adjusting quotas.
By default, the number of concurrent real-time speech to text and speech translation requests combined is limited to 100 per resource in the base model, and 100 per custom endpoint in the custom model. For the standard pricing tier, you can increase this amount. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.
Note
Concurrent request limits for base and custom models need to be adjusted separately. You can have a Speech service resource that's associated with many custom endpoints hosting many custom model deployments. As needed, the limit adjustments per custom endpoint must be requested separately.
Increasing the limit of concurrent requests doesn't directly affect your costs. The Speech service uses a payment model that requires that you pay only for what you use. The limit defines how high the service can scale before it starts throttling your requests.
You aren't able to see the existing value of the concurrent request limit parameter in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.
Note
Speech containers don't require increases of the concurrent request limit, because containers are constrained only by the CPUs of the hardware they are hosted on. Speech containers do, however, have their own capacity limitations that should be taken into account. For more information, see the Speech containers FAQ.
How to get information for the base model:
How to get information for the custom model:
Initiate the increase of the limit for concurrent requests for your resource, or if necessary check the current limit, by submitting a support request. Here's how:
Here's a general example of a good approach to take. It's meant only as a template that you can adjust as necessary for your own use.
Suppose that a Speech service resource has the concurrent request limit set to 300. Start the workload from 20 concurrent connections, and increase the load by 20 concurrent connections every 90-120 seconds. Monitor the service responses, and implement logic that falls back (reduces the load) if you get too many requests (response code 429). Then, retry the load increase in one minute, and if it still doesn't work, try again in two minutes. Use a pattern of 1-2-4-4 minutes for the intervals.
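The following sketch implements the ramp-up pattern described above. The run_workload helper is hypothetical; assume it runs the given number of concurrent requests for a while and returns True if any of them were throttled (response code 429).

```python
import time

RETRY_INTERVALS_MIN = [1, 2, 4, 4]  # back-off pattern from the example above

def ramp_up(run_workload, target=300, step=20, ramp_delay_s=105):
    """Increase the load gradually and fall back when the service throttles."""
    concurrency = step
    while concurrency <= target:
        if run_workload(concurrency):
            # Throttled: reduce the load, then retry the increase on a 1-2-4-4 minute schedule.
            concurrency = max(step, concurrency - step)
            for pause_min in RETRY_INTERVALS_MIN:
                time.sleep(pause_min * 60)
                if not run_workload(concurrency + step):
                    concurrency += step
                    break
        else:
            concurrency += step       # add 20 concurrent connections
            time.sleep(ramp_delay_s)  # every 90-120 seconds
```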
Generally, it's a good idea to test the workload and the workload patterns before going to production.
For text to speech, the default limit of 200 transactions per second (TPS) applies to prebuilt neural voices and custom neural voices. For the standard pricing tier, you can increase this limit up to 1000 TPS. Before submitting the request, ensure that you're familiar with the material discussed earlier in this article, such as the best practices to mitigate throttling.
Increasing the TPS limit doesn't directly affect your costs. The Speech service uses a payment model that requires that you pay only for what you use. The limit defines how high the service can scale before it starts throttling your requests.
You aren't able to see the existing value of the concurrent request limit parameter in the Azure portal, the command-line tools, or API requests. To verify the existing value, create an Azure support request.
Note
Speech containers don't require increases of the concurrent request limit, because containers are constrained only by the CPUs of the hardware they are hosted on.
To create an increase request, you need to provide information about your resource.
How to get information for the prebuilt voice:
How to get information for the custom voice:
Initiate the increase of the limit for concurrent requests for your resource, or if necessary check the current limit, by submitting a support request. Here's how:
To increase the limit of new connections per minute for text to speech avatar, contact your sales representative to create a ticket with the following information: