Speech to text REST API

09/23/2024

Speech to text REST API is used for fast transcription, batch transcription, and custom speech.

Important

Speech to text REST API v3.2 is the latest generally available version. Preview versions 3.2-preview.1 and 3.2-preview.2 will be removed in September 2024. Speech to text REST API v3.1 will be retired on a date to be announced; for upgrade information, see the Speech to text REST API v3.1 to v3.2 migration guide. Speech to text REST API v3.0 will be retired on April 1, 2026; for upgrade information, see the Speech to text REST API v3.0 to v3.1 and v3.1 to v3.2 migration guides.

See the Speech to text REST API 2024-05-15 reference documentation

See the Speech to text REST API v3.2 reference documentation

See the Speech to text REST API v3.1 reference documentation

Use Speech to text REST API for:

Fast transcription: Transcribe audio files synchronously, with results returned much faster than real time. Use the fast transcription API (/speechtotext/transcriptions:transcribe) in scenarios where you need the transcript of an audio recording as quickly as possible with predictable latency, such as quick audio or video transcription or video translation.
Custom speech: Upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. Copy models to other subscriptions if you want colleagues to have access to a model that you built, or if you want to deploy a model to more than one region.
Batch transcription: Transcribe audio files as a batch from multiple URLs or an Azure container.
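
As a rough illustration of the fast transcription path mentioned above, the following sketch builds a request URL for it. Only the path /speechtotext/transcriptions:transcribe comes from this article. The host pattern `<region>.api.cognitive.microsoft.com`, the `api-version` query parameter, and the helper name `fast_transcription_url` are assumptions made for the example; check the reference documentation for the exact values your resource expects.

```python
from urllib.parse import urlencode, urlunsplit

# Path quoted in this article for the fast transcription operation.
FAST_TRANSCRIPTION_PATH = "/speechtotext/transcriptions:transcribe"


def fast_transcription_url(region: str, api_version: str) -> str:
    """Build a URL for the fast transcription operation (hypothetical helper).

    Assumptions (not stated in this article):
      * the host follows the '<region>.api.cognitive.microsoft.com' pattern
      * the API version is passed as an 'api-version' query parameter
    """
    host = f"{region}.api.cognitive.microsoft.com"
    query = urlencode({"api-version": api_version})
    return urlunsplit(("https", host, FAST_TRANSCRIPTION_PATH, query, ""))


if __name__ == "__main__":
    # "eastus" and "2024-05-15" are placeholder values for illustration only.
    print(fast_transcription_url("eastus", "2024-05-15"))
```

The version string is left as a required argument instead of a default so that the caller has to pick it deliberately from the reference documentation.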

Speech to text REST API includes such features as:

Get logs for each endpoint if logs are requested for that endpoint.
Request the manifest of the models that you create, to set up on-premises containers.
Upload data from Azure storage accounts by using a shared access signature (SAS) URI.
Bring your own storage. Use your own storage accounts for logs, transcription files, and other data.
Some operations support webhook notifications. You can register your webhooks where notifications are sent.

Batch transcription

The following operation groups are applicable for batch transcription.

Operation group	Description
Models	Use base models or custom models to transcribe audio files. You can use models with custom speech and batch transcription. For example, you can use a model trained with a specific dataset to transcribe audio files. See Train a model and custom speech model lifecycle for examples of how to train and manage custom speech models.
Transcriptions	Use transcriptions to transcribe a large amount of audio in storage. When you use batch transcription, you send multiple files per request or point to an Azure Blob Storage container that holds the audio files to transcribe. See Create a transcription for examples of how to create a transcription from multiple audio files.
Web hooks	Use web hooks to receive notifications about creation, processing, completion, and deletion events. You can use web hooks with custom speech and batch transcription. Web hooks apply to datasets, endpoints, evaluations, models, and transcriptions.
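
The Transcriptions row above describes two ways to supply audio: a list of file URLs or a single storage container. A minimal sketch of a request body for each case might look like the following. The field names (`displayName`, `locale`, `contentUrls`, `contentContainerUrl`) and the rule that exactly one audio source is given are assumptions for this example, and `build_transcription_request` is a hypothetical helper, not part of any SDK. Confirm the schema against the reference documentation before relying on it.

```python
import json
from typing import Optional, Sequence


def build_transcription_request(
    display_name: str,
    locale: str,
    content_urls: Optional[Sequence[str]] = None,
    content_container_url: Optional[str] = None,
) -> dict:
    """Build a JSON-serializable body for a batch transcription request.

    Assumption: the service accepts either a list of audio file URLs or one
    container URL, so this sketch requires exactly one of the two.
    """
    if bool(content_urls) == bool(content_container_url):
        raise ValueError(
            "Provide exactly one of content_urls or content_container_url"
        )
    body = {"displayName": display_name, "locale": locale}
    if content_urls:
        # Multiple files per request, each addressed by its own URL.
        body["contentUrls"] = list(content_urls)
    else:
        # One Azure Blob Storage container holding all audio files.
        body["contentContainerUrl"] = content_container_url
    return body


if __name__ == "__main__":
    # The URLs below are placeholders, not real audio locations.
    request = build_transcription_request(
        "demo batch",
        "en-US",
        content_urls=[
            "https://example.com/audio/one.wav",
            "https://example.com/audio/two.wav",
        ],
    )
    print(json.dumps(request, indent=2))
```

Building the body separately from the HTTP call keeps the sketch testable offline; sending it is a matter of posting the JSON with whichever HTTP client and authentication header your setup uses.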

Custom speech

The following operation groups are applicable for custom speech.

Operation group	Description
Datasets	Use datasets to train and test custom speech models. For example, you can compare the performance of a custom speech model trained with a specific dataset to the performance of a base model or a custom speech model trained with a different dataset. See Upload training and testing datasets for examples of how to upload datasets.
Endpoints	Deploy custom speech models to endpoints. You must deploy a custom endpoint to use a custom speech model. See Deploy a model for examples of how to manage deployment endpoints.
Evaluations	Use evaluations to compare the performance of different models. For example, you can compare the performance of a custom speech model trained with a specific dataset to the performance of a base model or a custom model trained with a different dataset. See Test recognition quality and Test accuracy for examples of how to test and evaluate custom speech models.
Models	Use base models or custom models to transcribe audio files. You can use models with custom speech and batch transcription. For example, you can use a model trained with a specific dataset to transcribe audio files. See Train a model and custom speech model lifecycle for examples of how to train and manage custom speech models.
Projects	Use projects to manage custom speech models, training and testing datasets, and deployment endpoints. Custom speech projects contain models, training and testing datasets, and deployment endpoints. Each project is specific to a locale. For example, you might create a project for English in the United States. See Create a project for examples of how to create projects.
Web hooks	Use web hooks to receive notifications about creation, processing, completion, and deletion events. You can use web hooks with custom speech and batch transcription. Web hooks apply to datasets, endpoints, evaluations, models, and transcriptions.
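
The Web hooks row above combines four event types (creation, processing, completion, deletion) with five resource types (datasets, endpoints, evaluations, models, transcriptions). The sketch below enumerates those combinations and builds a registration body from a chosen subset. The camelCase naming such as `transcriptionCompletion`, the body fields `displayName`, `webUrl`, and `events`, and both helper functions are assumptions made for illustration; the reference documentation is the authority on the actual event names and schema.

```python
from itertools import product
from typing import Dict, Iterable, List

# Resource and event types listed in the Web hooks row of this article.
RESOURCES = ("dataset", "endpoint", "evaluation", "model", "transcription")
EVENTS = ("Creation", "Processing", "Completion", "Deletion")


def webhook_event_names() -> List[str]:
    """Return every resource/event combination, e.g. 'modelDeletion'.

    Assumption: event names are '<resource><Event>' in camelCase.
    """
    return [f"{resource}{event}" for resource, event in product(RESOURCES, EVENTS)]


def build_webhook_request(
    display_name: str, web_url: str, events: Iterable[str]
) -> Dict[str, object]:
    """Build a registration body that subscribes to the given events."""
    selected = list(events)
    known = set(webhook_event_names())
    unknown = [name for name in selected if name not in known]
    if unknown:
        raise ValueError(f"Unknown event names: {unknown}")
    return {
        "displayName": display_name,
        "webUrl": web_url,  # where notifications would be sent
        "events": {name: True for name in selected},
    }


if __name__ == "__main__":
    # The callback URL is a placeholder.
    print(build_webhook_request(
        "notify on finished transcriptions",
        "https://example.com/hooks/speech",
        ["transcriptionCompletion"],
    ))
```

Validating names against the enumerated set catches typos before a request is sent, which is useful because a misspelled event would otherwise fail only at registration time.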

Service health

Service health provides insights about the overall health of the service and subcomponents. See Service Health for more information.
