Speech to text REST API

Speech to text REST API is used for Batch transcription and Custom Speech.

Important

Speech to text REST API v3.2 is available in preview. Speech to text REST API v3.1 is generally available. Speech to text REST API v3.0 will be retired on April 1st, 2026. For more information, see the Speech to text REST API v3.0 to v3.1 and v3.1 to v3.2 migration guides.

Use the Speech to text REST API for:

  • Custom Speech: With Custom Speech, you can upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. Copy models to other subscriptions if you want colleagues to have access to a model that you built, or if you want to deploy a model to more than one region.
  • Batch transcription: Transcribe audio files as a batch from multiple URLs or an Azure container.

With the Speech to text REST API, you can also:

  • Get logs for each endpoint if logs have been requested for that endpoint.
  • Request the manifest of the models that you create, to set up on-premises containers.
  • Upload data from Azure storage accounts by using a shared access signature (SAS) URI.
  • Bring your own storage: use your own storage accounts for logs, transcription files, and other data.
  • Register webhooks to receive notifications from the operations that support them.
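
Each operation listed in this article is a path relative to a versioned base URL. The following is a minimal sketch of composing that URL. It assumes the regional host pattern `https://{region}.api.cognitive.microsoft.com/speechtotext/{version}`, which is not stated in this article, and `operation_url` is a hypothetical helper name. Check the endpoint shown for your own Speech resource before relying on it.

```python
def operation_url(region: str, path: str, version: str = "v3.1") -> str:
    """Return the full URL for an operation path such as "/transcriptions"."""
    # Assumption: regional host pattern; a resource may also use a custom domain.
    base = f"https://{region}.api.cognitive.microsoft.com/speechtotext/{version}"
    return f"{base}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(operation_url("eastus", "/transcriptions"))
    print(operation_url("westeurope", "models/base", version="v3.0"))
```

Keeping the version as a parameter makes it easier to move a client from v3.0 to v3.1 without touching every call site.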

Datasets

Datasets are applicable for Custom Speech. You can use datasets to train and test the performance of different models. For example, you can compare the performance of a model trained with a specific dataset to the performance of a model trained with a different dataset.

See Upload training and testing datasets for examples of how to upload datasets. This table includes all the operations that you can perform on datasets.

Path                           Method  Version 3.1                    Version 3.0
/datasets                      GET     Datasets_List                  GetDatasets
/datasets                      POST    Datasets_Create                CreateDataset
/datasets/{id}                 DELETE  Datasets_Delete                DeleteDataset
/datasets/{id}                 GET     Datasets_Get                   GetDataset
/datasets/{id}                 PATCH   Datasets_Update                UpdateDataset
/datasets/{id}/blocks:commit   POST    Datasets_CommitBlocks          Not applicable
/datasets/{id}/blocks          GET     Datasets_GetBlocks             Not applicable
/datasets/{id}/blocks          PUT     Datasets_UploadBlock           Not applicable
/datasets/{id}/files           GET     Datasets_ListFiles             GetDatasetFiles
/datasets/{id}/files/{fileId}  GET     Datasets_GetFile               GetDatasetFile
/datasets/locales              GET     Datasets_ListSupportedLocales  GetSupportedLocalesForDatasets
/datasets/upload               POST    Datasets_Upload                UploadDatasetFromForm
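
As an illustration of the POST /datasets operation, the sketch below builds a request without sending it. The host pattern, the `Ocp-Apim-Subscription-Key` header, and the body fields (`kind`, `displayName`, `locale`, `contentUrl`, `project`) are assumptions that are not given in this article, and `build_create_dataset_request` is a hypothetical helper. Confirm field names and allowed values against the API reference.

```python
import json
from urllib.request import Request


def build_create_dataset_request(region, key, project_url, content_url):
    """Build (but do not send) a POST /datasets request."""
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/datasets"
    # Assumed body shape; verify names and values in the API reference.
    body = {
        "kind": "Acoustic",
        "displayName": "My acoustic dataset",
        "locale": "en-US",
        "contentUrl": content_url,          # for example, a SAS URI to the data
        "project": {"self": project_url},   # link to the owning project
    }
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers=headers, method="POST")


if __name__ == "__main__":
    req = build_create_dataset_request(
        "eastus",
        "YOUR_SPEECH_KEY",
        "https://eastus.api.cognitive.microsoft.com/speechtotext/v3.1/projects/PROJECT_ID",
        "https://example.com/data.zip",
    )
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```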

Endpoints

Endpoints are applicable for Custom Speech. You must deploy a custom endpoint to use a Custom Speech model.

See Deploy a model for examples of how to manage deployment endpoints. This table includes all the operations that you can perform on endpoints.

Path                                         Method  Version 3.1                     Version 3.0
/endpoints                                   GET     Endpoints_List                  GetEndpoints
/endpoints                                   POST    Endpoints_Create                CreateEndpoint
/endpoints/{id}                              DELETE  Endpoints_Delete                DeleteEndpoint
/endpoints/{id}                              GET     Endpoints_Get                   GetEndpoint
/endpoints/{id}                              PATCH   Endpoints_Update                UpdateEndpoint
/endpoints/{id}/files/logs                   DELETE  Endpoints_DeleteLogs            DeleteEndpointLogs
/endpoints/{id}/files/logs                   GET     Endpoints_ListLogs              GetEndpointLogs
/endpoints/{id}/files/logs/{logId}           DELETE  Endpoints_DeleteLog             DeleteEndpointLog
/endpoints/{id}/files/logs/{logId}           GET     Endpoints_GetLog                GetEndpointLog
/endpoints/base/{locale}/files/logs          DELETE  Endpoints_DeleteBaseModelLogs   DeleteBaseModelLogs
/endpoints/base/{locale}/files/logs          GET     Endpoints_ListBaseModelLogs     GetBaseModelLogs
/endpoints/base/{locale}/files/logs/{logId}  DELETE  Endpoints_DeleteBaseModelLog    DeleteBaseModelLog
/endpoints/base/{locale}/files/logs/{logId}  GET     Endpoints_GetBaseModelLog       GetBaseModelLog
/endpoints/locales                           GET     Endpoints_ListSupportedLocales  GetSupportedLocalesForEndpoints
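
The log operations in the table follow two path shapes: one keyed by a custom endpoint ID and one keyed by a base-model locale. The sketch below derives either path from the table. `logs_path` is a hypothetical helper, not part of the API.

```python
from typing import Optional


def logs_path(endpoint_id: Optional[str] = None,
              locale: Optional[str] = None,
              log_id: Optional[str] = None) -> str:
    """Return the logs path for a custom endpoint or for a base-model locale."""
    # Exactly one owner must be given: a custom endpoint ID or a locale.
    if (endpoint_id is None) == (locale is None):
        raise ValueError("pass exactly one of endpoint_id or locale")
    owner = endpoint_id if endpoint_id is not None else f"base/{locale}"
    path = f"/endpoints/{owner}/files/logs"
    # With a log ID the path addresses one log; without it, the whole collection.
    return f"{path}/{log_id}" if log_id is not None else path


if __name__ == "__main__":
    print(logs_path(endpoint_id="ENDPOINT_ID"))
    print(logs_path(locale="en-US", log_id="LOG_ID"))
```

The same path serves both GET and DELETE, so the HTTP method alone decides whether logs are listed or removed.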

Evaluations

Evaluations are applicable for Custom Speech. You can use evaluations to compare the performance of different models. For example, you can compare the performance of a model trained with a specific dataset to the performance of a model trained with a different dataset.

See Test recognition quality and Test accuracy for examples of how to test and evaluate Custom Speech models. This table includes all the operations that you can perform on evaluations.

Path                              Method  Version 3.1                       Version 3.0
/evaluations                      GET     Evaluations_List                  GetEvaluations
/evaluations                      POST    Evaluations_Create                CreateEvaluation
/evaluations/{id}                 DELETE  Evaluations_Delete                DeleteEvaluation
/evaluations/{id}                 GET     Evaluations_Get                   GetEvaluation
/evaluations/{id}                 PATCH   Evaluations_Update                UpdateEvaluation
/evaluations/{id}/files           GET     Evaluations_ListFiles             GetEvaluationFiles
/evaluations/{id}/files/{fileId}  GET     Evaluations_GetFile               GetEvaluationFile
/evaluations/locales              GET     Evaluations_ListSupportedLocales  GetSupportedLocalesForEvaluations

Health status

Health status provides insights about the overall health of the service and sub-components.

Path           Method  Version 3.1       Version 3.0
/healthstatus  GET     HealthStatus_Get  GetHealthStatus

Models

Models are applicable for Custom Speech and Batch Transcription. You can use models to transcribe audio files. For example, you can use a model trained with a specific dataset to transcribe audio files.

See Train a model and Custom Speech model lifecycle for examples of how to train and manage Custom Speech models. This table includes all the operations that you can perform on models.

Path                         Method  Version 3.1                    Version 3.0
/models                      GET     Models_ListCustomModels        GetModels
/models                      POST    Models_Create                  CreateModel
/models/{id}:copyto¹         POST    Models_CopyTo                  CopyModelToSubscription
/models/{id}                 DELETE  Models_Delete                  DeleteModel
/models/{id}                 GET     Models_GetCustomModel          GetModel
/models/{id}                 PATCH   Models_Update                  UpdateModel
/models/{id}/files           GET     Models_ListFiles               Not applicable
/models/{id}/files/{fileId}  GET     Models_GetFile                 Not applicable
/models/{id}/manifest        GET     Models_GetCustomModelManifest  GetModelManifest
/models/base                 GET     Models_ListBaseModels          GetBaseModels
/models/base/{id}            GET     Models_GetBaseModel            GetBaseModel
/models/base/{id}/manifest   GET     Models_GetBaseModelManifest    GetBaseModelManifest
/models/locales              GET     Models_ListSupportedLocales    GetSupportedLocalesForModels

¹ The /models/{id}/copyto operation (includes '/') in version 3.0 is replaced by the /models/{id}:copyto operation (includes ':') in version 3.1.
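
List operations such as GET /models and GET /models/base can return many items. The sketch below walks a paged list response. It assumes that each page is a JSON object with a `values` array and an optional `@nextLink` URL; those field names are not given in this article and should be checked against an actual response. `iter_items` and `fetch_page` are hypothetical names, and `fetch_page` stands in for whatever HTTP client you use.

```python
from typing import Callable, Dict, Iterator


def iter_items(fetch_page: Callable[[str], Dict], first_url: str) -> Iterator[Dict]:
    """Yield every item from a paged list response, following next links."""
    url = first_url
    while url:
        page = fetch_page(url)
        # Assumed response shape: {"values": [...], "@nextLink": "..."}
        yield from page.get("values", [])
        url = page.get("@nextLink")  # None on the last page ends the loop


if __name__ == "__main__":
    # Stand-in for real HTTP calls so the sketch runs offline.
    fake_pages = {
        "page1": {"values": [{"displayName": "model-a"}], "@nextLink": "page2"},
        "page2": {"values": [{"displayName": "model-b"}]},
    }
    names = [m["displayName"] for m in iter_items(fake_pages.__getitem__, "page1")]
    print(names)  # ['model-a', 'model-b']
```

Passing the fetch function as a parameter keeps the paging logic independent of authentication and transport details.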

Projects

Projects are applicable for Custom Speech. Custom Speech projects contain models, training and testing datasets, and deployment endpoints. Each project is specific to a locale. For example, you might create a project for English in the United States.

See Create a project for examples of how to create projects. This table includes all the operations that you can perform on projects.

Path                           Method  Version 3.1                    Version 3.0
/projects                      GET     Projects_List                  GetProjects
/projects                      POST    Projects_Create                CreateProject
/projects/{id}                 DELETE  Projects_Delete                DeleteProject
/projects/{id}                 GET     Projects_Get                   GetProject
/projects/{id}                 PATCH   Projects_Update                UpdateProject
/projects/{id}/datasets        GET     Projects_ListDatasets          GetDatasetsForProject
/projects/{id}/endpoints       GET     Projects_ListEndpoints         GetEndpointsForProject
/projects/{id}/evaluations     GET     Projects_ListEvaluations       GetEvaluationsForProject
/projects/{id}/models          GET     Projects_ListModels            GetModelsForProject
/projects/{id}/transcriptions  GET     Projects_ListTranscriptions    GetTranscriptionsForProject
/projects/locales              GET     Projects_ListSupportedLocales  GetSupportedProjectLocales

Transcriptions

Transcriptions are applicable for Batch Transcription. Batch transcription is used to transcribe a large amount of audio in storage. You should send multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe.

See Create a transcription for examples of how to create a transcription from multiple audio files. This table includes all the operations that you can perform on transcriptions.

Path                                 Method  Version 3.1                          Version 3.0
/transcriptions                      GET     Transcriptions_List                  GetTranscriptions
/transcriptions                      POST    Transcriptions_Create                CreateTranscription
/transcriptions/{id}                 DELETE  Transcriptions_Delete                DeleteTranscription
/transcriptions/{id}                 GET     Transcriptions_Get                   GetTranscription
/transcriptions/{id}                 PATCH   Transcriptions_Update                UpdateTranscription
/transcriptions/{id}/files           GET     Transcriptions_ListFiles             GetTranscriptionFiles
/transcriptions/{id}/files/{fileId}  GET     Transcriptions_GetFile               GetTranscriptionFile
/transcriptions/locales              GET     Transcriptions_ListSupportedLocales  GetSupportedLocalesForTranscriptions
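
A batch transcription request names its audio either as a list of file URLs or as a single storage container. The sketch below builds a POST /transcriptions request for either case without sending it. The host pattern, the `Ocp-Apim-Subscription-Key` header, and the body fields (`displayName`, `locale`, `contentUrls`, `contentContainerUrl`) are assumptions not given in this article, and `build_create_transcription_request` is a hypothetical helper. Verify them in the API reference.

```python
import json
from typing import List, Optional
from urllib.request import Request


def build_create_transcription_request(
    region: str,
    key: str,
    locale: str,
    display_name: str,
    content_urls: Optional[List[str]] = None,
    container_url: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST /transcriptions request."""
    # The audio source is either a list of file URLs or one container URL.
    if bool(content_urls) == bool(container_url):
        raise ValueError("pass either content_urls or container_url, not both")
    body = {"displayName": display_name, "locale": locale}
    if content_urls:
        body["contentUrls"] = list(content_urls)       # assumed field name
    else:
        body["contentContainerUrl"] = container_url    # assumed field name
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers=headers, method="POST")


if __name__ == "__main__":
    req = build_create_transcription_request(
        "eastus", "YOUR_SPEECH_KEY", "en-US", "Weekly calls",
        content_urls=["https://example.com/a.wav", "https://example.com/b.wav"],
    )
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```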

Webhooks

Webhooks are applicable for Custom Speech and Batch Transcription. In particular, webhooks apply to datasets, endpoints, evaluations, models, and transcriptions. Use webhooks to receive notifications about creation, processing, completion, and deletion events.

This table includes all the webhook operations that are available with the Speech to text REST API.

Path                  Method  Version 3.1      Version 3.0
/webhooks             GET     WebHooks_List    GetHooks
/webhooks             POST    WebHooks_Create  CreateHook
/webhooks/{id}:ping¹  POST    WebHooks_Ping    PingHook
/webhooks/{id}:test²  POST    WebHooks_Test    TestHook
/webhooks/{id}        DELETE  WebHooks_Delete  DeleteHook
/webhooks/{id}        GET     WebHooks_Get     GetHook
/webhooks/{id}        PATCH   WebHooks_Update  UpdateHook

¹ The /webhooks/{id}/ping operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:ping operation (includes ':') in version 3.1.

² The /webhooks/{id}/test operation (includes '/') in version 3.0 is replaced by the /webhooks/{id}:test operation (includes ':') in version 3.1.
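
The separator change described in the notes above can be captured in one place, so that client code targeting either version builds the right path. This is a minimal sketch, and `action_path` is a hypothetical helper name.

```python
def action_path(resource: str, resource_id: str, action: str,
                version: str = "v3.1") -> str:
    """Return the path for an action such as ping or test on one resource."""
    # Version 3.0 appends the action with '/'; version 3.1 uses ':'.
    separator = "/" if version == "v3.0" else ":"
    return f"/{resource}/{resource_id}{separator}{action}"


if __name__ == "__main__":
    print(action_path("webhooks", "HOOK_ID", "ping"))                  # /webhooks/HOOK_ID:ping
    print(action_path("webhooks", "HOOK_ID", "test", version="v3.0"))  # /webhooks/HOOK_ID/test
```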
