What is batch transcription?

Batch transcription is used to transcribe a large amount of audio data in storage. Both the Speech-to-text REST API and Speech CLI support batch transcription.


To use batch transcription, you need a standard Speech resource (S0) in your subscription. Free resources (F0) aren't supported. For more information, see pricing and limits.

You should provide multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. The batch transcription service can handle a large number of submitted transcriptions. The service transcribes the files concurrently, which reduces the turnaround time.

How does it work?

With batch transcriptions, you submit the audio data, and then retrieve transcription results asynchronously. The service transcribes the audio data and stores the results in a storage container. You can then retrieve the results from the storage container.

To get started with batch transcription, refer to the following how-to guides:

  1. Locate audio files for batch transcription - You can upload your own data or use existing audio files via public URI or shared access signature (SAS) URI.
  2. Create a batch transcription - Submit the transcription job with parameters such as the audio files, the transcription language, and the transcription model.
  3. Get batch transcription results - Check transcription status and retrieve transcription results asynchronously.

Batch transcription jobs are scheduled on a best-effort basis. You can't estimate when a job will change into the running state, but it should happen within minutes under normal system load. When the job is in the running state, the transcription occurs faster than the audio runtime playback speed.

Next steps