Run batch operations with the Speech CLI
Batch operations are common tasks when using the Azure Speech service. In this article, you'll learn how to run batch speech to text (speech recognition) and batch text to speech (speech synthesis) with the Speech CLI. Specifically, you'll learn how to:
- Run batch speech recognition on a directory of audio files
- Run batch speech synthesis by iterating over a .tsv file
Batch speech to text (speech recognition)
The Speech service is often used to recognize speech from audio files. In this example, you'll learn how to iterate over a directory using the Speech CLI to capture the recognition output for each .wav file. The --files flag points at the directory where the audio files are stored, and the wildcard *.wav tells the Speech CLI to run recognition on every file with the extension .wav. The output for each recognized file is written as a tab-separated value in speech_output.tsv.
The --threads argument used in the following command can also be used in the next section for spx synthesize commands. The number of available threads depends on the CPU and its current load percentage.
spx recognize --files C:\your_wav_file_dir\*.wav --output file C:\output_dir\speech_output.tsv --threads 10
The following is an example of the output file structure.
audio.input.id	recognizer.session.started.sessionid	recognizer.recognized.result.text
sample_1	07baa2f8d9fd4fbcb9faea451ce05475	A sample wave file.
sample_2	8f9b378f6d0b42f99522f1173492f013	Sample text synthesized.
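Because the output is plain tab-separated text, it can be post-processed with any TSV reader. The following is a minimal Python sketch, assuming the three column names shown in the example output above. The `load_transcripts` helper is a hypothetical name, and the sample content is inlined only to keep the example self-contained.

```python
import csv
import io

# Sample content mirroring the example output above (fields separated by tabs).
SAMPLE_OUTPUT = (
    "audio.input.id\trecognizer.session.started.sessionid\t"
    "recognizer.recognized.result.text\n"
    "sample_1\t07baa2f8d9fd4fbcb9faea451ce05475\tA sample wave file.\n"
    "sample_2\t8f9b378f6d0b42f99522f1173492f013\tSample text synthesized.\n"
)


def load_transcripts(tsv_file):
    """Map each audio.input.id to its recognized text (hypothetical helper)."""
    # QUOTE_NONE treats quotation marks in the recognized text as ordinary characters.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {
        row["audio.input.id"]: row["recognizer.recognized.result.text"]
        for row in reader
    }


transcripts = load_transcripts(io.StringIO(SAMPLE_OUTPUT))
for audio_id, text in transcripts.items():
    print(f"{audio_id}: {text}")
```

To read a real output file instead of the inlined sample, pass an open file object, for example `open(path, newline="")`, and set the encoding if your output file needs one.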
Batch text to speech (speech synthesis)
The easiest way to run batch text to speech is to create a new .tsv (tab-separated-value) file and use the --foreach command in the Speech CLI. You can create a .tsv file using your favorite text editor. For this example, let's call it text_synthesis.tsv.
When copying the contents of this text file, make sure that your file has a tab, not spaces, between the file location and the text. Sometimes, when copying the contents from this example, tabs are converted to spaces, which causes the spx command to fail when run.
audio.output	text
C:\batch_wav_output\wav_1.wav	Sample text to synthesize.
C:\batch_wav_output\wav_2.wav	Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav	Some more text to test capabilities.
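One way to avoid the tab-versus-space problem is to generate the file with a script rather than copying it. The following is a minimal Python sketch under that assumption. `write_synthesis_tsv` and `lines_missing_tab` are hypothetical helper names, and the file is written to a temporary directory purely for illustration.

```python
import os
import tempfile

# Rows taken from the example above: (output .wav path, text to synthesize).
ROWS = [
    (r"C:\batch_wav_output\wav_1.wav", "Sample text to synthesize."),
    (r"C:\batch_wav_output\wav_2.wav", "Using the Speech CLI to run batch-synthesis."),
    (r"C:\batch_wav_output\wav_3.wav", "Some more text to test capabilities."),
]


def write_synthesis_tsv(path, rows):
    """Write the header and rows with a real tab between the two fields."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("audio.output\ttext\n")
        for wav_path, text in rows:
            # A tab or newline inside a field would break the two-column layout.
            if any(ch in field for field in (wav_path, text) for ch in "\t\n"):
                raise ValueError("fields must not contain tabs or newlines")
            f.write(f"{wav_path}\t{text}\n")


def lines_missing_tab(path):
    """Return 1-based numbers of non-empty lines without exactly two tab-separated fields."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            stripped = line.rstrip("\n")
            if stripped and len(stripped.split("\t")) != 2:
                bad.append(number)
    return bad


# Written to a temporary directory here; point this at your own location instead.
tsv_path = os.path.join(tempfile.mkdtemp(), "text_synthesis.tsv")
write_synthesis_tsv(tsv_path, ROWS)
print(lines_missing_tab(tsv_path))
```

The `lines_missing_tab` check can also be run against a hand-edited file to find lines where a tab was replaced by spaces.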
Next, you run a command to point to text_synthesis.tsv, perform synthesis on each text field, and write the result to the corresponding audio.output path as a .wav file.
spx synthesize --foreach in @C:\your\path\to\text_synthesis.tsv
This command is the equivalent of running spx synthesize --text "Sample text to synthesize" --audio output C:\batch_wav_output\wav_1.wav for each record in the .tsv file.
A couple of things to note:
- The column headers, audio.output and text, correspond to the command-line arguments --audio output and --text, respectively. Multi-part command-line arguments like --audio output should be formatted in the file with no spaces, no leading dashes, and periods separating strings, for example, audio.output. Any other existing command-line arguments can be added to the file as additional columns using this pattern.
- When the file is formatted in this way, no additional arguments are required to be passed to --foreach.
- Make sure to separate each value in the .tsv with a tab.
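The header naming pattern described in the first note can be sketched in a few lines of Python. This only illustrates the convention and is not the Speech CLI's own implementation. `header_to_args` and `row_to_command` are hypothetical helper names.

```python
def header_to_args(header):
    """Turn a column header such as 'audio.output' into ['--audio', 'output']."""
    first, *rest = header.split(".")
    # Add the leading dashes to the first part; the remaining parts follow as words.
    return ["--" + first, *rest]


def row_to_command(headers, values):
    """Build the argument list that one row of the file stands for."""
    command = ["spx", "synthesize"]
    for header, value in zip(headers, values):
        command.extend(header_to_args(header))
        command.append(value)
    return command


command = row_to_command(
    ["audio.output", "text"],
    [r"C:\batch_wav_output\wav_1.wav", "Sample text to synthesize."],
)
print(command)
```

Reading the mapping in reverse gives the column header for any other argument you want to add as a column: drop the leading dashes and join the parts with periods.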
However, if you have a .tsv file like the following example, with column headers that do not match command-line arguments:
wav_path	str_text
C:\batch_wav_output\wav_1.wav	Sample text to synthesize.
C:\batch_wav_output\wav_2.wav	Using the Speech CLI to run batch-synthesis.
C:\batch_wav_output\wav_3.wav	Some more text to test capabilities.
You can override these field names with the correct arguments by using the following syntax in the --foreach call. The result is the same as the earlier --foreach call.
spx synthesize --foreach audio.output;text in @C:\your\path\to\text_synthesis.tsv