Quickstart: Get started with the Azure Speech CLI

In this article, you'll learn how to use the Azure Speech CLI (also called SPX) to access Speech services such as speech-to-text, text-to-speech, and speech translation, without having to write any code. The Speech CLI is production ready, and you can use it to automate simple workflows in the Speech service by using .bat or shell scripts.

This article assumes that you have working knowledge of the Command Prompt window, terminal, or PowerShell.

Note

In PowerShell, the stop-parsing token (--%) should follow spx. For example, run spx --% config @region to view the current region config value.

Download and install

Follow these steps to install the Speech CLI on Windows:

  1. Install the Microsoft Visual C++ Redistributable for Visual Studio 2019 for your platform. Installing it for the first time might require a restart.

  2. Install .NET 6.

  3. Install the Speech CLI via the .NET CLI by entering this command:

    dotnet tool install --global Microsoft.CognitiveServices.Speech.CLI
    

    To update the Speech CLI, enter this command:

    dotnet tool update --global Microsoft.CognitiveServices.Speech.CLI
    

Enter spx or spx help to see help for the Speech CLI.
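If you work in a POSIX shell (for example, Bash on Linux, macOS, or WSL), a quick check that both tools are on your PATH can save time before you continue. The following is a minimal sketch; `check_tool` is a hypothetical helper name and isn't part of the Speech CLI.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing" >&2
    return 1
  fi
}
```

For example, running `check_tool dotnet` and then `check_tool spx` reports whether each tool can be found.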

Font limitations

On Windows, the Speech CLI can show only fonts that are available to the command prompt on the local computer. Windows Terminal supports all fonts that the Speech CLI produces interactively.

If you output to a file, a text editor like Notepad or a web browser like Microsoft Edge can also show all fonts.

Create a resource configuration

To get started, you need a Speech resource key and region identifier (for example, eastus, westus). Create a Speech resource on the Azure portal. For more information, see Create a new Azure Cognitive Services resource.

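To configure your resource key and region identifier, run the following commands. Replace SPEECH-KEY with your Speech resource key, and replace SPEECH-REGION with your Speech resource region: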

spx config @key --set SPEECH-KEY
spx config @region --set SPEECH-REGION

The key and region are stored for future Speech CLI commands. To view the current configuration, run the following commands:

spx config @key
spx config @region

As needed, include the clear option to remove either stored value:

spx config @key --clear
spx config @region --clear
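In a script, you might prefer to read the key and region from environment variables instead of typing them into the command. The following is a minimal sketch for a POSIX shell. The variable names `SPEECH_KEY` and `SPEECH_REGION` and the function name `configure_spx` are assumptions made for this example; the Speech CLI doesn't define them.

```shell
#!/bin/sh
# Hypothetical helper: store the key and region from environment variables.
# SPEECH_KEY and SPEECH_REGION are names chosen for this sketch.
configure_spx() {
  if [ -z "${SPEECH_KEY:-}" ] || [ -z "${SPEECH_REGION:-}" ]; then
    echo "Set SPEECH_KEY and SPEECH_REGION before running this function." >&2
    return 1
  fi
  # Same config commands as shown above, with the values taken from variables.
  spx config @key --set "$SPEECH_KEY" &&
    spx config @region --set "$SPEECH_REGION"
}
```

After you export both variables, calling `configure_spx` runs the two config commands in order and stops if the first one fails.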

Basic usage

This section shows a few basic SPX commands that are often useful for first-time testing and experimentation. Start by viewing the help that's built into the tool by running the following command:

spx

You can search help topics by keyword. For example, to see a list of Speech CLI usage examples, run the following command:

spx help find --topics "examples"

To see options for the recognize command, run the following command:

spx help recognize

Additional help commands are listed in the console output. You can enter these commands to get detailed help about subcommands.

Speech-to-text (speech recognition)

To convert speech to text (speech recognition) by using your system's default microphone, run the following command:

spx recognize --microphone

After you run the command, SPX begins listening for audio on the current active input device. It stops listening when you select Enter. The spoken audio is then recognized and converted to text in the console output.

With the Speech CLI, you can also recognize speech from an audio file. Run the following command:

spx recognize --file /path/to/file.wav

Note

If you're using a Docker container, --microphone will not work.

If you're recognizing speech from an audio file in a Docker container, make sure that the audio file is located in a directory that's mounted into the container.

Tip

If you get stuck or want to learn more about the Speech CLI recognition options, you can run spx help recognize.
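Because file-based recognition needs no interaction, it lends itself to simple shell automation. The following sketch runs the recognize command shown earlier on every .wav file in a directory. The function name `recognize_dir` is hypothetical, and the loop assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper: run speech recognition on each .wav file in a directory.
recognize_dir() {
  dir=$1
  for f in "$dir"/*.wav; do
    # Skip the literal pattern when the directory has no .wav files.
    [ -e "$f" ] || continue
    echo "== $f"
    spx recognize --file "$f"
  done
}
```

For example, `recognize_dir /path/to/recordings` prints a header line for each file, followed by the Speech CLI output for that file.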

Text-to-speech (speech synthesis)

The following command takes text as input and then outputs the synthesized speech to the current active output device (for example, your computer speakers).

spx synthesize --text "Testing synthesis using the Speech CLI" --speakers

You can also save the synthesized output to a file. In this example, let's create a file named my-sample.wav in the directory where you're running the command.

spx synthesize --text "Enjoy using the Speech CLI." --audio output my-sample.wav

These examples presume that you're testing in English. However, the Speech service supports speech synthesis in many languages. You can get a full list of voices either by running the following command or by visiting the language support page.

spx synthesize --voices

Here's a command for using one of the voices you've discovered.

spx synthesize --text "Bienvenue chez moi." --voice fr-FR-AlainNeural --speakers
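The voice and file-output options shown in this section can be wrapped in one small function. This is a sketch only: it assumes that `--voice` and `--audio output` can be combined in a single command, and the function name `synthesize_to_file` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: synthesize text with a chosen voice and save it to a file.
# Arguments: voice name, text to speak, output .wav path.
synthesize_to_file() {
  voice=$1
  text=$2
  out=$3
  spx synthesize --text "$text" --voice "$voice" --audio output "$out"
}
```

For example, `synthesize_to_file fr-FR-AlainNeural "Bienvenue chez moi." bienvenue.wav` would write the French sample to bienvenue.wav instead of playing it through the speakers.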

Tip

If you get stuck or want to learn more about the Speech CLI synthesis options, you can run spx help synthesize.

Speech-to-text translation

With the Speech CLI, you can also do speech-to-text translation. Run the following command to capture audio from your default microphone and output the translation as text. Keep in mind that you need to supply both the source language and the target language with the translate command.

spx translate --microphone --source en-US --target ru-RU

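When you're translating into multiple languages, separate the language codes with a semicolon (;). Enclose the list in quotation marks, because shells such as Bash and PowerShell otherwise treat the semicolon as a command separator.

spx translate --microphone --source en-US --target "ru-RU;fr-FR;es-ES"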
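When the target languages come from a script rather than from a hand-typed list, the semicolon-separated value can be built programmatically. The following sketch for a POSIX shell joins its arguments with semicolons; `join_targets` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with ';' for the --target option.
join_targets() {
  old_ifs=$IFS
  IFS=';'
  joined="$*"   # "$*" joins the arguments with the first character of IFS
  IFS=$old_ifs
  printf '%s\n' "$joined"
}

# Example use (the quotes keep the shell from splitting at the semicolons):
#   spx translate --microphone --source en-US --target "$(join_targets ru-RU fr-FR es-ES)"
```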

If you want to save the output of your translation, use the --output file option followed by the path of the file to write. In this example, you'll also read from a file.

spx translate --file /some/file/path/input.wav --source en-US --target ru-RU --output file /some/file/path/russian_translation.txt
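For repeated use, the output path can be derived from the input path so that each translation lands next to its source audio. The following is a sketch under stated assumptions: the function name `translate_file` is hypothetical, the source language is fixed to en-US as in the example above, and the input file name is expected to end in .wav.

```shell
#!/bin/sh
# Hypothetical helper: translate one .wav file and save the text beside it.
# Arguments: input .wav path, target language code.
translate_file() {
  in=$1
  target=$2
  # Strip the .wav suffix and append the target code, e.g. input.ru-RU.txt
  out="${in%.wav}.$target.txt"
  spx translate --file "$in" --source en-US --target "$target" --output file "$out"
}
```

For example, `translate_file /some/file/path/input.wav ru-RU` would write the translation to /some/file/path/input.ru-RU.txt.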

Note

For a list of all supported languages and their corresponding locale codes, see Language and voice support for the Speech service.

Tip

If you get stuck or want to learn more about the Speech CLI translation options, you can run spx help translate.

Next steps