Configure audio format and voices

When synthesizing speech, you can use a SpeechConfig object to customize the audio that is returned by the Azure AI Speech service.

Audio format

The Azure AI Speech service supports multiple output formats for the audio stream that is generated by speech synthesis. Depending on your specific needs, you can choose a format based on the required:

  • Audio file type
  • Sample rate
  • Bit depth

The supported formats are indicated in the SDK using the SpeechSynthesisOutputFormat enumeration. For example, SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm.

To specify the required output format, use the SetSpeechSynthesisOutputFormat method of the SpeechConfig object:

speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);

For a full list of supported formats and their enumeration values, see the Azure AI Speech SDK documentation.
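Putting this together, a minimal sketch of configuring the output format might look like the following. The "YOUR_KEY" and "YOUR_REGION" values are placeholders for the key and region of your own Azure AI Speech resource:

```csharp
using Microsoft.CognitiveServices.Speech;

// Create the configuration from your resource's key and region (placeholders here).
SpeechConfig speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

// Request 24 kHz, 16-bit, mono PCM audio in a RIFF (WAV) container.
speechConfig.SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat.Riff24Khz16BitMonoPcm);
```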

Voices

The Azure AI Speech service provides multiple voices that you can use to personalize your speech-enabled applications. There are two kinds of voice that you can use:

  • Standard voices - synthetic voices created from audio samples.
  • Neural voices - more natural sounding voices created using deep neural networks.

Voices are identified by names that indicate a locale and a person's name - for example, en-GB-George.

To specify a voice for speech synthesis in the SpeechConfig, set its SpeechSynthesisVoiceName property to the voice you want to use:

speechConfig.SpeechSynthesisVoiceName = "en-GB-George";

For information about voices, see the Azure AI Speech SDK documentation.
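As a sketch of how the voice setting fits into a synthesis call, the following example selects a voice and then uses a SpeechSynthesizer to speak some text. The "YOUR_KEY" and "YOUR_REGION" values are placeholders for your own Azure AI Speech resource details:

```csharp
using Microsoft.CognitiveServices.Speech;

SpeechConfig speechConfig = SpeechConfig.FromSubscription("YOUR_KEY", "YOUR_REGION");

// Select the voice to use for synthesis.
speechConfig.SpeechSynthesisVoiceName = "en-GB-George";

// With no AudioConfig specified, audio is played through the default speaker.
using var synthesizer = new SpeechSynthesizer(speechConfig);
SpeechSynthesisResult result = await synthesizer.SpeakTextAsync("Hello, world!");

if (result.Reason == ResultReason.SynthesizingAudioCompleted)
{
    Console.WriteLine("Speech synthesized successfully.");
}
```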