Text-to-speech

06/22/2023

The XSpeechSynthesizer API lets you initialize and configure a speech synthesis engine (or voice) that converts a text string to an audio stream, a process also known as text-to-speech (TTS). Voice characteristics, pronunciation, volume, pitch, rate (speed), and emphasis can be customized through Speech Synthesis Markup Language (SSML) Version 1.0.

Note

This API requires callers to use version 1.0 of SSML.
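
As a rough illustration of what SSML 1.0 input can look like, the sketch below wraps plain text in a `speak` element with a `prosody` child that sets rate, pitch, and volume. `EscapeXml` and `BuildSsml` are hypothetical helper names and are not part of the XSpeechSynthesizer API. The element and attribute names follow the W3C SSML 1.0 Recommendation. Whether a particular voice honors every attribute is not something this article states.

```cpp
#include <string>

// Escape the five XML special characters so arbitrary text is safe inside SSML.
std::string EscapeXml(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text)
    {
        switch (c)
        {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical helper: wrap text in an SSML 1.0 document with one prosody element.
// rate, pitch, and volume take SSML 1.0 labels such as "slow", "high", or "loud".
std::string BuildSsml(const std::string& text,
                      const std::string& rate,
                      const std::string& pitch,
                      const std::string& volume,
                      const std::string& lang = "en-US")
{
    std::string ssml;
    ssml += "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"";
    ssml += EscapeXml(lang);
    ssml += "\">";
    ssml += "<prosody rate=\"" + EscapeXml(rate) + "\"";
    ssml += " pitch=\"" + EscapeXml(pitch) + "\"";
    ssml += " volume=\"" + EscapeXml(volume) + "\">";
    ssml += EscapeXml(text);
    ssml += "</prosody></speak>";
    return ssml;
}
```

For example, `BuildSsml("Hi & bye", "slow", "high", "loud")` yields a document whose text content is `Hi &amp; bye`. Escaping matters because unescaped `&` or `<` characters in game text would make the SSML ill-formed.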

The following steps show how to use the API.

1. Create the speech synthesizer by calling XSpeechSynthesizerCreate. Make sure to keep the handle for the later calls.
2. Optionally, choose a voice by calling either XSpeechSynthesizerSetCustomVoice or XSpeechSynthesizerSetDefaultVoice.
3. For each piece of text that you want to convert to speech, use the following steps.
   1. Create a new stream by calling XSpeechSynthesizerCreateStreamFromText. By the time this function returns, the text has been fully converted to .wav audio data. This conversion can take some time and shouldn't be done on any time-critical threads.
   2. Determine the size of the buffer you need for the audio data by calling XSpeechSynthesizerGetStreamDataSize.
   3. Get the audio data (.wav file) from that stream by calling XSpeechSynthesizerGetStreamData.
   4. Pass the audio data to an audio renderer.
   5. Close the stream handle by calling XSpeechSynthesizerCloseStreamHandle.
4. When you're completely done with speech synthesis, close the synthesizer handle by calling XSpeechSynthesizerCloseHandle.
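
Before the audio data can be passed to a renderer, the renderer usually needs to know the sample format. The sketch below shows one way to read that from the buffer, assuming the data is a conventional RIFF/WAVE file with a `fmt ` chunk followed by a `data` chunk. The exact layout the API produces is not specified in this article, so treat that as an assumption. `WavInfo` and `ParseWav` are hypothetical names. In real code, `data` would be the buffer filled by XSpeechSynthesizerGetStreamData.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Format fields a renderer typically needs, plus the location of the samples.
struct WavInfo
{
    uint16_t formatTag;      // 1 means integer PCM
    uint16_t channels;
    uint32_t sampleRate;     // samples per second
    uint16_t bitsPerSample;
    size_t   dataOffset;     // byte offset of the first sample in the buffer
    size_t   dataSize;       // number of sample bytes
};

// WAV fields are little-endian; read them byte by byte to stay portable.
static uint16_t ReadU16(const uint8_t* p)
{
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t ReadU32(const uint8_t* p)
{
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: walk the RIFF chunks and fill `out`.
// Returns false if the buffer is not a well-formed WAVE file.
bool ParseWav(const uint8_t* data, size_t size, WavInfo* out)
{
    if (size < 12 || std::memcmp(data, "RIFF", 4) != 0 ||
        std::memcmp(data + 8, "WAVE", 4) != 0)
    {
        return false;
    }

    WavInfo info = {};
    bool haveFmt = false;
    size_t pos = 12;
    while (pos + 8 <= size)
    {
        const uint32_t chunkSize = ReadU32(data + pos + 4);
        const size_t body = pos + 8;
        if (chunkSize > size - body)
        {
            return false; // chunk claims more bytes than the buffer holds
        }
        if (std::memcmp(data + pos, "fmt ", 4) == 0 && chunkSize >= 16)
        {
            info.formatTag     = ReadU16(data + body);
            info.channels      = ReadU16(data + body + 2);
            info.sampleRate    = ReadU32(data + body + 4);
            info.bitsPerSample = ReadU16(data + body + 14);
            haveFmt = true;
        }
        else if (std::memcmp(data + pos, "data", 4) == 0)
        {
            if (!haveFmt)
            {
                return false; // samples without a preceding format chunk
            }
            info.dataOffset = body;
            info.dataSize = chunkSize;
            *out = info;
            return true;
        }
        pos = body + chunkSize + (chunkSize & 1); // chunks are padded to even sizes
    }
    return false;
}
```

Walking the chunk list instead of assuming a fixed 44-byte header keeps the sketch working when extra chunks appear before `data`.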
