SpeechSynthesizer Class

Definition

Provides access to the functionality of an installed speech synthesis engine (voice) for Text-to-speech (TTS) services.

[Windows.Foundation.Metadata.Activatable(65536, typeof(Windows.Foundation.UniversalApiContract))]
[Windows.Foundation.Metadata.ContractVersion(typeof(Windows.Foundation.UniversalApiContract), 65536)]
[Windows.Foundation.Metadata.MarshalingBehavior(Windows.Foundation.Metadata.MarshalingType.Agile)]
public sealed class SpeechSynthesizer : System.IDisposable
[Windows.Foundation.Metadata.ContractVersion(typeof(Windows.Foundation.UniversalApiContract), 65536)]
[Windows.Foundation.Metadata.MarshalingBehavior(Windows.Foundation.Metadata.MarshalingType.Agile)]
[Windows.Foundation.Metadata.Activatable(65536, "Windows.Foundation.UniversalApiContract")]
public sealed class SpeechSynthesizer : System.IDisposable
Inheritance
Object SpeechSynthesizer
Attributes
Implements

Windows requirements

Device family
Windows 10 (introduced in 10.0.10240.0)
API contract
Windows.Foundation.UniversalApiContract (introduced in v1.0)

Examples

The following example shows how to generate a speech audio stream from a basic text string.

// The media object for controlling and playing audio.
MediaElement mediaElement = this.media;

// The object for controlling the speech synthesis engine (voice).
var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

// Generate the audio stream from plain text.
SpeechSynthesisStream stream = await synth.SynthesizeTextToStreamAsync("Hello World");

// Send the stream to the media object.
mediaElement.SetSource(stream, stream.ContentType);
mediaElement.Play();
// The object for controlling the speech synthesis engine (voice).
synth = ref new SpeechSynthesizer();
// The media object for controlling and playing audio.
media = ref new MediaElement();
// The string to speak.
String^ text = "Hello World";

// Generate the audio stream from plain text.
task<SpeechSynthesisStream ^> speakTask = create_task(synth->SynthesizeTextToStreamAsync(text));
speakTask.then([this, text](SpeechSynthesisStream ^speechStream)
{
    // Send the stream to the media object.
    // media === MediaElement XAML object.
    media->SetSource(speechStream, speechStream->ContentType);
    media->AutoPlay = true;
    media->Play();
});

This example shows how to generate a speech audio stream from an SSML string, which includes some modulation elements that control the pitch, speaking rate, and volume of the speech output.

// The string to speak with SSML customizations.
string Ssml =
    @"<speak version='1.0' " +
    "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
    "Hello <prosody contour='(0%,+80Hz) (10%,+80%) (40%,+80Hz)'>World</prosody> " + 
    "<break time='500ms'/>" +
    "Goodbye <prosody rate='slow' contour='(0%,+20Hz) (10%,+30%) (40%,+10Hz)'>World</prosody>" +
    "</speak>";

// The media object for controlling and playing audio.
MediaElement mediaElement = this.media;

// The object for controlling the speech synthesis engine (voice).
var synth = new Windows.Media.SpeechSynthesis.SpeechSynthesizer();

// Generate the audio stream from plain text.
SpeechSynthesisStream stream = await synth.synthesizeSsmlToStreamAsync(Ssml);

// Send the stream to the media object.
mediaElement.SetSource(stream, stream.ContentType);
mediaElement.Play();
// The object for controlling the speech synthesis engine (voice).
synth = ref new SpeechSynthesizer();
// The media object for controlling and playing audio.
media = ref new MediaElement();
// The string to speak.
String^ ssml =
    "<speak version='1.0' "
    "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>"
    "Hello <prosody contour='(0%,+80Hz) (10%,+80%) (40%,+80Hz)'>World</prosody>"
    "<break time='500ms' /> "
    "Goodbye <prosody rate='slow' contour='(0%,+20Hz) (10%,+30%) (40%,+10Hz)'>World</prosody>"
    "</speak>";

// Generate the audio stream from SSML.
task<SpeechSynthesisStream ^> speakTask = create_task(synth->SynthesizeSsmlToStreamAsync(ssml));
speakTask.then([this, ssml](SpeechSynthesisStream ^speechStream)
{
    // Send the stream to the media object.
    // media === MediaElement XAML object.
    media->SetSource(speechStream, speechStream->ContentType);
    media->AutoPlay = true;
    media->Play();
});

Remarks

Only Microsoft-signed voices installed on the system can be used to generate speech.

Windows includes various Microsoft-signed voices that can be used for a number of languages. Each voice generates synthesized speech in a single language, as spoken in a specific country/region.

By default, a new SpeechSynthesizer object uses the current system voice (call DefaultVoice to find out what the default voice is).

To specify any of the other speech synthesis (text-to-speech) voices installed on the user's system, use the Voice method (to find out which voices are installed on the system, call AllVoices).

If you don't specify a language, the voice that most closely corresponds to the language selected in the Language control panel is loaded.

Use a SpeechSynthesizer object to:

Version history

Windows version SDK version Value added
1703 15063 Options
1709 16299 TrySetDefaultVoiceAsync

Constructors

SpeechSynthesizer()

Initializes a new instance of a SpeechSynthesizer object.

Properties

AllVoices

Gets a collection of all installed speech synthesis engines (voices).

DefaultVoice

Gets the default speech synthesis engine (voice).

Options

Gets a reference to the collection of options that can be set on the SpeechSynthesizer object.

Voice

Gets or sets the speech synthesis engine (voice).

Methods

Close()

Closes the SpeechSynthesizer and releases system resources.

Dispose()

Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

SynthesizeSsmlToStreamAsync(String)

Asynchronously generate and control speech output from a Speech Synthesis Markup Language (SSML) Version 1.1 string.

SynthesizeTextToStreamAsync(String)

Asynchronously generate speech output from a string.

TrySetDefaultVoiceAsync(VoiceInformation)

Asynchronously attempts to set the voice used for speech synthesis on an IoT device.

Note

This method is available only in Embedded mode.

Applies to

Product Versions
WinRT Build 10240, Build 10586, Build 14383, Build 15063, Build 16299, Build 17134, Build 17763, Build 18362, Build 19041, Build 20348, Build 22000, Build 22621, Build 26100

See also