Note
Please see Azure Cognitive Services for Speech documentation for the latest supported speech solutions.
Microsoft.Speech.Synthesis Namespace
The Microsoft.Speech.Synthesis namespace contains classes for initializing and configuring a speech synthesis engine, for creating prompts, for generating speech, for responding to events, and for modifying voice characteristics. Speech synthesis is often referred to as text-to-speech or TTS.
Initialize and Configure
The SpeechSynthesizer class provides access to the functionality of a speech synthesis engine in the Microsoft Speech Platform Runtime 11. A TTS engine can use any of the installed Runtime Languages as a voice to perform text-to-speech in a particular language, for example, Microsoft Helen for US English.
A voice is an installed Runtime Language for speech synthesis (TTS, or text-to-speech). The Speech Platform Runtime 11 and Microsoft Speech Platform SDK 11 do not include any Runtime Languages for speech synthesis. You must download and install a Runtime Language for each language in which you want to generate synthesized speech. A Runtime Language includes the language model, acoustic model, and other data necessary to provision a speech engine to perform speech synthesis in a particular language. See InstalledVoice for more information.
To configure a SpeechSynthesizer instance to use one of the installed voices, call the SelectVoice(String) method or one of the SelectVoiceByHints() methods. To get information about which voices are installed, use the GetInstalledVoices() method.
You can route the output of the SpeechSynthesizer to a stream, a file, the default audio device, or to a null device by using one of the methods in the SpeechSynthesizer class whose name begins with “SetOutputTo”.
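The configuration steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes a project reference to Microsoft.Speech.dll and at least one installed TTS Runtime Language, and the class name ConfigureSynthesizerSketch is hypothetical.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class ConfigureSynthesizerSketch
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            InstalledVoice firstEnabled = null;

            // List every installed voice and remember the first enabled one.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}), enabled: {2}",
                    info.Name, info.Culture, voice.Enabled);

                if (firstEnabled == null && voice.Enabled)
                {
                    firstEnabled = voice;
                }
            }

            if (firstEnabled == null)
            {
                Console.WriteLine("No enabled voice found. Install a Runtime Language for TTS.");
                return;
            }

            // Select the voice by name, then route output to the default audio device.
            synth.SelectVoice(firstEnabled.VoiceInfo.Name);
            synth.SetOutputToDefaultAudioDevice();

            synth.Speak("The synthesizer is configured.");
        }
    }
}
```

Selecting the voice by the name reported by GetInstalledVoices() avoids hard-coding a voice name that may not exist on a given machine.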
Create Prompts
Use one of the methods of the PromptBuilder class whose name begins with “Append” to build content for prompts from text, Speech Synthesis Markup Language (SSML), files containing text or SSML markup, or prerecorded audio files.
See Construct a Complex Prompt (Microsoft.Speech) in the Microsoft Speech Programming Guide for more information and examples.
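As a rough sketch of how the “Append” methods combine, the following builds a small prompt and prints the SSML that PromptBuilder generates. It assumes a reference to Microsoft.Speech.dll; the class name and the prompt text are illustrative only.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class BuildPromptSketch
{
    static void Main()
    {
        PromptBuilder builder = new PromptBuilder();

        // Plain text.
        builder.AppendText("Your confirmation code is");

        // Text with a content-type hint: spell the code out letter by letter.
        builder.AppendTextWithHint("QX7", SayAs.SpellOut);

        // A short pause between the two sentences.
        builder.AppendBreak(PromptBreak.Small);

        // A fragment of raw SSML markup.
        builder.AppendSsmlMarkup("<emphasis>Thank you.</emphasis>");

        // Inspect the SSML that the builder produced.
        Console.WriteLine(builder.ToXml());
    }
}
```

Printing the result of ToXml() is a convenient way to check the generated markup before passing the PromptBuilder to a SpeechSynthesizer.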
Generate Speech
To generate speech from a string or from a Prompt or PromptBuilder object, use the Speak() or SpeakAsync() methods. To generate speech from SSML markup, use the SpeakSsml(String) or SpeakSsmlAsync(String) methods. See Speech Synthesis Markup Language Reference (Microsoft.Speech) for a guide to SSML markup.
You can guide the pronunciation of words by using the AppendTextWithHint() or AppendTextWithPronunciation(String, String) methods, and by adding or removing lexicons for a SpeechSynthesizer instance using the AddLexicon(Uri, String) and RemoveLexicon(Uri) methods.
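The speaking methods described above might be used as in this sketch. It assumes a reference to Microsoft.Speech.dll, a working default audio device, and an installed en-US voice (the xml:lang value in the SSML string is an assumption; change it to match an installed Runtime Language). The class name is hypothetical.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class GenerateSpeechSketch
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Speak a plain string synchronously.
            synth.Speak("This sentence comes from a string.");

            // Speak the contents of a PromptBuilder synchronously.
            PromptBuilder builder = new PromptBuilder();
            builder.AppendText("This sentence comes from a prompt builder.");
            synth.Speak(builder);

            // Speak a complete SSML document.
            // Assumption: an en-US voice is installed.
            string ssml =
                "<speak version=\"1.0\" " +
                "xmlns=\"http://www.w3.org/2001/10/synthesis\" " +
                "xml:lang=\"en-US\">" +
                "This sentence comes from S S M L markup." +
                "</speak>";
            synth.SpeakSsml(ssml);
        }
    }
}
```

The synchronous methods return only after the prompt has finished. SpeakAsync() and SpeakSsmlAsync(String) return immediately, so an application that uses them must stay alive until speaking completes.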
Respond to Events
The SpeechSynthesizer class includes events that inform a speech application that the SpeechSynthesizer encountered a specific feature in a prompt, as reported by the SpeakProgressEventArgs, BookmarkReachedEventArgs, and VoiceChangeEventArgs classes.
To get information about the beginning and end of the speaking of a prompt by the SpeechSynthesizer, use the SpeakStartedEventArgs and SpeakCompletedEventArgs classes.
See Use Speech Synthesis Events (Microsoft.Speech) in the Microsoft Speech Programming Guide for more information and examples.
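A minimal sketch of subscribing to these events follows. It assumes a reference to Microsoft.Speech.dll, an installed voice, and a default audio device; the class name and the bookmark name "intro-done" are illustrative.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class SynthesisEventsSketch
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();

            // Beginning and end of a prompt.
            synth.SpeakStarted += (sender, e) =>
                Console.WriteLine("Speaking started.");
            synth.SpeakCompleted += (sender, e) =>
                Console.WriteLine("Speaking completed. Cancelled: {0}", e.Cancelled);

            // Progress through the prompt text.
            synth.SpeakProgress += (sender, e) =>
                Console.WriteLine("Progress: \"{0}\" at {1}", e.Text, e.AudioPosition);

            // A bookmark embedded in the prompt.
            synth.BookmarkReached += (sender, e) =>
                Console.WriteLine("Bookmark reached: {0}", e.Bookmark);

            PromptBuilder builder = new PromptBuilder();
            builder.AppendText("This is the introduction.");
            builder.AppendBookmark("intro-done");
            builder.AppendText("This is the rest of the prompt.");

            // Speak asynchronously so the handlers run while the prompt plays.
            synth.SpeakAsync(builder);

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}
```

The call to Console.ReadKey() keeps the console application running while the asynchronous speak operation raises its events.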
Modify Voice Characteristics
The PromptStyle class, together with the StartStyle(PromptStyle) and AppendText() methods, lets you modify characteristics of a SpeechSynthesizer voice using Emphasis, Rate, and Volume parameters. To modify characteristics of a voice such as culture, age, and gender, use one of the StartVoice() methods of the PromptBuilder class or one of the SelectVoiceByHints() methods of the SpeechSynthesizer class.
See Control Voice Attributes (Microsoft.Speech) in the Microsoft Speech Programming Guide for more information and examples.
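The sketch below illustrates both approaches inside a PromptBuilder and prints the resulting SSML. It assumes a reference to Microsoft.Speech.dll; the class name and prompt text are illustrative. Whether a requested gender is honored at speaking time depends on which voices are installed.

```csharp
using System;
using Microsoft.Speech.Synthesis;

class VoiceCharacteristicsSketch
{
    static void Main()
    {
        PromptBuilder builder = new PromptBuilder();

        // Apply a style that sets rate and volume for a span of text.
        PromptStyle slowAndLoud = new PromptStyle();
        slowAndLoud.Rate = PromptRate.Slow;
        slowAndLoud.Volume = PromptVolume.Loud;

        builder.StartStyle(slowAndLoud);
        builder.AppendText("This sentence is spoken slowly and loudly.");
        builder.EndStyle();

        // Apply emphasis to a single piece of text.
        builder.AppendText("This sentence is emphasized.", PromptEmphasis.Strong);

        // Request a voice by gender for a span of text.
        builder.StartVoice(VoiceGender.Female);
        builder.AppendText("This sentence requests a female voice.");
        builder.EndVoice();

        // Inspect the SSML that the builder produced.
        Console.WriteLine(builder.ToXml());
    }
}
```

To apply a voice hint to everything a synthesizer speaks rather than to a span of a prompt, call a method such as SelectVoiceByHints(VoiceGender, VoiceAge) on the SpeechSynthesizer instance instead.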
Classes
Class | Description
---|---
BookmarkReachedEventArgs | Returns data from the BookmarkReached event.
FilePrompt | Represents a prompt created from a file.
InstalledVoice | Contains information about an installed speech synthesis voice.
Prompt | Represents information about what can be rendered, either text or an audio file, by the SpeechSynthesizer.
PromptBuilder | Creates an empty Prompt object and provides methods for adding content, selecting voices, controlling voice attributes, and controlling the pronunciation of spoken words.
PromptEventArgs | Represents the base class for EventArgs classes in the Microsoft.Speech.Synthesis namespace.
PromptStyle | Defines a style for speaking prompts that consists of settings for emphasis, rate, and volume.
ProprietaryEngineEventArgs | Returns data from an event raised by a proprietary speech synthesis engine.
SpeakCompletedEventArgs | Returns notification from the SpeakCompleted event.
SpeakProgressEventArgs | Returns data from the SpeakProgress event.
SpeakStartedEventArgs | Returns notification from the SpeakStarted event.
SpeechSynthesizer | Provides access to the functionality of an installed speech synthesis engine.
StateChangedEventArgs | Returns data from the StateChanged event.
VoiceChangeEventArgs | Returns data from the VoiceChange event.
VoiceInfo | Represents an installed Runtime Language for speech synthesis.
Enumerations
Enumeration | Description
---|---
PromptBreak | Enumerates values for intervals of prosodic separation (breaks) between word boundaries.
PromptEmphasis | Enumerates values for the levels of speaking emphasis in prompts.
PromptRate | Enumerates values for the speaking rate of prompts.
PromptVolume | Enumerates values for volume levels (loudness) in prompts.
SayAs | Enumerates the content types for the speaking of elements such as times, dates, and currency.
SynthesisMediaType | Enumerates the types of media files.
SynthesisTextFormat | Enumerates the types of text formats that may be used to construct a Prompt object.
SynthesizerState | Enumerates values for the state of the SpeechSynthesizer.
VoiceAge | Defines the values for the age of a synthesized voice.
VoiceGender | Defines the values for the gender of synthesized voices.