APPLIES TO: Composer v1.x and v2.x
Bots can communicate over speech-based channels, such as Direct Line Speech (which enables speech experiences in Web Chat), or through embedded devices.
Bots can use text-to-speech (also known as speech synthesis, and referred to as speech in this article) to convert text to human-like synthesized speech. Text is converted to a phonetic representation (the individual components of speech sounds), which is then converted to waveforms that are output as speech. Composer uses Speech Synthesis Markup Language (SSML), an XML-based markup language that lets developers specify how input text is converted into synthesized speech. SSML gives developers the ability to customize different aspects of speech, such as pitch, pronunciation, rate of speech, and more.
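For example, a minimal SSML sketch (the voice, values, and message text here are illustrative) that slows the speaking rate slightly and raises the pitch might look like this:

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <voice name="en-US-AriaNeural">
        <!-- prosody adjusts the rate and pitch of the enclosed text -->
        <prosody rate="-10%" pitch="+5%">
          Welcome! How can I help you today?
        </prosody>
      </voice>
    </speak>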
This modality lets developers create bots in Composer that respond not only visually with text, but also audibly with speech. Using the speech middleware and SSML tags, bot developers and designers can create bots that speak with various voices and in different languages.
It's important to ensure that bot responses are optimized for the channels they'll be available on. For example, a welcome message written in text along with an Adaptive Card attachment won't be suitable when sent via a speech-capable channel. For this reason, bot responses can contain both text and speech responses, with the speech response used by the channel when required.
Using the response editor, bot developers can easily add speech components to bots and customize them with SSML tags.
To add speech to a bot, complete the following steps:
Open a bot project and add a Send a response action to one of your dialogs. Enter text in the Text box for a fallback text response.
Now select the + next to Text. You'll see three options: Speech, Attachments, and Suggested Actions. Select Speech.
When speech is added, you'll see Input hint: accepting next to Response variations. Select Input hint: accepting to see all of the available input hints.
For more information, see the Bot Framework SDK article Add input hints to messages with the Bot Connector API.
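Behind the scenes, Composer stores responses as language generation (LG) templates. As a rough sketch (the template name and wording here are hypothetical), a structured response that combines a text variant, a speech variant, and an input hint might look like this:

    # WelcomeUser
    > Hypothetical template: a response with both text and speech variants, plus an input hint.
    [Activity
        Text = Welcome! Take a look at the options below.
        Speak = Welcome! You can say "help" at any time to hear what I can do.
        InputHint = acceptingInput
    ]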
You can add SSML tags to your speech component to customize your speech output. Select SSML tag in the command bar to see the SSML tag options.
Composer supports the following SSML tags:

break: Insert pauses (or breaks) between words, or prevent pauses automatically added by the text-to-speech service.

For more information, see the Improve synthesis with Speech Synthesis Markup Language (SSML) article.
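For example, a break tag with an explicit duration (the 500 ms pause and surrounding text here are illustrative) can be inserted directly into the speech text:

    Your appointment is confirmed. <break time="500ms" /> Is there anything else I can help with?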
In order for speech responses to work correctly on some channels, including Direct Line Speech, certain SSML tags must be present: the speak and voice tags.
Tip
Visit the language and voice support for the Speech Service documentation to see a list of supported voice fonts. It's recommended that you use neural voice fonts, where available, as these sound particularly human-like.
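For example (a minimal sketch; the voice font and message text are illustrative), a complete speech response wrapped in the required speak and voice tags, using a neural voice font, looks like this:

    <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
      <!-- voice selects the neural voice font used to render the enclosed text -->
      <voice name="en-US-JennyNeural">Thanks for visiting! Goodbye.</voice>
    </speak>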
Composer makes this as easy as possible for bot builders: it automatically includes these SSML tags on all outgoing responses, and you can modify the related properties in the Composer runtime settings.
To access the speech related settings, complete the following steps:
Open a Composer bot project and select Configure in the navigation pane.
Select Advanced Settings View (json) to show the JSON view of the project settings. There are two relevant speech settings, described below.
"voiceFontName": "en-US-AriaNeural"
: Determines the voiceFontName
your bot will use to speak, and the default is en-US-AriaNeural
. You can customize this using any of the available voices and locales appropriate for your bot."fallbackToTextForSpeechIfEmpty": true
: Determines whether text will be used if speech is empty, and the default
is true. If you don't add SSML tags to your speech, there will be silence, and instead the text will be displayed as a fallback message. To turn this off, set this to false
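As a rough sketch, assuming these settings live under the runtime's setSpeak feature section (the exact nesting can vary by Composer and runtime version, and JSON itself doesn't allow comments, so the two properties are described above), the JSON might look like this:

    {
      "runtimeSettings": {
        "features": {
          "setSpeak": {
            "voiceFontName": "en-US-AriaNeural",
            "fallbackToTextForSpeechIfEmpty": true
          }
        }
      }
    }

Removing the setSpeak object entirely disables the related middleware, as the note below describes.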
Note

If you need to disable the speak or voice SSML tags being applied to all responses, you can do so by removing the setSpeak element from your bot settings completely. This will disable the related middleware within the runtime.
Speech is supported by the Direct Line Speech channel within Azure Bot Service. For information about connecting a bot to channels that support voice, see Connect a bot to Direct Line Speech.
To test your bot's speech capabilities, connect it to a speech-enabled channel, such as Direct Line Speech, and use the channel's native client, such as Web Chat, to interact with your bot.
To inspect the responses being sent by your bot, including speech-specific responses containing the automatically added SSML tags plus any that you've added manually, do the following.