One of the difficulties in designing a voice-enabled agent is getting the agent to pronounce terminology correctly. Text-to-speech systems might mispronounce some terminology, such as industry-specific terms. As a conversational designer, you know your callers. You can anticipate slang, abbreviations, or alternative names. When the agent captures these terms in a confirmation response, you might want to change them before the system speaks them back to the caller.
Lexicons help in these situations. Lexicon files give reading rules to speech synthesis engines to:
- Pronounce words in a specific way using the phonetic alphabet
- Change the text spoken by using an alias
Note
The information in this article requires configuration of a voice-enabled agent and channel in Copilot Studio and Dynamics 365 Contact Center.
Custom lexicon files don't have native integration with Dynamics 365 Contact Center yet. This article gives guidance on how to create your own.
Create the lexicon file
Currently, there aren't any tools to generate custom lexicon files, so you need to author them by hand. Learn more in Custom lexicon examples.
To validate your file, use the SDK provided for validation or Speech Studio.
Use Speech Studio to create your lexicon file. When complete, save your file. Any parsing errors that might arise because of malformed lexicon syntax are noted when you save, which provides a layer of validation. Then download the lexicon file (.xml) to your chosen directory.
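As a rough illustration of the file described above, the following Python sketch embeds a small sample lexicon in the W3C Pronunciation Lexicon Specification (PLS) format, with one alias rule and one phoneme rule, and runs a local well-formedness check on it. The sample entries and the `read_lexemes` helper are hypothetical and exist only for this example. The check is a quick sanity pass and doesn't replace validation with the SDK or Speech Studio.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification 1.0.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Hypothetical sample lexicon: one alias entry and one phoneme entry (IPA).
SAMPLE_LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>BTW</grapheme>
    <alias>by the way</alias>
  </lexeme>
  <lexeme>
    <grapheme>Benigni</grapheme>
    <phoneme>bɛˈniːnji</phoneme>
  </lexeme>
</lexicon>
"""


def read_lexemes(xml_text: str) -> list[tuple[str, str, str]]:
    """Parse a lexicon and return (grapheme, rule, value) tuples.

    Raises ValueError if the XML is malformed, the root element is not a
    PLS <lexicon>, or a lexeme lacks a grapheme or a reading rule.
    """
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        raise ValueError(f"malformed lexicon XML: {exc}") from exc
    if root.tag != f"{{{PLS_NS}}}lexicon":
        raise ValueError(f"unexpected root element: {root.tag}")

    entries = []
    for lexeme in root.findall(f"{{{PLS_NS}}}lexeme"):
        graphemes = [g.text or "" for g in lexeme.findall(f"{{{PLS_NS}}}grapheme")]
        rule = None
        # Use the first reading rule found: an alias or a phoneme.
        for kind in ("alias", "phoneme"):
            node = lexeme.find(f"{{{PLS_NS}}}{kind}")
            if node is not None:
                rule = (kind, node.text or "")
                break
        if not graphemes or rule is None:
            raise ValueError("each lexeme needs a grapheme and an alias or phoneme")
        for grapheme in graphemes:
            entries.append((grapheme, rule[0], rule[1]))
    return entries


if __name__ == "__main__":
    for grapheme, rule, value in read_lexemes(SAMPLE_LEXICON):
        print(f"{grapheme}: {rule} -> {value}")
```

Running the check before you upload the file can catch unclosed tags or a missing namespace early, but it doesn't confirm that the speech engine accepts the phoneme strings.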
Store the lexicon file
Store lexicon files in a location that you can reference by URI so that any text-to-speech system can use them. Because Copilot Studio doesn't natively support these files, use a cloud storage option.
Set up an Azure storage account
You need to create an Azure storage account. Set the subscription, resource group, region, and resource name of the new storage account according to your organization's policies. We recommend using the following settings:
- For Primary Service, select Azure Blob Storage or Azure Data Lake Storage Gen 2.
- For Performance, select Premium.
Learn more in Create an Azure storage account.
Set up the storage container
We recommend using the Static Website as the storage container for your uploaded lexicon files. The storage container provides you with the primary endpoint and secondary endpoint for the website.
After uploading your lexicon file, select the file from the directory to view its properties and details. Save the URL for the file, which should be in the following format:
https://{resourceName}.blob.core.windows.net/$web/{lexiconFileName}
Learn more in Create a container.
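If you manage several lexicon files, a small helper can assemble the file URL from its parts. The following Python sketch assumes the standard blob endpoint host (blob.core.windows.net) and the $web container used by static websites. The function name `lexicon_url` and the sample resource name are hypothetical.

```python
from urllib.parse import quote


def lexicon_url(resource_name: str, file_name: str) -> str:
    """Build the blob URL of a lexicon file stored in the $web container."""
    if not resource_name or not file_name:
        raise ValueError("resource_name and file_name are required")
    # Percent-encode the file name so that spaces and other reserved
    # characters remain valid in the URL.
    return f"https://{resource_name}.blob.core.windows.net/$web/{quote(file_name)}"


if __name__ == "__main__":
    # "contosolexicons" is a placeholder storage account name.
    print(lexicon_url("contosolexicons", "lexicon.xml"))
```

Compare the result with the URL shown in the file's properties in the Azure portal before you use it, because the portal value is the authoritative one.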
Reference a lexicon file from Copilot Studio
Reference lexicon files within Copilot Studio Question nodes. These nodes are where the agent speaks complex words or abbreviations that come from dynamic content. To correct how they're spoken, use phonemes for complex words and aliases for abbreviations.
In a Question node, make sure that the Speech + DTMF option is selected. In the text field, reference your custom lexicon file by adding the following SSML: <lexicon uri="{url}"/>, where {url} is the location of the lexicon file in the blob container, obtained in the previous steps.
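If you generate the SSML reference outside Copilot Studio, for example to paste the same tag into several nodes, the following Python sketch shows one way to produce a well-formed lexicon element. The helper name `lexicon_tag` and the sample URL are hypothetical. The sketch only formats the tag and makes no assumptions about how Copilot Studio processes it.

```python
from xml.sax.saxutils import quoteattr


def lexicon_tag(lexicon_uri: str) -> str:
    """Return an SSML lexicon element that points at the given URI."""
    if not lexicon_uri:
        raise ValueError("lexicon_uri is required")
    # quoteattr adds the surrounding quotes and escapes &, <, and >,
    # so a URI with a query string still yields valid XML.
    return f"<lexicon uri={quoteattr(lexicon_uri)}/>"


if __name__ == "__main__":
    # Placeholder URL in the format saved from the storage container.
    url = "https://contosolexicons.blob.core.windows.net/$web/lexicon.xml"
    print(lexicon_tag(url))
```

Using straight double quotes around the attribute value matters here, because curly quotes copied from formatted text aren't valid XML attribute delimiters.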
Once complete, test the lexicon file by calling into the voice-enabled agent.
Known limitations
- Menu Options: When the “read options aloud” option is set to “Read Out,” menu options are read to the caller over voice channels using text-to-speech. The lexicon doesn't apply to any text in those menu items.
- Currently, lexicons can only be used when responses are generated by topics that the maker has designed. If generative AI is used to generate a response, the lexicon file isn't used. However, if generative orchestration is turned on and it maps to a topic that contains message and question nodes that reference a lexicon file, the lexicon file is used to generate the response.
Note
Custom lexicons apply specifically to the text-to-speech pipeline used in basic voice (Pattern 1) mode. Learn more in Choose how to handle speech.