Prompt Engine Markup Language
This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.
The prompt engine is the Speech Server component that takes text input and produces speech output by concatenating recordings of words and phrases that match the text input. The prompt engine stores the recordings it uses on disk and indexes them in one or more prompt database files.
Behavior of the Prompt Engine
Speech application code creates prompt engine requests, which ask the prompt engine to convert text input into speech output. A prompt engine request normally includes Prompt Engine Markup Language (PEML) markup. This markup specifies the databases that the prompt engine should use and the text for which it should produce speech output.
Prompt engine requests from a Speech Server application are mediated by the Speech API for Microsoft .NET Framework 3.0. The Speech API parses each request and stores its elements, along with other information, in a data structure that it then passes to the prompt engine. When the text contained in the elements of a request is sent to the prompt engine, the prompt engine searches the specified databases for prerecorded segments that match the text. Segments that have no appropriate match in the databases are synthesized using text-to-speech (TTS).
For the purpose of database searches, the prompt engine concatenates the text content of adjacent PEML elements unless certain differences exist between the adjacent elements. The prompt engine searches for concatenated strings in the active databases and plays back the recordings for strings that match. Differences that preclude the concatenation of the text content in one element with the text content of the subsequent element include the following:
- Differences in prosodic values (rate, pitch, and volume) between the elements.
- Differences in ssml:sayas values between the elements.
- Differences in emphasis or duration values between the elements.
- Differences in the target language of the elements.
- A break in one element but not in the adjacent element.
If any of the listed differences exists between one element and the element that immediately follows it, the prompt engine stops concatenating. It searches the databases for the string it has concatenated up to that point and plays back the recording for the matching string. It then restarts the concatenation, search, and playback process with the immediately following element.
If the prompt engine finds only a partial match for a search string, it plays the recording for the matched portion and uses a TTS engine to synthesize the remaining, unmatched portion. If it cannot find a recording for a segment in the active databases, it uses a TTS engine to synthesize the entire segment. During database searches, the prompt engine ignores elements that are neither PEML nor SSML elements.
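The concatenation rule can be modeled as grouping adjacent elements by the attributes listed above. The following Python sketch is a hypothetical model, not the prompt engine's implementation: the `Element` class, its field names, and the choice to join element text with a single space are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical model of one element's text and the attributes that matter."""
    text: str
    rate: Optional[str] = None
    pitch: Optional[str] = None
    volume: Optional[str] = None
    sayas: Optional[str] = None
    emphasis: Optional[str] = None
    duration: Optional[str] = None
    lang: Optional[str] = None
    has_break: bool = False


def signature(e: Element) -> Tuple:
    # A difference in any of these values stops concatenation.
    return (e.rate, e.pitch, e.volume, e.sayas, e.emphasis,
            e.duration, e.lang, e.has_break)


def segment(elements: List[Element]) -> List[str]:
    """Concatenate adjacent elements' text until a listed difference appears."""
    segments: List[str] = []
    current: List[str] = []
    prev = None
    for e in elements:
        sig = signature(e)
        if current and sig != prev:
            # Difference found: the text gathered so far becomes one search string.
            segments.append(" ".join(current))
            current = []
        current.append(e.text)
        prev = sig
    if current:
        segments.append(" ".join(current))
    return segments


elements = [
    Element("Your flight"),
    Element("departs at"),
    Element("nine thirty", sayas="time"),
    Element("from gate twelve"),
]
print(segment(elements))
```

Here the first two elements share every tracked value and form one search string, while the `sayas` difference on the third element starts a new segment, and returning to default values on the fourth starts another.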
Prompt Engine XML Elements
All PEML elements belong to the peml namespace and must include the peml namespace prefix. The namespace for peml is specified as follows.
xmlns:peml="http://schemas.microsoft.com/Speech/2003/03/PromptEngine"
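As an illustration of the prefix requirement, the following Python sketch uses the standard library's `xml.etree.ElementTree` module to build a minimal document whose elements are qualified with the peml namespace. The prompt text is hypothetical, and a `peml:database` element is omitted because its attributes are not described here.

```python
import xml.etree.ElementTree as ET

PEML_NS = "http://schemas.microsoft.com/Speech/2003/03/PromptEngine"

# Serialize elements in this namespace with the "peml" prefix instead of "ns0".
ET.register_namespace("peml", PEML_NS)


def peml(tag: str) -> str:
    """Return the qualified name ({namespace}tag) for a PEML element."""
    return f"{{{PEML_NS}}}{tag}"


root = ET.Element(peml("prompt_output"))
div = ET.SubElement(root, peml("div"))
div.text = "Welcome to the automated attendant."
tts = ET.SubElement(root, peml("tts"))
tts.text = "This sentence is always synthesized."

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```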
Speech Server implements the following PEML elements.
PEML Element | Description |
---|---|
peml:prompt_output | The required root element of a prompt engine XML document. |
peml:database | Specifies a database for the prompt engine to use. |
peml:div | Divides the input text into the segments that the prompt engine searches for in the active databases. |
peml:id | Identifies a specific prompt in an active database. |
peml:tts | Causes the prompt engine to send the contained text directly to the TTS engine, bypassing a database search. |
peml:withtag | Identifies which database entry the prompt engine should use if more than one possible selection for the searched text exists. |
peml:rule | Specifies a text modification rule to apply to the contained text. |
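A request can be checked against this element list before it is sent. The Python sketch below is a hypothetical pre-flight check (the `check_peml` function is not part of Speech Server): it verifies that the root element is `peml:prompt_output` and reports any element in the peml namespace whose name does not appear in the table. It does not validate attributes or content models.

```python
import xml.etree.ElementTree as ET
from typing import List

PEML_NS = "http://schemas.microsoft.com/Speech/2003/03/PromptEngine"

# Element names taken from the table of PEML elements.
KNOWN_PEML = {"prompt_output", "database", "div", "id", "tts", "withtag", "rule"}


def check_peml(xml_text: str) -> List[str]:
    """Return the problems found; an empty list means none were detected."""
    problems: List[str] = []
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{PEML_NS}}}prompt_output":
        problems.append(f"root element is {root.tag!r}, expected peml:prompt_output")
    for el in root.iter():
        if el.tag.startswith(f"{{{PEML_NS}}}"):
            # Strip "{namespace}" to get the local element name.
            local = el.tag[len(PEML_NS) + 2:]
            if local not in KNOWN_PEML:
                problems.append(f"unknown PEML element peml:{local}")
    return problems


doc = (
    f'<peml:prompt_output xmlns:peml="{PEML_NS}">'
    "<peml:div>Hello</peml:div><peml:bogus>Hi</peml:bogus>"
    "</peml:prompt_output>"
)
print(check_peml(doc))
```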
Nesting Prompt Engine XML Elements
The prompt engine observes the following nesting rules for the peml:tts, peml:rule, and peml:withtag elements:
- Any XML element can be nested within a peml:tts element, except for another peml:tts element. The prompt engine passes all XML nested within a peml:tts element to the TTS engine without processing it.
- Any XML element can be nested within a peml:rule element, except for another peml:rule element. The script rule specified by the peml:rule element must be able to parse the contained XML. If the script rule is unable to parse the contained XML, the prompt engine applies its default processing behavior to the nested XML.
- Any XML element can be nested within the peml:withtag element, including other peml:withtag elements.
Note
The prompt engine renders invalid PEML using TTS synthesis, disregarding any indication in the PEML markup to use recorded segments from a database.
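The two self-nesting restrictions can be checked mechanically. The following Python sketch (the `nesting_violations` function is hypothetical, and attributes are omitted from the sample markup) walks the element tree and reports any `peml:tts` inside another `peml:tts` and any `peml:rule` inside another `peml:rule`, while allowing `peml:withtag` to nest freely. It does not model how the prompt engine itself reacts to such markup.

```python
import xml.etree.ElementTree as ET
from typing import List

PEML_NS = "http://schemas.microsoft.com/Speech/2003/03/PromptEngine"
# Elements that must not contain another element with the same name.
NO_SELF_NESTING = {"tts", "rule"}


def nesting_violations(xml_text: str) -> List[str]:
    """Report peml:tts inside peml:tts and peml:rule inside peml:rule."""
    violations: List[str] = []

    def walk(el: ET.Element, open_names: frozenset) -> None:
        name = None
        if el.tag.startswith(f"{{{PEML_NS}}}"):
            name = el.tag[len(PEML_NS) + 2:]
        if name in NO_SELF_NESTING and name in open_names:
            violations.append(f"peml:{name} nested inside peml:{name}")
        # peml:withtag is absent from NO_SELF_NESTING, so it may nest freely.
        child_open = open_names | {name} if name in NO_SELF_NESTING else open_names
        for child in el:
            walk(child, child_open)

    walk(ET.fromstring(xml_text), frozenset())
    return violations


NS_DECL = f'xmlns:peml="{PEML_NS}"'
ok_doc = (
    f"<peml:prompt_output {NS_DECL}>"
    "<peml:withtag><peml:withtag>Seattle</peml:withtag></peml:withtag>"
    "</peml:prompt_output>"
)
bad_doc = (
    f"<peml:prompt_output {NS_DECL}>"
    "<peml:tts>Hello <peml:tts>again</peml:tts></peml:tts>"
    "</peml:prompt_output>"
)
print(nesting_violations(ok_doc))
print(nesting_violations(bad_doc))
```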
Prompt Engine Database Searches
When the Speech Server prompt engine searches databases, it applies the following processing rules:
- The prompt engine normalizes the text for white space, case, and punctuation in the same manner as that used when loading the databases.
- The maximum length of a search string is 1,000 characters, including spaces, control characters, and line breaks (line breaks are treated as a single character).
Note
To have the prompt engine speak a longer string, use peml:id elements to specify the individual prompts that together make up the longer string.
- Text content of adjacent PEML elements is combined into a single search string when the elements do not differ from one another in prosody, emphasis, duration, target language, or ssml:sayas values and neither element contains a break.
- If a recording for a segment exists in the active databases, the prompt engine plays the recording.
- If only a partial match for a segment is found in the active databases, the prompt engine plays the matching portion of the segment and creates a fallback TTS engine to synthesize the remaining portion of the segment.
- If no recording for a segment exists in the databases loaded into memory, or if the audio file associated with the segment is not a valid .wav file, the prompt engine creates a fallback TTS engine that synthesizes the output.
- A .wav file that contains silence can be used to produce silent output from the prompt engine.
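The search behavior can be approximated with a greedy longest-match lookup. The Python sketch below is a hypothetical model under several assumptions: normalization is reduced to lowercasing, stripping ASCII punctuation, and collapsing white space; a `dict` stands in for the loaded prompt databases; and the engine is assumed to prefer the longest recorded phrase at each position. Unmatched words are grouped into TTS steps, and strings over the 1,000-character limit are simply rejected.

```python
import string
from typing import Dict, List, Tuple

MAX_SEARCH_CHARS = 1000  # limit stated in the list above


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse white space."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def plan_playback(segment: str, recordings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return ("play", wav_file) and ("tts", text) steps for one search string."""
    # Count a CRLF line break as a single character before applying the limit.
    if len(segment.replace("\r\n", "\n")) > MAX_SEARCH_CHARS:
        raise ValueError("search string exceeds 1,000 characters")
    words = normalize(segment).split()
    steps: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i first (assumed strategy).
        for j in range(len(words), i, -1):
            key = " ".join(words[i:j])
            if key in recordings:
                steps.append(("play", recordings[key]))
                i = j
                break
        else:
            # No recording starts here: queue this word for the fallback TTS
            # engine, merging it with a preceding TTS step. For simplicity the
            # sketch synthesizes the normalized word, not the original text.
            if steps and steps[-1][0] == "tts":
                steps[-1] = ("tts", steps[-1][1] + " " + words[i])
            else:
                steps.append(("tts", words[i]))
            i += 1
    return steps


recordings = {"your balance is": "balance_is.wav", "thank you": "thank_you.wav"}
print(plan_playback("Your balance is forty dollars. Thank you!", recordings))
```

In this example the recorded phrases bracket an unrecorded amount, so the plan plays one recording, synthesizes "forty dollars", and plays the second recording.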