AI-900\ .. \Understand speech recognition and synthesis


Marius Borota 0

In the passage below, "to create phonemes" is ambiguous; "add acoustic properties" is probably more accurate.

To synthesize speech, the system typically tokenizes the text to break it down into individual words, and assigns phonetic sounds to each word. It then breaks the phonetic transcription into prosodic units (such as phrases, clauses, or sentences) to create phonemes that will be converted to audio format. These phonemes are then synthesized as audio and can be assigned a particular voice, speaking rate, pitch, and volume.



Answer 1

Yep, you're correct: “to create phonemes” is ambiguous, or at least slightly misleading, in that context. Phonemes are linguistic units representing sounds. Prosodic parsing doesn't create them; they are identified first and then given acoustic properties before synthesis.

Here’s a clearer revision of the relevant sentence that better reflects the process:

To synthesize speech, the system typically tokenizes the text into individual words and assigns phonetic sounds to each word. It then groups the phonetic transcription into prosodic units (such as phrases, clauses, or sentences) and adds acoustic properties such as timing, intonation, and stress to the phonemes. The enriched phonemes are then synthesized as audio, which can be rendered with a particular voice, speaking rate, pitch, and volume.
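
If it helps to see that ordering spelled out, here's a toy Python sketch of the stages. It's only an illustration, not how Azure AI Speech or any real engine works internally. The lexicon, the `Phone` class, the 80 ms base duration, and the falling pitch contour are all made up for the example. The point is that the phonemes already exist after transcription, and the prosodic step only groups them and attaches acoustic properties.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon (word -> phoneme symbols); illustrative only.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
PUNCTUATION = set(".,!?;")


@dataclass
class Phone:
    """A phoneme plus the acoustic properties attached to it."""
    symbol: str
    duration_ms: int
    pitch_hz: float


def tokenize(text):
    """Split text into lowercase words and punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;]", text.lower())


def transcribe(tokens):
    """Assign phonemes to each word; punctuation is kept as a boundary marker."""
    result = []
    for tok in tokens:
        if tok in PUNCTUATION:
            result.append((tok, None))
        else:
            # Unknown words get a placeholder symbol in this toy version.
            result.append((tok, TOY_LEXICON.get(tok, ["?"])))
    return result


def prosodic_units(transcribed):
    """Group the existing phonemes into units, splitting at punctuation."""
    units, current = [], []
    for _, phones in transcribed:
        if phones is None:
            if current:
                units.append(current)
                current = []
        else:
            current.extend(phones)
    if current:
        units.append(current)
    return units


def add_acoustics(unit, base_pitch=120.0, rate=1.0):
    """Attach duration and pitch to each phoneme; pitch falls 20% across the unit."""
    last = max(len(unit) - 1, 1)
    return [
        Phone(sym, int(80 / rate), round(base_pitch * (1.0 - 0.2 * i / last), 1))
        for i, sym in enumerate(unit)
    ]


def synthesis_plan(text, base_pitch=120.0, rate=1.0):
    """Run all stages and return one list of Phone objects per prosodic unit."""
    units = prosodic_units(transcribe(tokenize(text)))
    return [add_acoustics(u, base_pitch, rate) for u in units]


for unit in synthesis_plan("Hello, world."):
    print([(p.symbol, p.duration_ms, p.pitch_hz) for p in unit])
```

For "Hello, world." this gives two prosodic units, split at the comma. The phoneme symbols in each unit are exactly what `transcribe` produced; `add_acoustics` only adds timing and pitch, and changing `rate` or `base_pitch` changes those properties without touching the phonemes themselves.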

This forum is monitored by Microsoft staff, so I'd expect them to take note of your suggestion and pass it on to the team maintaining the MS Learn content.

If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

hth

Marcin

VarunTha 14,850 Reputation points Microsoft External Staff Moderator

2025-06-05T16:05:53.64+00:00

Hi Marius Borota,
Thank you for your feedback! We appreciate your insights regarding the phrasing in the content. We will reach out to the content author to discuss your suggestion about using "add acoustic properties" instead of "to create phonemes." Your input is valuable in helping us improve our materials.

If you have any further questions or suggestions, please feel free to share!
