How to get phonemes from the Azure Speech SDK

Question

Hi, I am following the Microsoft Azure Speech-to-Text Python SDK tutorial. Is there a way to return the phonemes, which are an intermediate step in generating the interpreted text? If so, could you please point me to the documentation and, ideally, some sample code that does this? I searched and could not find anything that already answers my question.

Thanks!
Doug

Answer

Hi @Doug Bergman

Is this not what you're looking for? Here is a sample response showing the phonemes of the word "Thank":

{
  "Duration": 4700000,
  "Offset": 11500000,
  "Phonemes": [
    {
      "Duration": 2100000,
      "Offset": 11500000,
      "Phoneme": "th",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 900000,
      "Offset": 13700000,
      "Phoneme": "ae",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 700000,
      "Offset": 14700000,
      "Phoneme": "ng",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    },
    {
      "Duration": 700000,
      "Offset": 15500000,
      "Phoneme": "k",
      "PronunciationAssessment": {
        "AccuracyScore": 100.0
      }
    }
  ],
  "PronunciationAssessment": {
    "AccuracyScore": 100.0,
    "ErrorType": "None"
  },
  "Word": "Thank"
}
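
In case it helps, here is a minimal Python sketch of reading a word entry like the one above. It assumes the entry has already been obtained as JSON text, and it treats "Offset" and "Duration" as 100-nanosecond ticks, which is how the Speech service reports timings. The helper name phoneme_timings and the shortened two-phoneme sample are illustrative only and are not part of the SDK.

```python
import json

# Offset and Duration are reported in 100-nanosecond ticks: 10,000 ticks = 1 ms.
TICKS_PER_MS = 10_000


def phoneme_timings(word_entry):
    """Return (phoneme, start_ms, duration_ms, accuracy) tuples for one word entry."""
    rows = []
    for p in word_entry.get("Phonemes", []):
        # The score is nested; fall back to None if the block is missing.
        score = p.get("PronunciationAssessment", {}).get("AccuracyScore")
        rows.append(
            (p["Phoneme"], p["Offset"] / TICKS_PER_MS, p["Duration"] / TICKS_PER_MS, score)
        )
    return rows


# Shortened copy of the sample above (first two phonemes only).
sample = json.loads("""
{
  "Word": "Thank",
  "Phonemes": [
    {"Phoneme": "th", "Offset": 11500000, "Duration": 2100000,
     "PronunciationAssessment": {"AccuracyScore": 100.0}},
    {"Phoneme": "ae", "Offset": 13700000, "Duration": 900000,
     "PronunciationAssessment": {"AccuracyScore": 100.0}}
  ]
}
""")

for phoneme, start_ms, duration_ms, accuracy in phoneme_timings(sample):
    print(f"{phoneme}: starts at {start_ms} ms, lasts {duration_ms} ms, accuracy {accuracy}")
```

With the values above, "th" starts at 1150 ms and lasts 210 ms.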

Answer

Hello Doug,

This is the pronunciation assessment feature.

Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time.

Note that the pronunciation assessment feature is currently available only in the westus, eastasia and centralindia regions, and it supports only the en-US language.

Please refer to the following sample code for how to set up pronunciation assessment and retrieve the result.

https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#L633
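
Since the question was about Python, the same setup might look roughly like the sketch below with the azure-cognitiveservices-speech package. Treat it as an outline under assumptions rather than a tested recipe: the key, region, WAV path and reference text are placeholders you would supply, the function names are made up for this example, and the "NBest" / "Words" layout read by the helper is assumed from the service's detailed JSON output rather than shown in this thread.

```python
import json


def assess_pronunciation(key, region, wav_path, reference_text):
    """Run one-shot recognition with phoneme-level assessment; return the raw JSON text."""
    # Imported here so the parsing helper below can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="en-US", audio_config=audio_config
    )

    # Ask for scores down to the phoneme level.
    pron_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    )
    pron_config.apply_to(recognizer)

    result = recognizer.recognize_once()
    # The phoneme details are only in the raw service response, not in result.text.
    return result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)


def phonemes_by_word(json_text):
    """Return [(word, [phoneme, ...]), ...] from the top hypothesis (assumed layout)."""
    data = json.loads(json_text)
    nbest = data.get("NBest") or [{}]
    return [
        (w["Word"], [p["Phoneme"] for p in w.get("Phonemes", [])])
        for w in nbest[0].get("Words", [])
    ]
```

A call such as phonemes_by_word(assess_pronunciation(key, "westus", "thank_you.wav", "Thank you")) would then list the phonemes per word; in real code it is worth checking result.reason before reading the JSON.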

Regards,
Yutong
