Asking about Speech to text Pronunciation Assessment (About Phoneme Recognition)

Minseong 6 Reputation points
2021-07-30T04:30:45.08+00:00

Hello,

I'm looking for a feature that returns the phoneme that was actually recognized.
I found that Azure Speech to text pronunciation assessment provides a score for each phoneme of the reference text.
But I'm wondering if there is a way to get the phoneme that was actually recognized (for example, when the phoneme assessment returns a low score).

Thank you in advance for your help.
Have a nice day :D

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. Yinhe Wei 1 Reputation point Microsoft Employee
    2021-09-07T06:08:24.493+00:00

    Hi, @Minseong

    We have a preview feature that can probably handle your request.
    You can add one additional field, "NBestPhonemeCount", to the JSON config as shown below:

    var pronAssessmentConfig = PronunciationAssessmentConfig.FromJson($"{{\"referenceText\":\"<reference text>\",\"gradingSystem\":\"HundredMark\",\"granularity\":\"Phoneme\",\"dimension\":\"Comprehensive\",\"enableMiscue\":\"False\",\"NBestPhonemeCount\":5}}");

    This additional field triggers the "NBestPhonemes" section in the output JSON payload. It lists the phonemes most likely spoken by the speaker, ranked by a score that indicates the probability.
    You can treat the top-1 entry as the phoneme that was actually spoken.
    See the example below:

         "Words": [  
            {  
               "Word" : "Good",  
               "Offset" : 500000,  
               "Duration" : 2700000,  
               "PronunciationAssessment": {  
                  "AccuracyScore" : 100.0,  
                  "ErrorType" : "None"  
               },  
               "Syllables" : [  
                  {  
                     "Syllable" : "ɡuhd",  
                     "Offset" : 500000,  
                     "Duration" : 2700000,  
                     "PronunciationAssessment" : {  
                        "AccuracyScore": 100.0  
                     }  
                  }  
               ],  
               "Phonemes": [  
                  {  
                     "Phoneme" : "ɡ",  
                     "Offset" : 500000,  
                     "Duration": 1200000,  
                     "PronunciationAssessment": {  
                        "AccuracyScore": 100.0,  
                        "NBestPhonemes": [  
                           {  
                               "Phoneme": "g",  
                               "Score": 100.0  
                           },  
                           {  
                               "Phoneme": "k",  
                               "Score": 5.0  
                           },  
                           ... // remaining n best phonemes  
                        ]  
                     }  
                  },  
    

    Thanks,
    Yinhe
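
As a rough illustration of the suggestion above to treat the top-ranked "NBestPhonemes" entry as the spoken phoneme, here is a minimal Python sketch. It walks a payload shaped like the example and reports, for each phoneme, the expected phoneme, its accuracy score, and the highest-scoring candidate. The function name `top_phonemes` and the `SAMPLE` data are hypothetical, and the sketch assumes you have already extracted the object that contains "Words" from the recognition result. Because this is a preview feature, the field names may differ in your payload.

```python
import json

# Hypothetical sample shaped like the payload in the answer; values are illustrative only.
SAMPLE = """
{
  "Words": [
    {
      "Word": "Good",
      "Phonemes": [
        {
          "Phoneme": "g",
          "PronunciationAssessment": {
            "AccuracyScore": 100.0,
            "NBestPhonemes": [
              {"Phoneme": "g", "Score": 100.0},
              {"Phoneme": "k", "Score": 5.0}
            ]
          }
        }
      ]
    }
  ]
}
"""


def top_phonemes(payload):
    """Return (word, expected, accuracy, recognized) tuples, one per phoneme.

    'recognized' is the highest-scoring NBestPhonemes entry, or None when
    the section is missing (for example, NBestPhonemeCount was not set).
    """
    rows = []
    for word in payload.get("Words", []):
        for phoneme in word.get("Phonemes", []):
            assessment = phoneme.get("PronunciationAssessment", {})
            candidates = assessment.get("NBestPhonemes", [])
            # Choose by score instead of relying on the list already being sorted.
            best = max(candidates, key=lambda c: c["Score"], default=None)
            rows.append((
                word.get("Word"),
                phoneme.get("Phoneme"),
                assessment.get("AccuracyScore"),
                best["Phoneme"] if best else None,
            ))
    return rows


if __name__ == "__main__":
    for row in top_phonemes(json.loads(SAMPLE)):
        print(row)
```

Comparing the expected phoneme with the recognized one for low-scoring entries shows which sound the speaker most likely substituted.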
