Thank you for using the Microsoft Q&A forum.
Regarding your query, it seems that you are experiencing a lag between the offset returned with the recognition result and the time reported by the sessionStarted event when using the Pronunciation Assessment feature with the recognizeOnceAsync method of the Azure Speech SDK.
One possible solution to this issue is to use continuous recognition and pause it when you aren't expecting speech. This avoids starting a new session for every utterance, and with it the lag between the offset returned with the recognition result and the time reported by the sessionStarted event. Additionally, you can subscribe to the sessionStopped event, which fires when the current session ends, to determine when it is safe to start the next session.
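The pause/resume pattern above can be sketched as follows. This is a minimal illustration only: a tiny stand-in recognizer is used so the control flow runs without Azure credentials, but the method and event names below (start_continuous_recognition, stop_continuous_recognition, session_started, session_stopped) mirror those of the real SpeechRecognizer in the Python Speech SDK (azure-cognitiveservices-speech).

```python
# Sketch of pausing continuous recognition between utterances.
# StandInRecognizer is a hypothetical stand-in so this flow is runnable
# offline; the real azure.cognitiveservices.speech.SpeechRecognizer
# exposes the same method and event names used here (events are
# subscribed with .connect(callback) in the real SDK).

class StandInRecognizer:
    """Mimics the session events of the Speech SDK's SpeechRecognizer."""
    def __init__(self):
        self.session_started = []   # real SDK: recognizer.session_started.connect(cb)
        self.session_stopped = []   # real SDK: recognizer.session_stopped.connect(cb)
        self.running = False

    def start_continuous_recognition(self):
        self.running = True
        for cb in self.session_started:
            cb("session-1")

    def stop_continuous_recognition(self):
        self.running = False
        for cb in self.session_stopped:
            cb("session-1")

events = []
recognizer = StandInRecognizer()
recognizer.session_started.append(lambda sid: events.append(("started", sid)))
# Use session_stopped to know when the service is done and the next
# session can safely be started.
recognizer.session_stopped.append(lambda sid: events.append(("stopped", sid)))

recognizer.start_continuous_recognition()   # begin listening
# ... recognition happens here; pause when you are not expecting speech:
recognizer.stop_continuous_recognition()

print(events)  # [('started', 'session-1'), ('stopped', 'session-1')]
```

Keeping one long-lived recognizer and toggling it this way means the session setup cost is paid once, rather than on every recognizeOnceAsync call.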
Another possible solution is to set the granularity configuration parameter of the Pronunciation Assessment feature to Phoneme, which gives you scores at the full-text, word, and phoneme levels. This should give you word-level results and accuracy. You can also use the NBestPhonemeCount field in the PronunciationAssessmentConfig to get the top phonemes most probably spoken by the speaker, ranked by a score that indicates the probability. You can treat the top-1 entry as the actual spoken phoneme.
To indicate whether, and for how many, potential spoken phonemes to get confidence scores, set the NBestPhonemeCount parameter to an integer value such as 5.
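As a sketch, this is roughly how the JSON form of the assessment configuration and the returned n-best phoneme list fit together. The config keys follow the documented PronunciationAssessmentConfig JSON shape (spoken-phoneme output is tied to the IPA phoneme alphabet), but the sample result payload below is hand-made for illustration, not real service output:

```python
import json

# Illustrative JSON config for pronunciation assessment; nBestPhonemeCount
# asks the service to return confidence scores for the top 5 candidate
# phonemes per expected phoneme.
config_json = json.dumps({
    "referenceText": "good morning",
    "gradingSystem": "HundredPointScale",
    "granularity": "Phoneme",
    "phonemeAlphabet": "IPA",
    "nBestPhonemeCount": 5,
})

# Hand-made sample of the per-phoneme n-best list in a recognition result,
# shaped like the service's JSON (shortened to two candidates).
sample_phoneme = {
    "Phoneme": "g",
    "PronunciationAssessment": {
        "NBestPhonemes": [
            {"Phoneme": "g", "Score": 85},
            {"Phoneme": "k", "Score": 10},
        ]
    },
}

# The list is ranked by score, so the highest-scoring entry can be
# treated as the phoneme the speaker actually produced.
nbest = sample_phoneme["PronunciationAssessment"]["NBestPhonemes"]
actual_spoken = max(nbest, key=lambda p: p["Score"])["Phoneme"]
print(actual_spoken)  # g
```

In a real application you would read the NBestPhonemes list from the recognition result's JSON rather than from a hand-made dictionary.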
I hope this helps. Thank you.
Please don't forget to click Accept Answer and Yes if the provided answer was helpful.