Cognitive services pronunciation assessment always gives 100% score, even with badly pronounced words

Schoolblocks 0 Reputation points
2024-05-17T10:49:24.3766667+00:00

I built a Svelte (JavaScript) application that uses the Microsoft Speech SDK (v1.36), and I am using it to evaluate pronunciation in three languages: English, German and French.

Initially I was using recognizeOnceAsync(), which waits for silence at the end of the user's speech before evaluating the pronunciation. Since we use it in a classroom, I switched to startContinuousRecognitionAsync(), which lets the user start and stop the recording explicitly and works better in crowded rooms.
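A minimal sketch of this kind of start/stop flow, for reference (not the exact application code). It assumes `sdk` is the Speech SDK namespace and `recognizer` is a SpeechRecognizer that already has a pronunciation assessment config applied; `createPushToTalkSession` is a hypothetical helper name.

```javascript
// Hypothetical helper: wraps continuous recognition behind explicit start/stop calls.
// Assumes `recognizer` already has a PronunciationAssessmentConfig applied.
function createPushToTalkSession(sdk, recognizer) {
  const segments = []; // one parsed JSON result per recognized utterance

  // Continuous recognition fires `recognized` once per utterance segment.
  recognizer.recognized = (_sender, event) => {
    if (event.result.reason !== sdk.ResultReason.RecognizedSpeech) return;
    const raw = event.result.properties.getProperty(
      sdk.PropertyId.SpeechServiceResponse_JsonResult
    );
    if (raw) segments.push(JSON.parse(raw));
  };

  return {
    // Resolves once the SDK reports that recognition has started.
    start: () =>
      new Promise((resolve, reject) =>
        recognizer.startContinuousRecognitionAsync(resolve, reject)
      ),
    // Resolves with every segment collected so far once recognition stops.
    stop: () =>
      new Promise((resolve, reject) =>
        recognizer.stopContinuousRecognitionAsync(() => resolve(segments), reject)
      ),
  };
}
```

The start button would call `session.start()` and the stop button `session.stop()`, which resolves with the raw results for every utterance recognized in between.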

The problem is that the pronunciation assessment is almost always 100% in all parameters (accuracy, fluency, completeness and, where available, prosody, which is currently English only). For shorter phrases, like "good morning" or "guten morgen", the score is always 100, no matter how strangely I speak or how badly I mispronounce the words. If the phrase is slightly longer, I see better results, with some words scored at, for example, 80 accuracy.

This is causing the product to be unusable. We are evaluating children's pronunciation, and always getting a 100 score defeats the purpose.

The results from the initial one-shot method were much better than those from continuous recognition: it assessed my pronunciation correctly in all languages and gave appropriate scores for my speech.

Some information that might be useful for your reply:

  • I cloned the SDK repo (https://github.com/microsoft/cognitive-services-speech-sdk-js) and built it locally to use the minified SDK build, using the latest version.
  • I am including the SDK using a <script> tag.
  • My project uses the Svelte framework (v3.48).
  • There is no backend; the frontend calls Microsoft directly using the SDK.
  • The scores of 100 I mention come straight from the raw Microsoft response, with no processing on my end.
  • The service settings are gradingSystem: HundredMark, granularity: Phoneme and Dimension: Comprehensive, with prosody enabled and miscue enabled.
  • I tried configuring the phoneme alphabet as both SAPI and IPA; the results are the same.
  • I inspected the JSON in the response, and both word scores and phoneme scores are all 100, no matter how badly I pronounce the text.
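The check on the response JSON can be sketched roughly as follows. This assumes the detailed result shape in which `NBest[0].Words[]` entries carry `PronunciationAssessment.AccuracyScore` and a `Phonemes[]` array; `summarizeAssessment` is a hypothetical helper name, not part of the SDK.

```javascript
// Hypothetical helper: flattens word and phoneme accuracy scores from a raw result.
// Assumes the detailed JSON shape: NBest[0].Words[].Phonemes[].
function summarizeAssessment(rawJson) {
  const best = (rawJson.NBest && rawJson.NBest[0]) || {};

  const words = (best.Words || []).map((w) => ({
    word: w.Word,
    accuracy: w.PronunciationAssessment?.AccuracyScore ?? null,
    errorType: w.PronunciationAssessment?.ErrorType ?? null,
    phonemes: (w.Phonemes || []).map((p) => ({
      phoneme: p.Phoneme,
      accuracy: p.PronunciationAssessment?.AccuracyScore ?? null,
    })),
  }));

  // Collect every numeric score at word and phoneme level.
  const scores = words
    .flatMap((w) => [w.accuracy, ...w.phonemes.map((p) => p.accuracy)])
    .filter((s) => typeof s === "number");

  return {
    overall: best.PronunciationAssessment || null,
    words,
    // True only when at least one score exists and all of them are exactly 100.
    allPerfect: scores.length > 0 && scores.every((s) => s === 100),
  };
}
```

Logging `summarizeAssessment(result).allPerfect` for each utterance makes it easy to see how often a deliberately bad pronunciation still comes back with every score at 100.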

We are a paying customer and this is an essential feature in our language learning product. What can I do about this? I see this as a serious fault in the service: a pronunciation evaluation that always gives perfect scores, no matter how the text is pronounced, is basically useless.

Any hints or suggestions are very welcome.

Please ask if you need any extra information.

Azure AI Speech