Pronunciation assessment in Speech Studio
Pronunciation assessment uses the speech to text capability to provide subjective and objective feedback for language learners. Practicing pronunciation and getting timely feedback are essential for improving language skills. Assessments driven by experienced teachers can take a lot of time and effort, which makes high-quality assessment expensive for learners. Pronunciation assessment can help make language assessment more engaging and accessible to learners of all backgrounds.
Pronunciation assessment provides assessment results at several levels of granularity, from individual phonemes to the entire text input.
- At the full-text level, pronunciation assessment offers additional Fluency and Completeness scores: Fluency indicates how closely the speech matches a native speaker's use of silent breaks between words, and Completeness indicates how many words are pronounced in the speech relative to the reference text input. An overall score aggregated from Accuracy, Fluency, and Completeness then indicates the overall pronunciation quality of the given speech.
- At the word level, pronunciation assessment automatically detects miscues and provides an accuracy score at the same time, giving more detailed information on omissions, repetitions, insertions, and mispronunciations in the given speech.
- At the syllable level, accuracy scores are currently available only via the JSON file or the Speech SDK.
- At the phoneme level, pronunciation assessment provides an accuracy score for each phoneme, helping learners better understand the pronunciation details of their speech.
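To make the word-level miscue categories and the Completeness idea more concrete, the following Python snippet is a minimal sketch that aligns a reference text with a recognized transcript and labels unmatched words. It is an illustration only, not the service's actual algorithm: the function name `classify_miscues`, the use of `difflib` for alignment, and the completeness formula (matched reference words divided by total reference words) are all assumptions. Mispronunciation and repetition can't be detected from text alone, so this sketch covers only omissions and insertions.

```python
from difflib import SequenceMatcher


def classify_miscues(reference: str, recognized: str):
    """Hypothetical helper: label omitted and inserted words and
    estimate completeness as a percentage of matched reference words."""
    ref = reference.lower().split()
    hyp = recognized.lower().split()
    miscues = []
    matched = 0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        if tag in ("delete", "replace"):
            # Reference words with no counterpart in the transcript.
            miscues += [("Omission", w) for w in ref[i1:i2]]
        if tag in ("insert", "replace"):
            # Transcript words with no counterpart in the reference.
            miscues += [("Insertion", w) for w in hyp[j1:j2]]
    # Assumed formula: share of reference words that were matched.
    completeness = 100.0 * matched / len(ref) if ref else 0.0
    return miscues, completeness


if __name__ == "__main__":
    print(classify_miscues("the quick brown fox", "the brown fox jumps"))
```

In this sketch a substituted word is reported as one omission plus one insertion. The real service works from audio and can instead flag such a word as a mispronunciation.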
This article describes how to use the pronunciation assessment tool in Speech Studio. You can get immediate feedback on the accuracy and fluency of your speech without writing any code. For information about how to integrate pronunciation assessment in your speech applications, see How to use pronunciation assessment.
Pronunciation assessment is charged at the same rate as standard speech to text.
Try out pronunciation assessment
You can explore and try out pronunciation assessment even without signing in.
Follow these steps to assess your pronunciation of the reference text:
Go to Pronunciation Assessment in the Speech Studio.
Choose a supported language in which you want to evaluate pronunciation.
Choose from the provisioned text samples, or under the Enter your own script label, enter your own reference text.
When reading the text, stay close to the microphone so that the recorded voice isn't too quiet.
Alternatively, you can upload recorded audio for pronunciation assessment. After a successful upload, the system evaluates the audio automatically, as shown in the following screenshot.
Pronunciation assessment results
Once you've recorded the reference text or uploaded the recorded audio, the Assessment result is displayed. The result includes your spoken audio and feedback on the accuracy and fluency of that audio, produced by comparing a machine-generated transcript of the input audio with the reference text. You can listen to your spoken audio, and download it if necessary.
You can also check the pronunciation assessment result in JSON. The word-level, syllable-level, and phoneme-level accuracy scores are included in the JSON file.
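As a rough illustration of reading those scores from the JSON, the following Python sketch walks a small sample result and collects the word-level and phoneme-level accuracy scores. The field names (`NBest`, `Words`, `PronunciationAssessment`, `AccuracyScore`, `ErrorType`, `Phonemes`) are assumed from the shape of the Speech SDK's detailed output and might differ from the file you download. The sample values and the helper name `word_scores` are invented for this example.

```python
import json

# Illustrative sample only; the scores are made up and the layout is assumed.
SAMPLE_JSON = """
{
  "NBest": [
    {
      "PronunciationAssessment": {
        "AccuracyScore": 66.0, "FluencyScore": 90.0,
        "CompletenessScore": 100.0, "PronScore": 75.0
      },
      "Words": [
        {
          "Word": "hello",
          "PronunciationAssessment": {"AccuracyScore": 92.0, "ErrorType": "None"},
          "Phonemes": [
            {"Phoneme": "h", "PronunciationAssessment": {"AccuracyScore": 95.0}},
            {"Phoneme": "ow", "PronunciationAssessment": {"AccuracyScore": 89.0}}
          ]
        },
        {
          "Word": "world",
          "PronunciationAssessment": {"AccuracyScore": 40.0, "ErrorType": "Mispronunciation"},
          "Phonemes": [
            {"Phoneme": "w", "PronunciationAssessment": {"AccuracyScore": 40.0}}
          ]
        }
      ]
    }
  ]
}
"""


def word_scores(result: dict) -> list:
    """Hypothetical helper: return (word, accuracy, error type, phoneme scores)
    for each word in the top recognition candidate."""
    best = result["NBest"][0]
    rows = []
    for word in best.get("Words", []):
        assessment = word.get("PronunciationAssessment", {})
        phonemes = [
            (p["Phoneme"], p["PronunciationAssessment"]["AccuracyScore"])
            for p in word.get("Phonemes", [])
        ]
        rows.append((word["Word"], assessment.get("AccuracyScore"),
                     assessment.get("ErrorType"), phonemes))
    return rows


if __name__ == "__main__":
    for row in word_scores(json.loads(SAMPLE_JSON)):
        print(row)
```

Syllable-level scores could be read the same way if the file contains a per-word syllable list, which this sketch doesn't assume.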
Pronunciation assessment evaluates three aspects of pronunciation: accuracy, fluency, and completeness. At the bottom of Assessment result, you can see the Pronunciation score, Accuracy score, Fluency score, and Completeness score. The Accuracy score and the Fluency score vary over time throughout the recording. The Completeness score is calculated only at the end of the evaluation. The Pronunciation score is an overall score that indicates the pronunciation quality of the given speech. During recording, it's a weighted aggregate of the Accuracy score and Fluency score. After recording is complete, it's a weighted aggregate of the Accuracy score, Fluency score, and Completeness score.
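The two-stage aggregation can be pictured as a weighted average that simply leaves out Completeness until the recording ends. The following Python sketch shows that idea only. This article doesn't state the actual weights or formula, so the weights `(0.6, 0.2, 0.2)` and the function name `overall_score` are placeholders chosen for illustration.

```python
def overall_score(accuracy, fluency, completeness=None, weights=(0.6, 0.2, 0.2)):
    """Hypothetical weighted average of the component scores.

    While recording, completeness is unknown (None) and only accuracy and
    fluency contribute. The weights are placeholders, not the service's values.
    """
    parts = [(accuracy, weights[0]), (fluency, weights[1])]
    if completeness is not None:
        parts.append((completeness, weights[2]))
    total_weight = sum(weight for _, weight in parts)
    # Normalize by the weights actually used so the result stays on a 0-100 scale.
    return sum(score * weight for score, weight in parts) / total_weight


if __name__ == "__main__":
    print(overall_score(80, 90))        # during recording
    print(overall_score(80, 90, 100))   # after recording completes
```

Normalizing by the sum of the weights in use keeps the in-progress score comparable to the final one, which is one plausible design. The service might combine the scores differently.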
Scores within words
The complete transcription is shown in the Display window. If a word is omitted, inserted, or mispronounced compared to the reference text, the word is highlighted according to the error type. When you hover over a word, you can see accuracy scores for the whole word or for specific phonemes.