Issue with Pronunciation Assessment in Speech Recognition API Always Returning PronScore 100

Heba Ghazaly 5 Reputation points
2024-08-08T06:22:44.7766667+00:00

I am using the https://eastus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1 API with the POST method for speech-to-text conversion. Here are the details of my implementation:

  • Programming Language: JavaScript
  • Parameters:
    • language
    • format
  • Headers:
    • Pronunciation-Assessment: constructed as follows:
      
          const pronAssessmentParamsJson = `{"ReferenceText":"${currentWordTxt}","GradingSystem":"HundredMark","Dimension":"Comprehensive","Format":"Detailed"}`;
      
          const pronAssessmentParams = Buffer.from(pronAssessmentParamsJson, 'utf-8').toString('base64');
      
      
  • Content Type:
    • 'Content-Type': 'audio/wav'
  • Body:
    • Includes the audio blob.

Problem:

No matter the audio content, the API always returns PronScore: 100.0. For instance, if the ReferenceText is "citizen", the response always gives a PronScore of 100.0, regardless of whether the word "citizen" is pronounced correctly, incorrectly, or not at all.

Example:

  • ReferenceText: "citizen"
  • Audio Blob: A short recording of my pronunciation (or silence)
  • Response: A JSON object where PronScore is 100.0

Troubleshooting Efforts:

  • Tried the API on Postman separately from my JavaScript code, and I still get the same response.
  • No error codes were returned from the API.

Goal:

I aim to check if the audio matches the text for a student activity. If it doesn't match, I use the score from the API to add to the student's score progress.

Could anyone assist in identifying why the PronScore is always 100 and how to resolve this issue?

Thank you.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

2 answers

  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2024-10-17T10:05:55.6766667+00:00

    Hello Mohammed Natour and Heba Ghazaly,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are both seeing the same issue: Pronunciation Assessment returns a score of 100.0 in every case.

    Before you contact support, work through the following checklist, which is based on the published guidelines and best practices:

    1. Incorrect audio formats can lead to unexpected results. Make sure the audio blob sent in the request matches the format the API requires: a WAV file (.wav) with suitable encoding (e.g., PCM, 16-bit, 16 kHz, mono).
    2. You are already Base64-encoding the Pronunciation Assessment parameters. Also confirm that pronAssessmentParams is built from well-formed JSON and is passed in the request headers. Here is an example:
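            // JSON.stringify escapes any quotes or backslashes in the reference text,
            // which a hand-built template string would not.
            const pronAssessmentParamsJson = JSON.stringify({
                ReferenceText: currentWordTxt,
                GradingSystem: "HundredMark",
                Dimension: "Comprehensive",
                Format: "Detailed"
            });
            const pronAssessmentParams = Buffer.from(pronAssessmentParamsJson, 'utf-8').toString('base64');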
           
      
      The request headers should then include:
            const headers = {
                'Pronunciation-Assessment': pronAssessmentParams,
                'Content-Type': 'audio/wav'
            };
      
    3. Low-quality or silent audio may lead the API to return high-confidence or perfect scores when it shouldn't. Use clear, high-quality recordings so that the API can detect the speech.
    4. The general speech-to-text endpoint may not provide proper pronunciation scoring on its own, so either use the dedicated pronunciation assessment endpoint in Azure or make sure the request headers carry the correct settings. For limitations and characteristics, see: https://learn.microsoft.com/en-us/legal/cognitive-services/speech-service/pronunciation-assessment/characteristics-and-limitations-pronunciation-assessment
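
    For the audio format check in item 1, a quick way to see what is actually being sent is to read the WAV header before posting. The sketch below is only an illustration: the function names are made up for this example, and it assumes the common 44-byte header layout in which the "fmt " chunk immediately follows the RIFF header (files with extra chunks in between would be reported as not recognized).

```javascript
// Sketch: read the basic fields of a canonical 44-byte RIFF/WAVE header.
// Assumption: the "fmt " chunk starts at byte 12. Names here are hypothetical.
function describeWavHeader(bytes) {
  // Read a 4-character chunk tag at the given byte offset.
  const tag = (offset) => String.fromCharCode(...bytes.subarray(offset, offset + 4));
  if (bytes.byteLength < 44 || tag(0) !== 'RIFF' || tag(8) !== 'WAVE' || tag(12) !== 'fmt ') {
    return { isWav: false };
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    isWav: true,
    audioFormat: view.getUint16(20, true),   // 1 means uncompressed PCM
    channels: view.getUint16(22, true),
    sampleRate: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// True only for the format suggested in item 1: PCM, 16-bit, 16 kHz, mono.
function isPcm16kMono(bytes) {
  const h = describeWavHeader(bytes);
  return h.isWav && h.audioFormat === 1 && h.channels === 1 &&
    h.sampleRate === 16000 && h.bitsPerSample === 16;
}
```

    Both functions take a Uint8Array. In Node.js a Buffer can be passed directly; in a browser, `new Uint8Array(await blob.arrayBuffer())` gives the bytes of a Blob. Note that audio captured in a browser is often not WAV at all, in which case `isWav` comes back false.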

    If the API still returns inaccurate scores after you have verified the points above, consider opening a support request through the Azure portal for further investigation.
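
    Before escalating, it may also be worth looking at more of the response than PronScore alone, since the stated goal is to check whether the audio matches the text. The sketch below is a rough illustration under assumptions: it expects the detailed JSON response to carry `RecognitionStatus` and an `NBest` array whose first entry has `Lexical` or `Display` text and a `PronScore` (either directly or under `PronunciationAssessment`). `evaluateAttempt` and `normalize` are hypothetical helper names, not part of any SDK.

```javascript
// Sketch: only trust PronScore when recognition succeeded and the
// recognized text matches the reference. Response field names are an
// assumption about the detailed JSON format; adjust to what you receive.
const normalize = (text) =>
  text.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, '').trim().replace(/\s+/g, ' ');

function evaluateAttempt(response, referenceText) {
  // Silence or noise usually shows up as a non-"Success" status.
  if (response.RecognitionStatus !== 'Success') {
    return { accepted: false, reason: String(response.RecognitionStatus), score: 0 };
  }
  const best = (response.NBest || [])[0];
  if (!best) {
    return { accepted: false, reason: 'no-candidates', score: 0 };
  }
  // Compare what was heard with what was expected, ignoring case and punctuation.
  const heard = normalize(best.Lexical || best.Display || '');
  if (heard !== normalize(referenceText)) {
    return { accepted: false, reason: 'text-mismatch', score: 0 };
  }
  const score = best.PronScore ?? best.PronunciationAssessment?.PronScore;
  if (typeof score !== 'number') {
    return { accepted: false, reason: 'no-score', score: 0 };
  }
  return { accepted: true, reason: 'ok', score };
}
```

    If the status and recognized text also look wrong for silent or mismatched audio, that detail is useful to include in the support request.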

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting this answer and accepting it if it was helpful.

    1 person found this answer helpful.

  2. Sina Salam 22,031 Reputation points Volunteer Moderator
    2024-10-17T20:34:10.2666667+00:00

    Hello Mohammed Natour,

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this!

    Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer. Accepted answers show up at the top, which improves discoverability for others.

    Issue: Pronunciation Assessment in the Speech Recognition API always returns a PronScore of 100.

    Error Message: None.

    Solution: The customer created a new network class instance for each request.
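
    The thread does not say what the "network class" was or why reusing it affected the score, so the following is only a sketch of the general idea: build a fresh request object per attempt so that no headers or body carry over from a previous call. `PronunciationRequest` is a hypothetical class invented for this illustration; the authentication header (for example `Ocp-Apim-Subscription-Key`) is omitted and would need to be added.

```javascript
// Sketch: one self-contained request object per assessment attempt.
// PronunciationRequest is a hypothetical name, not part of any SDK.
class PronunciationRequest {
  constructor(referenceText, audioBytes) {
    // Parameters mirror the ones used earlier in this thread.
    const params = {
      ReferenceText: referenceText,
      GradingSystem: 'HundredMark',
      Dimension: 'Comprehensive',
      Format: 'Detailed',
    };
    // Headers are created here, so every instance owns its own copy.
    this.headers = {
      'Pronunciation-Assessment': Buffer.from(JSON.stringify(params), 'utf-8').toString('base64'),
      'Content-Type': 'audio/wav',
    };
    this.body = audioBytes;
  }

  // Options object suitable for passing to fetch(url, options).
  toFetchOptions() {
    return { method: 'POST', headers: this.headers, body: this.body };
  }
}
```

    Usage would then be along the lines of `fetch(endpointUrl, new PronunciationRequest(word, wavBytes).toFetchOptions())`, with a new instance constructed for every recording rather than one shared object being mutated between calls.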

    1 person found this answer helpful.
