Pronunciation Assessment SDK returns low score with ErrorType: 'Mispronunciation' compared to Speech Studio.


Shane 0

In my Node.js app, which uses the microsoft-cognitiveservices-speech-sdk package, Azure's Pronunciation Assessment for Japanese always returns ErrorType: 'Mispronunciation' and a very low score (around 50), even though it usually returns around 90-100 when tested in Speech Studio. It works fine when I test it in English.

I tried matching the audio format to the one used in Speech Studio:
channels = 1, bitsPerSample = 16, samplesPerSecond = 16000
I also downloaded the audio file from Speech Studio and tested it with my app, which returned the same result.
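
One way to confirm that a file really has the format listed above is to read the fields straight from its WAV header. The following is a minimal sketch, not part of the original post; the names `parseWavFormat` and `matchesStudioFormat` are hypothetical, and it assumes a canonical PCM header in which the "fmt " chunk starts at byte 12.

```typescript
// Header fields relevant to the format check.
interface WavFormat {
  audioFormat: number; // 1 means uncompressed PCM
  channels: number;
  samplesPerSecond: number;
  bitsPerSample: number;
}

// Read a 4-character ASCII chunk tag at the given byte offset.
function readTag(view: DataView, offset: number): string {
  let tag = "";
  for (let i = 0; i < 4; i++) {
    tag += String.fromCharCode(view.getUint8(offset + i));
  }
  return tag;
}

// Assumption: "fmt " immediately follows the RIFF/WAVE header.
// Files with extra chunks before "fmt " would need a chunk scan instead.
function parseWavFormat(bytes: Uint8Array): WavFormat {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (
    bytes.byteLength < 36 ||
    readTag(view, 0) !== "RIFF" ||
    readTag(view, 8) !== "WAVE" ||
    readTag(view, 12) !== "fmt "
  ) {
    throw new Error("Not a canonical WAV header");
  }
  return {
    audioFormat: view.getUint16(20, true),
    channels: view.getUint16(22, true),
    samplesPerSecond: view.getUint32(24, true),
    bitsPerSample: view.getUint16(34, true),
  };
}

// True when the file is mono, 16-bit, 16 kHz PCM.
function matchesStudioFormat(f: WavFormat): boolean {
  return (
    f.audioFormat === 1 &&
    f.channels === 1 &&
    f.samplesPerSecond === 16000 &&
    f.bitsPerSample === 16
  );
}
```

A Node.js `Buffer` is a `Uint8Array`, so the buffer read from disk can be passed to `parseWavFormat` directly.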

{
  Word: 'こんにちは',
  Offset: 5700000,
  Duration: 11700000,
  PronunciationAssessment: { AccuracyScore: 53, ErrorType: 'Mispronunciation' },
  Phonemes: [
    [Object], [Object],
    [Object], [Object],
    [Object], [Object],
    [Object], [Object],
    [Object]
  ]
}

{
  accuracyScore: 54,
  fluencyScore: 100,
  completenessScore: 0,
  pronunciationScore: 30.8
}

completenessScore is always 0 as well.
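
The word output above shows the phoneme entries collapsed to `[Object]`, because `console.log` truncates deeply nested values. As a debugging aid, here is a small sketch (not from the original post) that prints the nested result in full and tallies the `ErrorType` values. The `AssessedWord` type and `countErrorTypes` helper are hypothetical names, and the sample data only mirrors the fields shown above.

```typescript
// Shape of one entry in the Words array, limited to the fields shown above.
interface AssessedWord {
  Word: string;
  PronunciationAssessment: { AccuracyScore: number; ErrorType: string };
  Phonemes?: unknown[];
}

// Count how many words fall under each ErrorType.
function countErrorTypes(words: AssessedWord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const w of words) {
    const type = w.PronunciationAssessment.ErrorType;
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Sample data copied from the output above (phoneme details omitted).
const sampleWords: AssessedWord[] = [
  {
    Word: "こんにちは",
    PronunciationAssessment: { AccuracyScore: 53, ErrorType: "Mispronunciation" },
  },
];

// JSON.stringify with an indent prints nested objects in full.
console.log(JSON.stringify(sampleWords, null, 2));
console.log(countErrorTypes(sampleWords));
```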

Here is my code.

async azureGradeSpeech(
    input: GradeSpeechInput
  ): Promise<SpeechScore> {
    const { text, audioFilePath } = input;
    const speechConfig = SpeechSDK.SpeechConfig.fromSubscription(
      azureSubscriptionKey,
      azureRegion
    );

    const buffer = await this.readLocalFileAsBuffer(audioFilePath);

    const audioConfig = SpeechSDK.AudioConfig.fromWavFileInput(buffer);
    const audAudioConfig =
      SpeechSDK.AutoDetectSourceLanguageConfig.fromLanguages(["ja-JP"]);

    const speechRecognizer = SpeechSDK.SpeechRecognizer.FromConfig(
      speechConfig,
      audAudioConfig,
      audioConfig
    );

    const resultConfig = {
      referenceText: text,
      gradingSystem: "HundredMark",
      granularity: "Phoneme", // Phoneme, Syllable, Word, FullText
      // EnableMiscue: true,
    };

    const pronunciationAssessmentConfig =
      SpeechSDK.PronunciationAssessmentConfig.fromJSON(
        JSON.stringify(resultConfig)
      );

    pronunciationAssessmentConfig.applyTo(speechRecognizer);

    const result: SpeechScore = await new Promise((resolve, reject) => {
      speechRecognizer.recognizeOnceAsync(
        (speechRecognitionResult: SpeechSDK.SpeechRecognitionResult) => {
          // The pronunciation assessment result as a Speech SDK object
          const pronunciationAssessmentResult =
            SpeechSDK.PronunciationAssessmentResult.fromResult(
              speechRecognitionResult
            );

          const pronunciationAssessmentResultJson =
            speechRecognitionResult.properties.getProperty(
              SpeechSDK.PropertyId.SpeechServiceResponse_JsonResult
            );

          const jsonResult = JSON.parse(pronunciationAssessmentResultJson);

          const words = jsonResult.NBest[0].Words;
          const length = words.length;
          const totalAccuracyScore = words.reduce(
            (accumulator, curr) =>
              curr.PronunciationAssessment.AccuracyScore + accumulator,
            0
          );
          const average = Number((totalAccuracyScore / length).toFixed(0));

          const score: SpeechScore = {
            averageAccuracyScore: average,
            accuracyScore: pronunciationAssessmentResult.accuracyScore,
            fluencyScore: pronunciationAssessmentResult.fluencyScore,
            completenessScore: pronunciationAssessmentResult.completenessScore,
            pronunciationScore:
              pronunciationAssessmentResult.pronunciationScore,
          };

          resolve(score);
        },
        (error) => {
          reject(error);
        }
      );
    });

    console.log("result", result);
    return result;
  }
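
The averaging step in the code above assumes that `NBest[0].Words` is always present and non-empty; if recognition returns no words, the division produces `NaN`. Below is a hedged sketch of a more defensive version. The name `averageWordAccuracy` and the two interfaces are hypothetical, and the sketch rounds with `Math.round` rather than the `toFixed(0)` used in the post.

```typescript
// Minimal shapes for the parts of the JSON result that the average needs.
interface WordResult {
  PronunciationAssessment?: { AccuracyScore?: number };
}
interface RecognitionJson {
  NBest?: { Words?: WordResult[] }[];
}

// Returns the rounded mean word accuracy, or undefined when no scores exist.
function averageWordAccuracy(rawJson: string): number | undefined {
  const parsed = JSON.parse(rawJson) as RecognitionJson;
  const words = parsed.NBest?.[0]?.Words ?? [];
  const scores = words
    .map((w) => w.PronunciationAssessment?.AccuracyScore)
    .filter((s): s is number => typeof s === "number");
  if (scores.length === 0) {
    return undefined; // avoids dividing by zero
  }
  const total = scores.reduce((sum, s) => sum + s, 0);
  return Math.round(total / scores.length);
}
```

Returning `undefined` lets the caller distinguish "nothing was recognized" from a genuinely low score.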

YutongTie-MSFT 53,976 Reputation points Moderator

2023-06-26T21:47:12.82+00:00
Hello, thanks for reaching out to us.

The Pronunciation Assessment SDK may be returning a lower score with the 'Mispronunciation' error type than Speech Studio because of differences in the scoring algorithms. However, there are a few things you can try to improve the accuracy of the assessment:

Ensure that the audio quality is good and that there is no background noise or interference that could affect the accuracy of the assessment.

Try adjusting the PronunciationAssessmentConfig parameters to see if it improves the accuracy of the assessment. For example, you can try changing the granularity from 'Phoneme' to 'Word' or 'FullText'.
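
To compare granularities systematically, the configuration JSON could be generated once per value. This is only a sketch: `buildConfigVariants` is a hypothetical helper, the key names are copied from the question's `resultConfig`, and the list of granularity values comes from the comment in the question's code, so the service may not accept every one of them.

```typescript
// Granularity values as listed in the comment in the question's code.
type Granularity = "Phoneme" | "Syllable" | "Word" | "FullText";

// Build one config JSON string per granularity for the same reference text.
function buildConfigVariants(referenceText: string): string[] {
  const granularities: Granularity[] = ["Phoneme", "Syllable", "Word", "FullText"];
  return granularities.map((granularity) =>
    JSON.stringify({
      referenceText,
      gradingSystem: "HundredMark",
      granularity,
    })
  );
}

// Example: print the four variants for one reference text.
for (const json of buildConfigVariants("こんにちは")) {
  console.log(json);
}
```

Each string could then be passed to `SpeechSDK.PronunciationAssessmentConfig.fromJSON`, as in the question's code, and the resulting scores compared side by side.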

Additionally, it's worth noting that the Pronunciation Assessment SDK is still in preview and may not be as accurate as Speech Studio or other established speech recognition systems. You may want to consider using multiple assessment tools and comparing the results to get a more accurate assessment of the pronunciation.

If you could share a sample with us, we can help you look into your case further.

Regards,

Yutong
Shane 0 Reputation points

2023-06-27T02:34:39.8733333+00:00

@YutongTie-MSFT Thank you for the prompt response and suggestions. I have tried improving the audio quality and adjusting the granularity, but the result is still the same. I'm happy to share my audio samples for further investigation. How should I provide them?
Shane 0 Reputation points

2023-06-27T02:37:32.2333333+00:00

@YutongTie-MSFT Here is the link to my demo audio files.
https://drive.google.com/drive/folders/1TS31LWZ-GrFFMtmpF6HFdqosmHLNZyXH?usp=sharing
The file name of each audio file is the reference text itself and the language is ja-JP.
If you would prefer that I provide the samples another way, please let me know.
Thank you again for your support.
Shane 0 Reputation points

2023-06-28T15:38:54.05+00:00

@YutongTie-MSFT Hi, are there any updates on this issue? Thank you for your time in advance.
