Unable to Get Logical Results with Azure Pronunciation Assessment

tzviya langenthal 0 Reputation points
2024-11-07T20:28:30.9833333+00:00

I'm trying to use the pronunciation assessment feature in the Azure Speech SDK, but I can't get reasonable results.
I've tested this with the word "school" and other words as well, but I always get an accuracy score of 0, no matter whether the word was spoken correctly or not. I generated the audio files using a Text-to-Audio tool, so this should be easy to reproduce.

Does anyone have any idea why the accuracy score is always 0, or what I might be missing?

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.PronunciationAssessment;

namespace PronunciationAssessmentDemo
{
    class Program
    {
        // Wraps the raw file bytes in a push stream. Called without an AudioStreamFormat,
        // CreatePushStream() expects 16 kHz, 16-bit, mono PCM; the bytes written here
        // also include the WAV header.
        public static AudioConfig CreateAudioConfigFromBytes(byte[] audioBytes)
        {
            var audioStream = new MemoryStream(audioBytes); // not used below
            var pushStream = AudioInputStream.CreatePushStream();
            pushStream.Write(audioBytes);
            pushStream.Close(); // signals end of stream to the recognizer
            var audioConfig = AudioConfig.FromStreamInput(pushStream);
            return audioConfig;
        }

        public static async Task<float> AssessPronunciation(byte[] audioBytes, string referenceText)
        {
            string subscriptionKey = Environment.GetEnvironmentVariable("STT_API_KEY");
            string region = "eastus";
            // Score on a 0-100 scale, down to phoneme level, without miscue detection.
            var pronunciationAssessmentConfig = new PronunciationAssessmentConfig(
                referenceText: referenceText,
                gradingSystem: GradingSystem.HundredMark,
                granularity: Granularity.Phoneme,
                enableMiscue: false);
            using var audioConfig = CreateAudioConfigFromBytes(audioBytes);
            var speechConfig = SpeechConfig.FromSubscription(subscriptionKey, region);
            using (var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig))
            {
                pronunciationAssessmentConfig.ApplyTo(speechRecognizer);
                var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
                if (speechRecognitionResult.Reason == ResultReason.RecognizedSpeech)
                {
                    Console.WriteLine("Recognized: " + speechRecognitionResult.Text);
                    var pronunciationAssessmentResult = PronunciationAssessmentResult.FromResult(speechRecognitionResult);
                    Console.WriteLine($"Accuracy Score: {pronunciationAssessmentResult.AccuracyScore}");
                    return (float)pronunciationAssessmentResult.AccuracyScore;
                }
                else
                {
                    // Note: a failed recognition (e.g. NoMatch or Canceled) is also reported as 0.
                    Console.WriteLine($"Recognition failed: {speechRecognitionResult.Reason}");
                    return 0;
                }
            }
        }

        static async Task Main(string[] args)
        {
            string audioFilePath = "...wwwroot\\audio\\school.wav";
            string referenceText = "school";
            byte[] audioBytes = File.ReadAllBytes(audioFilePath);
            float accuracyScore = await AssessPronunciation(audioBytes, referenceText);
            Console.WriteLine($"Final Accuracy Score: {accuracyScore}");
        }
    }
}


  1. Saideep Anchuri 9,500 Reputation points Moderator
    2024-11-08T04:42:18.6466667+00:00

    Hi tzviya langenthal

    Welcome to Microsoft Q&A Forum, thank you for posting your query here!

    To get accurate pronunciation assessment results, first check that the audio configuration is set up correctly. You can try letting the SDK read the WAV file directly, for example with AudioConfig.FromWavFileInput(audioFilePath), instead of AudioConfig.FromStreamInput(pushStream). (The audioStream variable in your code is a System.IO.MemoryStream, which FromStreamInput does not accept; it expects an AudioInputStream.) It's also important that the reference text matches the spoken audio exactly, since any discrepancy can lead to inaccurate results. Finally, make sure you're using the latest version of the Azure Speech SDK, as updates often include bug fixes and improvements.
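    One way to check the audio configuration is to inspect the WAV header before pushing the bytes. AudioInputStream.CreatePushStream() with no arguments expects 16 kHz, 16-bit, mono PCM, so a file generated at another sample rate or bit depth would be interpreted with the wrong format. The following C# sketch reads the "fmt " chunk of a WAV file and reports whether it matches that default; WavFormatCheck and its methods are hypothetical helpers for illustration, not part of the Speech SDK.

    ```csharp
    using System;
    using System.IO;
    using System.Text;

    // Hypothetical helper (sketch): walks the RIFF chunks of a .wav file
    // and returns the format fields stored in its "fmt " chunk.
    public static class WavFormatCheck
    {
        public sealed record WavFormat(ushort FormatTag, ushort Channels, uint SampleRate, ushort BitsPerSample);

        public static WavFormat Inspect(byte[] wav)
        {
            using var reader = new BinaryReader(new MemoryStream(wav));

            // RIFF header: "RIFF", 4-byte size, "WAVE".
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF")
                throw new InvalidDataException("Not a RIFF file.");
            reader.ReadUInt32(); // overall RIFF size, not needed here
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE")
                throw new InvalidDataException("Not a WAVE file.");

            // Walk the chunks until the "fmt " chunk is found.
            while (reader.BaseStream.Position + 8 <= reader.BaseStream.Length)
            {
                string id = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint size = reader.ReadUInt32();
                if (id == "fmt ")
                {
                    ushort formatTag = reader.ReadUInt16();   // 1 = plain PCM
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    reader.ReadUInt32();                      // byte rate
                    reader.ReadUInt16();                      // block align
                    ushort bitsPerSample = reader.ReadUInt16();
                    return new WavFormat(formatTag, channels, sampleRate, bitsPerSample);
                }
                // Skip any other chunk; chunk data is padded to an even length.
                reader.BaseStream.Seek(size + (size & 1), SeekOrigin.Current);
            }
            throw new InvalidDataException("No fmt chunk found.");
        }

        // True only for plain PCM (format tag 1), mono, 16 kHz, 16-bit: the format a
        // push stream created without an AudioStreamFormat expects. Files using
        // WAVE_FORMAT_EXTENSIBLE are reported as non-matching by this sketch.
        public static bool MatchesDefaultPushStreamFormat(WavFormat f) =>
            f.FormatTag == 1 && f.Channels == 1 && f.SampleRate == 16000 && f.BitsPerSample == 16;
    }
    ```

    For example, WavFormatCheck.Inspect(File.ReadAllBytes(audioFilePath)) returns the file's format. If MatchesDefaultPushStreamFormat returns false, either convert the file or create the push stream with an explicit format, e.g. AudioInputStream.CreatePushStream(AudioStreamFormat.GetWaveFormatPCM(sampleRate, bitsPerSample, channels)).
    
    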

    Kindly refer to the document below:

    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-pronunciation-assessment?pivots=programming-language-csharp
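    Along the lines of the suggestions above, here is a hedged C# sketch of the recognition call that reads the file with AudioConfig.FromWavFileInput (so the SDK takes the format from the WAV header) and reports NoMatch and Canceled results separately instead of returning 0. It assumes the Microsoft.CognitiveServices.Speech NuGet package, the STT_API_KEY environment variable and eastus region from the question, and an en-US locale; the PronunciationCheck class and AssessFromWavFile method are hypothetical names.

    ```csharp
    using System;
    using System.Threading.Tasks;
    using Microsoft.CognitiveServices.Speech;
    using Microsoft.CognitiveServices.Speech.Audio;
    using Microsoft.CognitiveServices.Speech.PronunciationAssessment;

    public static class PronunciationCheck
    {
        // Sketch: returns null (not 0) when recognition fails, so a failed
        // recognition is not mistaken for a zero accuracy score.
        public static async Task<double?> AssessFromWavFile(string wavPath, string referenceText)
        {
            var speechConfig = SpeechConfig.FromSubscription(
                Environment.GetEnvironmentVariable("STT_API_KEY"), "eastus");
            speechConfig.SpeechRecognitionLanguage = "en-US"; // assumed locale

            // FromWavFileInput reads the WAV header, so the SDK uses the file's
            // actual sample rate, bit depth and channel count.
            using var audioConfig = AudioConfig.FromWavFileInput(wavPath);
            using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

            var paConfig = new PronunciationAssessmentConfig(
                referenceText, GradingSystem.HundredMark, Granularity.Phoneme, enableMiscue: false);
            paConfig.ApplyTo(recognizer);

            var result = await recognizer.RecognizeOnceAsync();
            switch (result.Reason)
            {
                case ResultReason.RecognizedSpeech:
                    var pa = PronunciationAssessmentResult.FromResult(result);
                    Console.WriteLine($"Recognized: {result.Text}, accuracy: {pa.AccuracyScore}");
                    // Full service response, including word- and phoneme-level scores.
                    Console.WriteLine(result.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult));
                    return pa.AccuracyScore;
                case ResultReason.NoMatch:
                    Console.WriteLine($"No match: {NoMatchDetails.FromResult(result).Reason}");
                    return null;
                case ResultReason.Canceled:
                    var cancellation = CancellationDetails.FromResult(result);
                    Console.WriteLine($"Canceled: {cancellation.Reason} {cancellation.ErrorCode} {cancellation.ErrorDetails}");
                    return null;
                default:
                    Console.WriteLine($"Unexpected result: {result.Reason}");
                    return null;
            }
        }
    }
    ```

    Calling await PronunciationCheck.AssessFromWavFile(audioFilePath, "school") prints the full JSON response, which makes it easier to tell whether a 0 comes from the assessment itself or from a recognition that never succeeded.
    
    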

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".

    Thank You.

