How to retrieve IPA phonemes in Azure AI Services Speech to Text SpeechRecognitionResult

Nick Pattman 40 Reputation points
2023-12-13T09:43:21.66+00:00

Hi,

I'm using the Microsoft.CognitiveServices.Speech (version 1.33.0) NuGet package in C# to recognise and assess speech from a microphone. I'm doing this using the streaming service with continuous recognition (i.e. via StartContinuousRecognitionAsync()).

I'm using the following pronunciation configuration:

            var configuration = new PronunciationAssessmentConfig(
                referenceText: referenceText,
                gradingSystem: GradingSystem.HundredMark,
                granularity: Granularity.Phoneme,
                enableMiscue: true)
            {
                NBestPhonemeCount = 3,
                // PhonemeAlphabet = "IPA", // IPA should work but doesn't
            };

            configuration.EnableProsodyAssessment();

When running, I've attached a handler to the Recognized event and am inspecting the JSON result using: speechRecognitionResult.Properties.GetProperty(PropertyId.SpeechServiceResponse_JsonResult);

If I leave the configuration as default (SAPI), the phonemes are returned as expected within a Recognized event. If I uncomment the PhonemeAlphabet line to set it to IPA, the phoneme values all come back as empty strings. When set to SAPI I get:

{
	"Phoneme": "uh",
	"PronunciationAssessment": {
		"AccuracyScore": 100.0,
		"NBestPhonemes": [{
				"Phoneme": "uh",
				"Score": 93.0
			}, {
				"Phoneme": "r",
				"Score": 80.0
			}, {
				"Phoneme": "er",
				"Score": 63.0
			}
		]
	},
	"Offset": 27600000,
	"Duration": 700000
}

When set to IPA the phonemes all come back as empty strings:

{
	"Phoneme": "",
	"PronunciationAssessment": {
		"AccuracyScore": 100.0,
		"NBestPhonemes": [{
				"Phoneme": "",
				"Score": 93.0
			}, {
				"Phoneme": "",
				"Score": 80.0
			}, {
				"Phoneme": "",
				"Score": 63.0
			}
		]
	},
	"Offset": 27600000,
	"Duration": 700000
}
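To check this symptom programmatically rather than by eye, here is a minimal sketch. It assumes only that the JSON contains string-valued "Phoneme" properties as in the fragments above. The class name PhonemeInspector and the method CountPhonemes are hypothetical helpers written for illustration and are not part of the Speech SDK. It uses System.Text.Json from the .NET base class library.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of the Speech SDK): counts "Phoneme" values
// in a pronunciation assessment JSON payload and how many of them are empty.
public static class PhonemeInspector
{
    public static (int Total, int Empty) CountPhonemes(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        int total = 0, empty = 0;
        Walk(doc.RootElement, ref total, ref empty);
        return (total, empty);
    }

    // Recursively visits objects and arrays, tallying string-valued "Phoneme" properties.
    private static void Walk(JsonElement element, ref int total, ref int empty)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    if (property.NameEquals("Phoneme")
                        && property.Value.ValueKind == JsonValueKind.String)
                    {
                        total++;
                        if (string.IsNullOrEmpty(property.Value.GetString()))
                        {
                            empty++;
                        }
                    }
                    else
                    {
                        Walk(property.Value, ref total, ref empty);
                    }
                }
                break;

            case JsonValueKind.Array:
                foreach (JsonElement item in element.EnumerateArray())
                {
                    Walk(item, ref total, ref empty);
                }
                break;
        }
    }
}
```

Passing the string obtained from PropertyId.SpeechServiceResponse_JsonResult to PhonemeInspector.CountPhonemes inside the Recognized handler gives a quick per-utterance count. A result where Empty equals Total matches the IPA behaviour described here, and could be included in a bug report.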

I'm working in en-GB, but this is also happening with en-US.

The PronunciationAssessmentConfig documentation states that IPA is a valid value for the PhonemeAlphabet property (Pronunciation config documentation). If I set it to something nonsensical ("junk"), I get an exception, so IPA is being validated as a correct value.

The fact that the phonemes are just stripped makes me wonder whether this is a character encoding bug, but the classes are so heavily tucked behind interop that it's very difficult to unravel.

Any advice would be much appreciated. Many thanks, Nick

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Answer accepted by question author
  1. romungi-MSFT 49,061 Reputation points Microsoft Employee Moderator
    2023-12-13T12:11:51.27+00:00

    @Nick Pattman Based on the pronunciation documentation page, the phoneme name and score are only returned for the "en-US" locale; for other locales, only the score is returned. I think this confirms the behavior you are seeing with "en-GB".

    For the en-US locale, the phoneme name is provided together with the score, to help identify which phonemes were pronounced accurately or inaccurately. For other locales, you can only get the phoneme score.

    When "IPA" is set with "en-US", I think you should get both the name and the score, since the documentation example provides a sample result for this case. If you do not see this behavior, I think you should report this issue to the SDK team on this GitHub repo, as this could be a bug. Apart from this, I do not see an issue in the configuration you are using with pronunciation assessment.

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
