Pronunciation Assessment: Inconsistent Results

Jordan C 20

Hi,

I'm experiencing very inconsistent results with the pronunciation assessment SDK for the same audio file when using different regions.

I have tested the swedencentral and westeurope regions in several languages, and the results were very inconsistent. You can reproduce this by opening two tabs with two different resources in the Speech Studio dashboard and trying the same text and microphone input in each.
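To make this kind of side-by-side comparison repeatable, the scores from each region can be compared metric by metric. The sketch below is a minimal, hypothetical helper. It assumes you have already collected the scores from each resource for the same recording as dictionaries keyed by AccuracyScore, FluencyScore, CompletenessScore and PronScore, which are assumed here to match the score fields in the service's JSON result. The function name `score_gaps` and the 5-point tolerance are illustrative choices, not part of any SDK.

```python
# Hypothetical helper for comparing pronunciation assessment scores for the
# same input scored by resources in two regions. The metric names are assumed
# to match the score fields in the service's JSON result.
METRICS = ("AccuracyScore", "FluencyScore", "CompletenessScore", "PronScore")


def score_gaps(scores_a, scores_b, tolerance=5.0):
    """Return {metric: absolute gap} for every metric whose gap exceeds tolerance.

    scores_a, scores_b: dicts mapping metric name to a 0-100 score.
    tolerance: hypothetical threshold, in score points, for "inconsistent".
    """
    gaps = {}
    for metric in METRICS:
        gap = abs(scores_a[metric] - scores_b[metric])
        if gap > tolerance:
            gaps[metric] = gap
    return gaps


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    swedencentral = {"AccuracyScore": 80.0, "FluencyScore": 90.0,
                     "CompletenessScore": 100.0, "PronScore": 85.0}
    westeurope = {"AccuracyScore": 60.0, "FluencyScore": 88.0,
                  "CompletenessScore": 100.0, "PronScore": 70.0}
    print(score_gaps(swedencentral, westeurope))
```

With these made-up inputs, only the accuracy and overall pronunciation scores are flagged, since the fluency and completeness gaps stay within the tolerance.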

Could you provide insight into why this is happening and suggest any possible solutions to ensure uniformity in the assessments?

Thank you.

navba-MSFT 24,890 Reputation points Microsoft Employee

2024-07-15T02:31:29.2666667+00:00

@Jordan C Welcome to the Microsoft Q&A Forum. Thank you for posting your query here!

I will check this from Speech Studio and get back to you with an update.

In the meantime, may I know the audio format? Is it WAV?

Could you please share the audio file you are using so that I can test it on my end?

Awaiting your reply.
Jordan C 20 Reputation points

2024-07-15T10:33:42.2833333+00:00

Hey there! I'm using the microphone input directly. You can test it by opening two tabs with resources in different regions, clicking the record button in both at the same time so that they record the same microphone input, and comparing the scores: they differ depending on the region.

It looks as though Azure is somehow not using the same model across regions to evaluate pronunciation.

Let me know if you see the discrepancy on your side; otherwise, I'll record an audio file for you.
navba-MSFT 24,890 Reputation points Microsoft Employee

2024-07-15T11:03:34.8833333+00:00

@Jordan C Thanks for clarifying. I tested using the Microsoft input audio in Speech Studio and see the results below. Is this the discrepancy you are referring to?

Awaiting your reply.
Jordan C 20 Reputation points

2024-07-15T11:35:40.3466667+00:00

Hello @navba-MSFT

Exactly, but in English the discrepancy is small. In other languages, such as Russian or Chinese, it's huge. I noticed that the "worse" the pronunciation, the larger the discrepancy between the scores from different regions for the same microphone input. Take a look:
navba-MSFT 24,890 Reputation points Microsoft Employee

2024-07-16T13:31:22.96+00:00

@Jordan C Thanks for your reply. I am investigating this further. I will get back once I have an update.
Jordan C 20 Reputation points

2024-07-17T21:55:44.3766667+00:00

Okay, @navba-MSFT

Accepted answer

navba-MSFT 24,890 Reputation points Microsoft Employee

2024-07-22T03:32:27.0133333+00:00
@Jordan C Apologies for the late reply.

I reached out to the product owners to report this issue, and I have heard back from them. Below is their analysis:

This behavior is by design.

Different regions run on different CPUs (AMD or Intel, and possibly different CPU series). If two regions use the same CPU, the scores should be consistent.
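The reply does not say how CPU differences change the scores. One general, well-known way the same computation can produce slightly different numbers on different hardware is that floating-point addition is not associative, so code paths that accumulate values in a different order (for example, different vectorized kernels on AMD and Intel CPUs) can round differently. The sketch below only demonstrates that general effect; whether it is the mechanism behind these score differences is an assumption. The function name `ordered_sum` is hypothetical.

```python
# Demonstrates a general floating-point effect: the same numbers summed in a
# different order can round to a different result. Whether this is why the
# scores differ across CPUs is an assumption; the reply does not say.


def ordered_sum(values):
    """Add values strictly left to right, as a simple scalar loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    forward = ordered_sum([0.1, 0.2, 0.3])
    backward = ordered_sum([0.3, 0.2, 0.1])
    # The two orders round to different doubles.
    print(forward, backward, forward == backward)
```

Each individual difference is tiny, but a model performs a very large number of such operations, so outputs computed on different hardware need not match bit for bit.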

Except for en-US, all locales scale to zero: if a region receives no requests for several hours (possibly about 6 hours), the deployment is released and a fallback model is used. Once a request for that region is detected, deployment of the default model for that region starts, which takes about 10-15 minutes.
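The scale-to-zero behavior described in this reply can be modeled with a small simulation. This is a hypothetical sketch, not Azure's implementation: the 6-hour idle threshold and 15-minute warm-up are taken from the approximate figures quoted ("maybe 6 hours", "about 10-15 minutes"), and the function name and "default"/"fallback" labels are invented for illustration. It shows how two regions can score the same input differently: a region that has been idle serves requests with the fallback model until the default model has been redeployed, while a busy region keeps using the default model.

```python
# Hypothetical model of the scale-to-zero policy described by the product team.
# The thresholds are the approximate figures quoted in the reply, not documented values.
IDLE_RELEASE_HOURS = 6.0       # "maybe 6 hours" without requests before release
WARMUP_HOURS = 15.0 / 60.0     # "about 10-15 minutes" to redeploy; upper bound used


def serving_models(request_times_h, locale):
    """Label each request as served by the "default" or "fallback" model.

    request_times_h: sorted request timestamps, in hours since t = 0.
    Assumes the default model is already deployed and warm at t = 0.
    """
    if locale == "en-US":
        # Per the reply, en-US does not scale to zero.
        return ["default"] * len(request_times_h)

    ready_at = 0.0       # time at which the default model is (or will be) ready
    last_request = 0.0   # time of the most recent request in this region
    served = []
    for t in request_times_h:
        if t - last_request >= IDLE_RELEASE_HOURS:
            # Idle too long: the deployment was released, and this request
            # triggers a redeploy of the default model.
            ready_at = t + WARMUP_HOURS
        served.append("default" if t >= ready_at else "fallback")
        last_request = t
    return served


if __name__ == "__main__":
    # A region that is busy at first, then idle for 7 hours (1.0 h to 8.0 h).
    print(serving_models([0.5, 1.0, 8.0, 8.1, 8.3], "ru-RU"))
```

In this example, the requests at 8.0 h and 8.1 h fall inside the assumed warm-up window and are labeled "fallback", while the request at 8.3 h is served by the redeployed default model.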

In short, the behavior you are seeing is by design, to save cost.

Unfortunately, there is no public-facing documentation on this.

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Jordan C 20 Reputation points

2024-07-23T03:04:53.6233333+00:00

Hi @navba-MSFT ,

Yes, I understand. That's unfortunate, because it makes the service almost useless in practice (if you refer back to the screenshots I sent, the score gap is huge). Thank you for the answer, though; it makes sense to me that this is done to save computational resources. Maybe you should document it so people know this can happen, especially if they rely heavily on the scoring output as I did.

navba-MSFT 24,890 Reputation points Microsoft Employee

2024-07-23T03:09:02.95+00:00

@Jordan C Thanks for getting back to me. I hear your feedback and will pass this suggestion on to the product owners for consideration. I appreciate it.