[Pronunciation Assessment] Is there a way to improve the results using a custom model?

Francis W 0 Reputation points
2024-10-03T16:48:35.49+00:00

I've been experimenting with the pronunciation assessment service. My use case involves scripted assessment in Canadian French (i.e., fr-CA). So far, I've had the most success with the configuration where enableMiscue is set to false, as I have found the results to be easier to interpret and more consistent than when the option is turned on. Ideally, I would like to get phoneme granularity, but from what I've read in the docs, that option is only available for en-US.
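For reference, the scripted configuration described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the short-audio REST API's `Pronunciation-Assessment` header, which carries base64-encoded JSON parameters. The helper name `build_pronunciation_assessment_header` is hypothetical, and sending `EnableMiscue` as a JSON boolean rather than a string is an assumption. The locale (fr-CA) is passed separately on the request and is not part of this header.

```python
import base64
import json


def build_pronunciation_assessment_header(reference_text, enable_miscue=False,
                                          granularity="Word"):
    """Build the header dict for a scripted pronunciation assessment request.

    Hypothetical helper: parameter names follow the documented REST header,
    but the boolean encoding of EnableMiscue is an assumption.
    """
    params = {
        "ReferenceText": reference_text,   # the script the speaker reads
        "GradingSystem": "HundredMark",
        "Granularity": granularity,        # "Word" here; phoneme-level not assumed for fr-CA
        "Dimension": "Comprehensive",
        "EnableMiscue": enable_miscue,     # False: no omission/insertion tagging
    }
    # ensure_ascii=False keeps accented French characters as UTF-8 text
    raw = json.dumps(params, ensure_ascii=False).encode("utf-8")
    return {"Pronunciation-Assessment": base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    headers = build_pronunciation_assessment_header("Les parents sont dehors.")
    print(headers)
```

Keeping the parameters in one helper makes it easy to toggle `enable_miscue` and compare results on the same recordings.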

I've also experimented with the en-US dialect, and I must say that I'm impressed with the results. Unfortunately, I can't say the same for fr-CA. It's not that the latter is "bad", but from what I've seen, it's not quite ready for "prime time." While it performs well overall, it fails in a lot of situations. At some point, I considered that there might be something wrong with my French, but I double-checked using AI-synthesized audio, and the service fails at the same places. There are numerous examples, and my goal is not to list them all here, but the service systematically fails to detect the word dehors. In other cases, the service fails for plural words (e.g., parents fails, while parent works fine).

That being said, I would really like to use the pronunciation assessment service for my use case, so I'm looking for workarounds. I read about the custom speech service, and I was wondering whether it would be possible to train custom speech models to be used in the context of pronunciation assessment. At the very least, it would be nice to have a way to report inaccuracies in the returned assessments so that the fr-CA model could be updated. I'm also open to any other recommendations that could make the fr-CA model work better for me.

Thank you very much for your time and support.

Francis

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. SriLakshmi C 510 Reputation points Microsoft Vendor
    2024-10-08T12:11:54.9633333+00:00

    Hello Francis W,

    Unfortunately, you cannot currently integrate a custom speech model into the pronunciation assessment service for Canadian French (fr-CA). Pronunciation assessment does not support custom speech models as a way to improve the accuracy of phoneme-level or word-level assessments. The custom speech service is useful for improving speech-to-text transcription across dialects and languages, but it does not extend to fine-tuning the pronunciation assessment feature itself.

    However, you can leverage the custom speech service to build models that better recognize and transcribe Canadian French in your specific use case. While this won’t directly enhance the pronunciation assessment, it could provide better transcription and recognition accuracy as part of the overall language processing pipeline.
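One way such a transcription step might fit into the pipeline is to compare the recognized text against the scripted reference and flag the words that differ. The sketch below is only an illustration under assumptions: the function names are hypothetical, the recognized transcript is assumed to come from whatever speech-to-text model is in use, and the comparison relies on Python's standard `difflib` rather than any Azure API. It does not produce pronunciation scores.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w'-]+", text.lower())


def compare_to_script(reference, recognized):
    """Return (operation, reference_words, recognized_words) for each mismatch.

    Hypothetical helper: 'delete' means script words missing from the
    transcript, 'insert' means extra words, 'replace' means substitutions.
    """
    ref = tokenize(reference)
    hyp = tokenize(recognized)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    issues = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            issues.append((tag, ref[i1:i2], hyp[j1:j2]))
    return issues


if __name__ == "__main__":
    for issue in compare_to_script("Les parents sont dehors.", "les parent sont"):
        print(issue)
```

A check like this can only show where the transcript diverges from the script; it cannot tell whether the speaker or the recognizer is at fault.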

    Thank you!

