Any way to use a custom speech-to-text model with pronunciation assessment?

Amanda
2025-06-23T15:31:10.3033333+00:00

Hello,

I have trained a custom speech-to-text model on data that improves recognition of disfluencies and hesitation markers like "um" and "uh", and it works pretty well. I would also like to get pronunciation assessment results for the audio I send to this model. Is that possible in some way? I can get phoneme-level (IPA) pronunciation results using the default model, but I need them to match the transcript from my custom model; ideally I would get both at once. FYI, I only need the timestamped phonemes from the pronunciation assessment output, not any of the accuracy-related scores.

Thank you!


Accepted answer
  1. Manas Mohanty, Microsoft External Staff Moderator
    2025-06-23T23:05:39.9433333+00:00

    Hi Amanda,

    I don't think pronunciation assessment can be used with Custom Speech models.

    The example script below builds its configuration from only an endpoint and a key. I'm not sure whether a Custom Speech model endpoint can be used with the SDK here, since there is no SDK support for Custom Speech models yet (only the CLI, the REST API, and the Speech Studio portal).

    https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/scenarios/python/console/language-learning/pronunciation_assessment.py

    import azure.cognitiveservices.speech as speechsdk

    # The linked sample creates its SpeechConfig from just a key and an endpoint.
    config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=speech_endpoint)

    Reference - https://docs.azure.cn/en-us/ai-services/speech-service/how-to-custom-speech-deploy-model?pivots=speech-studio
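
    If you do want to experiment, below is a minimal sketch of how the two pieces would be wired together with the Python SDK. The SpeechConfig.endpoint_id property and PronunciationAssessmentConfig are existing SDK surface, but whether the service actually returns pronunciation assessment results against a Custom Speech deployment is exactly the open question above, so treat this as untested. speech_key, speech_endpoint, custom_endpoint_id, and audio.wav are placeholders you would replace with your own values.

    import json
    import azure.cognitiveservices.speech as speechsdk

    # Placeholders: substitute your own resource key, endpoint,
    # Custom Speech deployment ID, and audio file.
    speech_key = "YOUR_SPEECH_KEY"
    speech_endpoint = "YOUR_SPEECH_ENDPOINT"
    custom_endpoint_id = "YOUR_CUSTOM_SPEECH_DEPLOYMENT_ID"

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, endpoint=speech_endpoint)
    # Point the recognizer at the deployed Custom Speech model.
    speech_config.endpoint_id = custom_endpoint_id

    audio_config = speechsdk.audio.AudioConfig(filename="audio.wav")

    # Unscripted assessment (empty reference text), phoneme-level granularity,
    # and IPA output, since only the timestamped phonemes are needed.
    pron_config = speechsdk.PronunciationAssessmentConfig(
        reference_text="",
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredPointScale,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    )
    pron_config.phoneme_alphabet = "IPA"

    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    pron_config.apply_to(recognizer)

    result = recognizer.recognize_once_async().get()

    # The detailed JSON result carries per-phoneme Offset/Duration values
    # (in 100-nanosecond ticks), which is the timestamp information asked about.
    detailed = json.loads(result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult))
    for word in detailed["NBest"][0]["Words"]:
        for phoneme in word.get("Phonemes", []):
            print(phoneme["Phoneme"], phoneme["Offset"], phoneme["Duration"])

    If the service ignores pronunciation assessment on the custom endpoint, the fallback is the two-pass approach you described: run the custom model for the transcript and the default model for the phoneme timestamps, then align the two using the word-level offsets.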

    Thank you

