How to get phonemes from the Azure Speech SDK

Doug Bergman 1 Reputation point
2020-12-01T17:55:16.24+00:00

Hi, I am following the Microsoft Azure Speech-to-Text Python SDK tutorial here. I would like to know if there is a way to return the phonemes, which I understand to be an intermediate step in generating the interpreted text. Is that possible? If so, can you please refer me to the documentation and, hopefully, some sample code that does this? I searched and could not find anything that already answered my question.

Thanks!
Doug

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

2 answers

  1. Sarthak Agarwal 16 Reputation points
    2021-01-17T11:06:32.167+00:00

    Hi @Doug Bergman

    Is this not what you're looking for? Here is a sample response with the phonemes for the word "Thank":

    {
      "Duration": 4700000,
      "Offset": 11500000,
      "Phonemes": [
        {
          "Duration": 2100000,
          "Offset": 11500000,
          "Phoneme": "th",
          "PronunciationAssessment": {
            "AccuracyScore": 100.0
          }
        },
        {
          "Duration": 900000,
          "Offset": 13700000,
          "Phoneme": "ae",
          "PronunciationAssessment": {
            "AccuracyScore": 100.0
          }
        },
        {
          "Duration": 700000,
          "Offset": 14700000,
          "Phoneme": "ng",
          "PronunciationAssessment": {
            "AccuracyScore": 100.0
          }
        },
        {
          "Duration": 700000,
          "Offset": 15500000,
          "Phoneme": "k",
          "PronunciationAssessment": {
            "AccuracyScore": 100.0
          }
        }
      ],
      "PronunciationAssessment": {
        "AccuracyScore": 100.0,
        "ErrorType": "None"
      },
      "Word": "Thank"
    }
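
For anyone who wants to consume a word entry like the one above from Python, here is a minimal sketch. It assumes the entry is available as a JSON string with exactly the fields shown in the sample. The helper name `phoneme_timings` and the `SAMPLE` constant are made up for illustration. The Speech service reports `Offset` and `Duration` in 100-nanosecond ticks, so the sketch converts them to seconds.

```python
import json

# Offsets and durations are expressed in 100-nanosecond ticks.
TICKS_PER_SECOND = 10_000_000

# Word entry from the sample response above, compacted.
SAMPLE = """
{"Duration": 4700000, "Offset": 11500000, "Word": "Thank",
 "PronunciationAssessment": {"AccuracyScore": 100.0, "ErrorType": "None"},
 "Phonemes": [
  {"Duration": 2100000, "Offset": 11500000, "Phoneme": "th",
   "PronunciationAssessment": {"AccuracyScore": 100.0}},
  {"Duration": 900000, "Offset": 13700000, "Phoneme": "ae",
   "PronunciationAssessment": {"AccuracyScore": 100.0}},
  {"Duration": 700000, "Offset": 14700000, "Phoneme": "ng",
   "PronunciationAssessment": {"AccuracyScore": 100.0}},
  {"Duration": 700000, "Offset": 15500000, "Phoneme": "k",
   "PronunciationAssessment": {"AccuracyScore": 100.0}}
 ]}
"""


def phoneme_timings(word_json):
    """Return (phoneme, start_s, duration_s, accuracy) tuples for one word entry.

    Hypothetical helper: it only relies on the field names visible in the sample.
    """
    word = json.loads(word_json)
    rows = []
    for entry in word.get("Phonemes", []):
        accuracy = entry.get("PronunciationAssessment", {}).get("AccuracyScore")
        rows.append((
            entry["Phoneme"],
            entry["Offset"] / TICKS_PER_SECOND,    # start time in seconds
            entry["Duration"] / TICKS_PER_SECOND,  # length in seconds
            accuracy,
        ))
    return rows


if __name__ == "__main__":
    for phoneme, start, duration, accuracy in phoneme_timings(SAMPLE):
        print(f"{phoneme}: start={start:.2f}s duration={duration:.2f}s accuracy={accuracy}")
```

Converting to seconds makes it easier to line the phonemes up against the audio. If you prefer to keep the raw ticks, drop the division.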


  2. YutongTie-MSFT 46,996 Reputation points
    2021-01-28T17:27:41.22+00:00

    Hello Doug,

    This is the pronunciation assessment feature. Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practice, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate the pronunciation of multiple speakers in real time.

    Note that the pronunciation assessment feature is currently available only in the westus, eastasia, and centralindia regions, and it supports only the en-US language.

    Please refer to the following sample code for how to set up pronunciation assessment and retrieve the results.

    https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/cpp/windows/console/samples/speech_recognition_samples.cpp#L633
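
Since the question is about the Python SDK and the linked sample is C++, here is a rough Python sketch of the same flow. It assumes the `azure-cognitiveservices-speech` package is installed and that the detailed JSON result nests words under `NBest[0].Words`, each with a `Phonemes` list like the sample in the first answer. The function names and the `key`, `region`, `wav_path`, and `reference_text` parameters are placeholders for illustration, not part of the SDK. Treat it as a starting point rather than a verified implementation.

```python
import json


def recognize_with_phonemes(key, region, wav_path, reference_text):
    """Run one-shot recognition with pronunciation assessment and return the raw JSON.

    All four parameters are placeholders: your Speech resource key and region,
    a WAV file to recognize, and the text the speaker was expected to say.
    """
    # Imported here so the parsing helper below works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)

    # Phoneme granularity asks the service to score each phoneme individually.
    pron_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    )

    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="en-US", audio_config=audio_config
    )
    pron_config.apply_to(recognizer)

    result = recognizer.recognize_once()
    if result.reason != speechsdk.ResultReason.RecognizedSpeech:
        raise RuntimeError(f"Recognition did not succeed: {result.reason}")

    # The detailed response, including per-phoneme entries, is exposed as a JSON string.
    return result.properties.get(speechsdk.PropertyId.SpeechServiceResponse_JsonResult)


def phonemes_by_word(raw_json):
    """Return [(word, [phoneme, ...]), ...] from the detailed JSON result.

    Assumes the layout NBest[0].Words[].Phonemes[].Phoneme.
    """
    detail = json.loads(raw_json)
    nbest = detail.get("NBest") or [{}]
    return [
        (word["Word"], [p["Phoneme"] for p in word.get("Phonemes", [])])
        for word in nbest[0].get("Words", [])
    ]
```

A call would look something like `phonemes_by_word(recognize_with_phonemes(key, "westus", "sample.wav", "Thank you"))`, using one of the supported regions listed above.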

    Regards,
    Yutong
