Azure Cognitive Services Speech to Text get Confidence Scores during Recognizing Event

Technicator 5 Reputation points
2023-10-29T18:04:54.4566667+00:00

I'm building a phone chatbot that's using Azure's Speech To Text service, and while it's good, one missing feature I would like is getting confidence scores for partial transcriptions during the Recognizing callback: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-speech-recognition-results?pivots=programming-language-javascript

Right now, I can only get confidence scores during the Recognized event, via e.result.json.NBest. This works after I set the speechConfig outputFormat to 1 (detailed) and set the wordLevelConfidence service property to true.

The reason I need confidence scores during the Recognizing event is that we want to detect as soon as possible when the user is speaking, so we can interrupt the phone chatbot mid-sentence. Without confidence scores, it's difficult to know whether the user is actually speaking or the speech to text service is just picking up background noise (e.g. a radio or ambient conversation).

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. dupammi 8,465 Reputation points Microsoft Vendor
    2023-10-30T02:21:49.94+00:00

    Hi @Technicator ,

    Thank you for reaching out to Microsoft Q&A with your question regarding confidence scores in Azure's Speech to Text service.

    You are on the right track. To obtain confidence scores for recognized results, configure the SpeechConfig with the detailed output format, i.e. set OutputFormat to OutputFormat.Detailed. You can then read the N-best recognition hypotheses, each with its Confidence score: through the Result.Best() method in the .NET SDK, or through the NBest array in the result JSON that you are already using in JavaScript.
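
    As a rough illustration for JavaScript, the sketch below pulls the highest-confidence hypothesis out of a detailed-format result. The helper name topConfidence and the sample payload are hypothetical and only for illustration; the field names (NBest, Confidence, Display) are the ones used by the detailed output format. It assumes you pass in the JSON of a final (Recognized) result.

    ```javascript
    // Hypothetical helper: extract the best hypothesis from a detailed-format
    // recognition result. Accepts either the raw JSON string or a parsed object.
    function topConfidence(resultJson) {
      const parsed = typeof resultJson === "string" ? JSON.parse(resultJson) : resultJson;
      const nbest = Array.isArray(parsed && parsed.NBest) ? parsed.NBest : [];
      if (nbest.length === 0) {
        return null; // no NBest array present, so no confidence to report
      }
      // Pick the entry with the highest Confidence rather than assuming an order.
      const best = nbest.reduce((a, b) => (b.Confidence > a.Confidence ? b : a));
      return { text: best.Display, confidence: best.Confidence };
    }

    // Illustrative payload only; the values are invented.
    const sample = JSON.stringify({
      RecognitionStatus: "Success",
      DisplayText: "Hello there.",
      NBest: [
        { Confidence: 0.62, Display: "Hello their." },
        { Confidence: 0.91, Display: "Hello there." },
      ],
    });

    console.log(topConfidence(sample));
    ```

    In a Recognized handler this would be called with the same e.result.json value mentioned in the question.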

    Dealing with non-speech noise can be challenging. According to the official documentation, the recommended approach is to have the user try again or use better recording conditions, so that noise is not recognized as speech. If this cannot be avoided, you can rely on the confidence score, but there is no official guidance on a cutoff value. Depending on the quality of the speech, you could ignore text below a threshold of your choosing and use that to decide when to interrupt the chatbot.
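
    A minimal sketch of such a threshold check is shown below. The function name isLikelyUserSpeech and the 0.6 cutoff are assumptions made for illustration; since there is no documented cutoff, the value would need to be tuned against your own call audio.

    ```javascript
    // Hypothetical gate: treat a result as real user speech only if its
    // confidence clears a cutoff. 0.6 is an arbitrary placeholder, not a
    // documented recommendation.
    const DEFAULT_MIN_CONFIDENCE = 0.6;

    function isLikelyUserSpeech(confidence, minConfidence = DEFAULT_MIN_CONFIDENCE) {
      // Missing or non-numeric scores are treated as "not speech", so that
      // noise never triggers an interruption by default.
      return (
        typeof confidence === "number" &&
        Number.isFinite(confidence) &&
        confidence >= minConfidence
      );
    }

    console.log(isLikelyUserSpeech(0.91)); // true
    console.log(isLikelyUserSpeech(0.35)); // false
    console.log(isLikelyUserSpeech(undefined)); // false
    ```

    Failing closed on a missing score is a design choice here: an unnecessary interruption of the bot is assumed to be worse than a slightly late one.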

    Here are some Speech recognition samples that can help you:

    https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstarts/speech-to-text-from-microphone?tabs=dotnet%2Cwindowsinstall&pivots=programming-language-javascript

    Hope this helps.


    If this answers your query, please click Accept Answer and Yes for "was this answer helpful". If you have any further questions, do let us know.

