How to apply speech recognition confidence for ARABIC Language

Shebl Albarazi 0 Reputation points

Good day. I am using Azure speech recognition for the Arabic language, and I am struggling to make recognition more forgiving. With Azure speech recognition, the recognized sentence has to match the preset sentence exactly, which hurts the smoothness of the recognition: we are forced to repeat the sentence two or three times, sometimes more, before it matches the preset exactly. Again, I am using the Arabic language. I need help with this.

Azure Speech
An Azure service that integrates speech processing into apps and services.
832 questions
Azure Cognitive Services
A group of Azure artificial intelligence services and cognitive APIs that help build intelligent apps.
1,008 questions

1 answer

  1. VasaviLankipalle-MSFT 1,996 Reputation points

    Hi @Shebl Albarazi, thanks for using the Microsoft Q&A platform.

    Sorry to hear that you are experiencing this. Speech-to-text supports a variety of language locales and voices. Can you be more specific about which Arabic locale (ar-**) you are using?

    If you haven't already, I advise you to try this using detailed output format and check the results of the recognition to see if it helps.

    Here is the sample code to check for the detailed recognition output format:

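    import json
    import azure.cognitiveservices.speech as speechsdk

    # speech_config and audio_config are assumed to have been created beforehand
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    # "en-US" is the sample's locale; replace it with your Arabic locale (for example "ar-SA")
    speech_recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config, language="en-US", audio_config=audio_config)
    result = speech_recognizer.recognize_once()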

    From the result:

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(result.text))
            # Time units are in hundreds of nanoseconds (HNS), where 10000 HNS equals 1 millisecond
            print("Offset: {}".format(result.offset))
            print("Duration: {}".format(result.duration))
            # Now get the detailed recognition results from the JSON
            json_result = json.loads(result.json)
            # The first cell in the NBest list corresponds to the recognition results
            # (NOT the cell with the highest confidence number!)
            print("Detailed results - Lexical: {}".format(json_result['NBest'][0]['Lexical']))
            # ITN stands for Inverse Text Normalization
            print("Detailed results - ITN: {}".format(json_result['NBest'][0]['ITN']))
            print("Detailed results - MaskedITN: {}".format(json_result['NBest'][0]['MaskedITN']))
            print("Detailed results - Display: {}".format(json_result['NBest'][0]['Display']))
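    Once the detailed JSON is available, one way to avoid requiring an exact match is to compare each NBest candidate with the preset sentence after light normalization and accept it above a similarity threshold. The following is only a minimal sketch of that idea, not an official Azure method: the helper names `normalize` and `best_match`, the `PRESET` sentence, the 0.8 threshold, and the `sample_json` payload (including its confidence values) are all made up for illustration. It assumes each NBest entry carries `Display` and `Confidence` fields, as in the detailed output format.

```python
import json
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Drop combining marks (e.g. Arabic tashkeel) and punctuation, collapse spaces."""
    kept = [
        ch for ch in text
        if not unicodedata.category(ch).startswith(("M", "P"))
    ]
    return " ".join("".join(kept).split())


def best_match(result_json: str, preset: str, min_similarity: float = 0.8):
    """Return (display, similarity, confidence) for the NBest candidate closest
    to the preset sentence, or None if no candidate reaches min_similarity."""
    target = normalize(preset)
    best = None
    for candidate in json.loads(result_json).get("NBest", []):
        display = candidate.get("Display", "")
        similarity = SequenceMatcher(None, normalize(display), target).ratio()
        if best is None or similarity > best[1]:
            best = (display, similarity, candidate.get("Confidence"))
    if best is not None and best[1] >= min_similarity:
        return best
    return None


# Hypothetical preset sentence and hand-written payload shaped like the detailed
# output; in real use, pass result.json from the recognizer instead.
PRESET = "مرحبا بالعالم"
sample_json = json.dumps({
    "NBest": [
        {"Confidence": 0.87, "Display": "مرحبا بالعالم."},
        {"Confidence": 0.61, "Display": "مرحبا بالعلم"},
    ]
}, ensure_ascii=False)

match = best_match(sample_json, PRESET)
if match is None:
    print("No candidate is close enough; ask the user to repeat.")
else:
    print("Accepted: {} (similarity {:.2f}, confidence {})".format(*match))
```

    The threshold is a tuning knob: lowering it accepts more variation in how the sentence is recognized, at the cost of more false accepts. The `Confidence` value could also be checked alongside the similarity if very uncertain recognitions should be rejected.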

    You can find full code related to detailed output format here: GitHub code:

    Could you share the sample input speech you are trying, the expected text, and the detailed recognition output you get, so that we can test them on our end?