How to apply speech recognition confidence for the Arabic language

Shebl Albarazi 0 Reputation points
2023-03-02T12:16:20.6133333+00:00

Good day. I am using Azure speech recognition for the Arabic language, and I am struggling to make the recognition easier. When applying Azure speech recognition, the recognized sentence needs to match a preset sentence exactly, which hurts the smoothness of the recognition: we are forced to re-say the sentence two or three times, or even more, before it matches the exact preset. Again, I am using the Arabic language. I need help with this.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. VasaviLankipalle-MSFT 14,576 Reputation points
    2023-03-13T19:18:23.07+00:00

    Hi @Shebl Albarazi, thanks for using the Microsoft Q&A Platform.

    Sorry to hear that you are experiencing this. Speech to text supports a variety of language locales and voices. Can you be more specific about which Arabic locale (ar-**) you are using?
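
    For reference, the recognition locale is set when you create the recognizer. Here is a minimal sketch; the key, region, and the "ar-SA" locale are placeholders, so substitute the Arabic locale your speakers actually use (for example ar-EG or ar-AE):

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder subscription key and region; replace with your own values.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # "ar-SA" is just an example locale here; pass the one that matches your speakers.
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="ar-SA", audio_config=audio_config)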

    If you haven't already, I suggest trying the detailed output format and checking the recognition results to see if that helps: https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.outputformat?view=azure-python

    Here is the sample code to request the detailed recognition output format (replace "en-US" with the Arabic locale you use):

    # Request the detailed output format so confidence scores and
    # N-best alternatives are included in the result JSON.
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="en-US", audio_config=audio_config)
    result = speech_recognizer.recognize_once()

    From the result:

    import json

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))

        # Time units are in hundreds of nanoseconds (HNS); 10000 HNS equals 1 millisecond
        print("Offset: {}".format(result.offset))
        print("Duration: {}".format(result.duration))

        # Now get the detailed recognition results from the JSON
        json_result = json.loads(result.json)

        # The first entry in the NBest list corresponds to the recognition result
        # (NOT necessarily the entry with the highest confidence number!)
        print("Detailed results - Lexical: {}".format(json_result['NBest'][0]['Lexical']))
        # ITN stands for Inverse Text Normalization
        print("Detailed results - ITN: {}".format(json_result['NBest'][0]['ITN']))
        print("Detailed results - MaskedITN: {}".format(json_result['NBest'][0]['MaskedITN']))
        print("Detailed results - Display: {}".format(json_result['NBest'][0]['Display']))

    You can find the full sample code for the detailed output format on GitHub: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py
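
    Since your goal is to apply the confidence score, here is a minimal sketch of how the per-hypothesis Confidence field in the detailed JSON could be used: instead of requiring the top result to equal the preset sentence exactly, you could accept the utterance when any sufficiently confident N-best hypothesis matches it. The 0.5 threshold and the matches_preset helper are illustrative assumptions to tune for your audio and locale:

    import json

    CONFIDENCE_THRESHOLD = 0.5  # illustrative value; tune for your microphone and locale

    def matches_preset(result, preset_text):
        """Accept the utterance if any confident N-best hypothesis matches the preset."""
        detailed = json.loads(result.json)
        preset = preset_text.strip()
        for hypothesis in detailed['NBest']:
            # The Lexical form drops punctuation and inverse text normalization,
            # so it compares more forgivingly than the Display text.
            if (hypothesis['Confidence'] >= CONFIDENCE_THRESHOLD
                    and hypothesis['Lexical'].strip() == preset):
                return True
        return False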

    Is it possible to share a sample of the input speech you are trying, the expected text, and the detailed recognition output, so that we can test them on our end?

    Regards,
    Vasavi
