How to apply speech recognition confidence for the Arabic language

Shebl Albarazi 0 Reputation points
2023-03-02T12:16:20.6133333+00:00

Good day. I am using Azure speech recognition for the Arabic language, and I am struggling to make the recognition easier. When applying Azure speech recognition, the recognized sentence needs to match a preset sentence exactly, which hurts the smoothness of the recognition: we are forced to re-say the sentence two or three times, or even more, before it matches the exact preset. Again, I am using the Arabic language. I need help with this.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. VasaviLankipalle-MSFT 14,576 Reputation points
    2023-03-13T19:18:23.07+00:00

    Hi @Shebl Albarazi, thanks for using the Microsoft Q&A Platform.

    Sorry to hear that you are experiencing this. Speech to text supports a variety of language locales and voices. Can you be more specific about which Arabic locale (ar-**) you are using?
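
    For reference, the recognition locale is set when you create the recognizer. Here is a minimal sketch; the key, region, and the "ar-SA" locale are placeholders, so substitute the Arabic locale your speakers actually use (for example ar-EG or ar-AE):

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder subscription key and region; replace with your own values.
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    # "ar-SA" is just an example locale here; pass the one that matches your speakers.
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="ar-SA", audio_config=audio_config)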

    If you haven't already, I suggest trying the detailed output format and checking the recognition results to see if that helps: https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.outputformat?view=azure-python

    Here is the sample code to request the detailed recognition output format (replace "en-US" with the Arabic locale you use):

    # Request the detailed output format so confidence scores and
    # N-best alternatives are included in the result JSON.
    speech_config.output_format = speechsdk.OutputFormat.Detailed
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, language="en-US", audio_config=audio_config)
    result = speech_recognizer.recognize_once()

    From the result:

    import json

    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))

        # Time units are in hundreds of nanoseconds (HNS); 10000 HNS equals 1 millisecond
        print("Offset: {}".format(result.offset))
        print("Duration: {}".format(result.duration))

        # Now get the detailed recognition results from the JSON
        json_result = json.loads(result.json)

        # The first entry in the NBest list corresponds to the recognition result
        # (NOT necessarily the entry with the highest confidence number!)
        print("Detailed results - Lexical: {}".format(json_result['NBest'][0]['Lexical']))
        # ITN stands for Inverse Text Normalization
        print("Detailed results - ITN: {}".format(json_result['NBest'][0]['ITN']))
        print("Detailed results - MaskedITN: {}".format(json_result['NBest'][0]['MaskedITN']))
        print("Detailed results - Display: {}".format(json_result['NBest'][0]['Display']))

    You can find the full sample code for the detailed output format on GitHub: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py
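
    Since your goal is to apply the confidence score, here is a minimal sketch of how the per-hypothesis Confidence field in the detailed JSON could be used: instead of requiring the top result to equal the preset sentence exactly, you could accept the utterance when any sufficiently confident N-best hypothesis matches it. The 0.5 threshold and the matches_preset helper are illustrative assumptions to tune for your audio and locale:

    import json

    CONFIDENCE_THRESHOLD = 0.5  # illustrative value; tune for your microphone and locale

    def matches_preset(result, preset_text):
        """Accept the utterance if any confident N-best hypothesis matches the preset."""
        detailed = json.loads(result.json)
        preset = preset_text.strip()
        for hypothesis in detailed['NBest']:
            # The Lexical form drops punctuation and inverse text normalization,
            # so it compares more forgivingly than the Display text.
            if (hypothesis['Confidence'] >= CONFIDENCE_THRESHOLD
                    and hypothesis['Lexical'].strip() == preset):
                return True
        return False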

    Is it possible to share a sample of the input speech you are trying, the expected text, and the detailed recognition output, so that we can test them on our end?

    Regards,
    Vasavi
