Dynamic Speech to Text

Question

Sakib Ali Choudhary 225

Hi,

I am looking for a way to use Azure Speech dynamically: if I ask a question in French, the speech-to-text response should be in French, and if I ask in Dutch, it should be in Dutch, without explicitly specifying the language (en, fr, nl, etc.). It should also work in real time from the microphone.

This is my current code:

try:
    # Set up the Speech Configuration using the provided subscription key and service region
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Set the timeout for conversation ending detection to 300 seconds (5 minutes)
    conversation_ending_detection_timeout = 300
    speech_config.set_service_property("speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs", str(conversation_ending_detection_timeout * 1000), speechsdk.ServicePropertyChannel.UriQueryParameter)

    # Set the segmentation silence timeout to 5000 milliseconds (5 seconds)
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "5000")

    # Create a SpeechRecognizer object with the configured SpeechConfig
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    speech_recognizer.language_code = None

    # Prompt the user to say something
    print("Say something...")

    # Perform speech recognition and get the result
    result = speech_recognizer.recognize_once()

    # Check the reason for the result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # If speech was recognized, print the recognized text
        print("Recognized: {}".format(result.text))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        # If no speech could be recognized, print details about the recognition failure
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        # If speech recognition was canceled, print the cancellation reason and details
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            # If the cancellation reason was an error, print the error details
            print("Error details: {}".format(cancellation_details.error_details))

except Exception as ex:
    # Catch and handle any exceptions that may occur during the execution
    print("An error occurred: {}".format(ex))

Could anyone please suggest an approach or point me to documentation that would help with this?
Thanks

Sakib Ali Choudhary 225

For anyone who wants example code for dynamic speech-to-text, where the language is chosen automatically and the response comes back in that language, the code is below:

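import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"

try: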
    # Set up the Speech Configuration using the provided subscription key and service region
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Set the timeout for conversation ending detection to 300 seconds (5 minutes)
    conversation_ending_detection_timeout = 300
    speech_config.set_service_property("speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs", str(conversation_ending_detection_timeout * 1000), speechsdk.ServicePropertyChannel.UriQueryParameter)

    # Set the segmentation silence timeout to 5000 milliseconds (5 seconds)
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "5000")

    # Create a SpeechRecognizer that picks the spoken language from the candidate list
    auto_detect_source_language_config = \
            speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=["en-US", "de-DE", "gu-IN", "hi-IN"])
    speech_recognizer = speechsdk.SpeechRecognizer(
            speech_config=speech_config,
            auto_detect_source_language_config=auto_detect_source_language_config)

    # Prompt the user to say something
    print("Say something...")

    # Perform speech recognition and get the result
    result = speech_recognizer.recognize_once()

    # Check the reason for the result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # If speech was recognized, print the recognized text and the detected language
        auto_detect_source_language_result = speechsdk.AutoDetectSourceLanguageResult(result)
        detected_language = auto_detect_source_language_result.language
        print("Recognized: {}".format(result.text))
        print("Detected language: {}".format(detected_language))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        # If no speech could be recognized, print details about the recognition failure
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        # If speech recognition was canceled, print the cancellation reason and details
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            # If the cancellation reason was an error, print the error details
            print("Error details: {}".format(cancellation_details.error_details))

except Exception as ex:
    # Catch and handle any exceptions that may occur during the execution
    print("An error occurred: {}".format(ex))

Example output:

Say something...
Recognized: ગાઝા હોસ્પિટલ પાસે એમ્બ્યુલન્સ પર હુમલો કર્યો હુમલામાં 15 લોકોના મોત થયા જ્યારે 60 થી વધુ લોકો ઘાયલ થયા હોવાનું જણાવ્યું છે. આ આતંકી છુપા માટે હોસ્પિટલ.
Detected language: gu-IN
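
Since recognize_once() stops after a single utterance, a continuous session may suit the real-time microphone case better. The following is only a sketch under stated assumptions: check_candidates, recognize_continuously, and MAX_LANGUAGES are hypothetical names made up for this example, the candidate limits reflect the language identification docs as I understand them, and the SpeechServiceConnection_LanguageIdMode property needs a reasonably recent Speech SDK version.

```python
import threading

# Candidate-language limits per language identification mode.
# Assumption: taken from the language identification docs; verify before relying on them.
MAX_LANGUAGES = {"AtStart": 4, "Continuous": 10}


def check_candidates(languages, mode="Continuous"):
    """Return the candidate list, or raise if its size is outside the limit for this mode."""
    limit = MAX_LANGUAGES[mode]
    if not 1 <= len(languages) <= limit:
        raise ValueError(
            "{} language ID accepts 1 to {} candidates, got {}".format(mode, limit, len(languages)))
    return list(languages)


def recognize_continuously(speech_key, service_region, languages):
    """Print each recognized utterance with its detected language until Ctrl+C."""
    # Imported lazily so check_candidates stays usable without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    # Re-evaluate the language during the session instead of only at the start.
    speech_config.set_property(
        property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode,
        value="Continuous")

    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=check_candidates(languages))
    # No audio_config is passed, so the default microphone is used.
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect)

    done = threading.Event()

    def on_recognized(evt):
        # Called once per finished utterance.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            language = speechsdk.AutoDetectSourceLanguageResult(evt.result).language
            print("[{}] {}".format(language, evt.result.text))

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    print("Listening... (Press Ctrl+C to stop)")
    try:
        # Poll with a timeout so Ctrl+C is handled promptly.
        while not done.wait(0.5):
            pass
    except KeyboardInterrupt:
        pass
    finally:
        recognizer.stop_continuous_recognition()
```

Calling recognize_continuously(speech_key, service_region, ["en-US", "fr-FR", "nl-NL"]) would then keep listening and label each utterance with the language it was recognized in.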

Finally, thanks to @dupammi for the docs and suggestions; they helped me a lot.

Accepted answer

Answer 1

Hi @Sakib Ali Choudhary ,

Thank you for contacting Microsoft Q&A.

I understand that you want to achieve real-time speech-to-text transcription in multiple languages. I will be happy to assist you with this.

To implement dynamic language detection and real-time speech-to-text conversion with Azure Speech, you can use the Azure Cognitive Services Speech SDK's continuous translation feature together with AutoDetectSourceLanguageConfig, which identifies the spoken language from a list of candidate languages.

For more information and detailed documentation, refer to the following Azure Speech SDK documentation:

Speech translation quickstart - Speech service - Azure AI services | Microsoft Learn

Language identification - Speech service - Azure AI services | Microsoft Learn

Along with the docs above, here is the sample code I used to reproduce real-time dynamic language detection and speech-to-text conversion. The code listens for speech input, detects the language being spoken, and provides translations in multiple languages. Modify it as per your requirements.

import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "YOUR_SPEECH_KEY","YOUR_SPEECH_REGION"

def continuous_translation_from_microphone():
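    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key,
        region=service_region,
        speech_recognition_language='en-US')

    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)  # Use the default microphone

    # Candidate source languages; the service picks whichever one is being spoken
    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=["en-US", "fr-FR", "nl-NL"])

    # Add target languages to the translation configuration
    translation_config.add_target_language("de")
    translation_config.add_target_language("fr")
    translation_config.add_target_language("hi")

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config,
        auto_detect_source_language_config=auto_detect_source_language_config,
        audio_config=audio_config)

    print("Speak something... (Press Ctrl+C to stop)")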

    try:
        while True:
            result = recognizer.recognize_once()

            # Check the result
            if result.reason == speechsdk.ResultReason.TranslatedSpeech:
                print(f"Recognized: {result.text}")
                print(f"German translation: {result.translations.get('de', '')}")
                print(f"French translation: {result.translations.get('fr', '')}")
                print(f"Hindi translation: {result.translations.get('hi', '')}")
            elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
                print("Recognized: {}".format(result.text))
                detectedSrcLang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
                print("Detected Language: {}".format(detectedSrcLang))
            elif result.reason == speechsdk.ResultReason.NoMatch:
                print("No speech could be recognized: {}".format(result.no_match_details))
            elif result.reason == speechsdk.ResultReason.Canceled:
                print("Translation canceled: {}".format(result.cancellation_details.reason))
                if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                    print("Error details: {}".format(result.cancellation_details.error_details))
    except KeyboardInterrupt:
        print("Recognition stopped.")

def main():
    continuous_translation_from_microphone()

if __name__ == "__main__":
    main()

Output

Speak something... (Press Ctrl+C to stop)
Recognized: How are you?
German translation: Wie geht es dir?
French translation: Comment vas-tu?
Hindi translation: तुम कैसे हो?
Recognized: I am fine.
German translation: Es geht mir gut.
French translation: Je vais bien.
Hindi translation: मैं बढ़िया हूँ।
Recognition stopped.
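
The result handling above prints one hardcoded line per target language. If the list of target languages changes often, a small helper that iterates over the translations mapping may be easier to maintain. This is just a sketch: format_translations is a hypothetical helper, not part of the Speech SDK, and it assumes result.translations behaves like a dict keyed by target language code, as the .get() calls above suggest.

```python
def format_translations(text, translations):
    """Build one line for the recognized text plus one line per target language."""
    lines = ["Recognized: {}".format(text)]
    # Sort the language codes so the output order is stable.
    for language in sorted(translations):
        lines.append("{} translation: {}".format(language, translations[language]))
    return "\n".join(lines)


# Example with values from the output above. Inside the recognition loop this
# would be: print(format_translations(result.text, result.translations))
print(format_translations(
    "How are you?",
    {"de": "Wie geht es dir?", "fr": "Comment vas-tu?"}))
```

The labels here are the language codes rather than the language names, so adding a target language only requires another add_target_language call.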

Thank You!

Hope this helps.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".
