Dynamic Speech to Text

Sakib Ali Choudhary 120 Reputation points
2023-11-03T14:34:25.7166667+00:00

Hi,

I am looking for a way to use Azure Speech dynamically: if I ask a question in French, the speech-to-text result should be in French, and if I ask in Dutch, it should be in Dutch, without explicitly specifying the language (en, fr, nl, etc.). It also needs to work in real time from a microphone.

This is my current code:

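import azure.cognitiveservices.speech as speechsdk

speech_key, service_region = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"

try: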
    # Set up the Speech Configuration using the provided subscription key and service region
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Set the segmentation silence timeout to 5000 milliseconds (5 seconds)
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "5000")

    # Create a SpeechRecognizer object with the configured SpeechConfig
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    speech_recognizer.language_code = None

    # Prompt the user to say something
    print("Say something...")

    # Perform speech recognition and get the result
    result = speech_recognizer.recognize_once()

    # Check the reason for the result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # If speech was recognized, print the recognized text
        print("Recognized: {}".format(result.text))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        # If no speech could be recognized, print details about the recognition failure
        print("No speech could be recognized: {}".format(result.no_match_details))
    elif result.reason == speechsdk.ResultReason.Canceled:
        # If speech recognition was canceled, print the cancellation reason and details
        cancellation_details = result.cancellation_details
        print("Speech Recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            # If the cancellation reason was an error, print the error details
            print("Error details: {}".format(cancellation_details.error_details))

except Exception as ex:
    # Catch and handle any exceptions that may occur during the execution
    print("An error occurred: {}".format(ex))

Could anyone suggest an approach or point me to documentation that would help with this?
Thanks

Azure AI Speech

Accepted answer
    dupammi 1,210 Reputation points Microsoft Vendor
    2023-11-03T18:03:27.0866667+00:00

    Hi @Sakib Ali Choudhary,

    Thank you for contacting Microsoft Q&A.

    I understand that you want real-time speech-to-text from the microphone that transcribes each utterance in whatever language is spoken. I will be happy to assist you with this.

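    To detect the spoken language automatically and transcribe it in real time, you can use the language identification feature of the Azure Cognitive Services Speech SDK (AutoDetectSourceLanguageConfig) together with speech translation. You supply a short list of candidate locales, and the service identifies which of them is being spoken. If you only need the transcript, SpeechRecognizer accepts the same AutoDetectSourceLanguageConfig.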
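    Language identification expects full locales such as "fr-FR" or "nl-NL" rather than short codes like "fr" (note that Dutch is "nl"), and the number of candidates is limited. The sketch below is a hypothetical helper, not part of the SDK, that converts short codes to locales and checks the count. The locale table and the limits (4 candidates for at-start identification, 10 for continuous) are assumptions based on the Language identification page; please verify them against the current docs.

```python
# Hypothetical helper (not part of the Speech SDK): map short language codes to the
# full locales that AutoDetectSourceLanguageConfig expects and enforce candidate limits.
# ASSUMPTION: the locale choices and the limits below; check the Language identification docs.
SHORT_TO_LOCALE = {
    "en": "en-US",
    "fr": "fr-FR",
    "nl": "nl-NL",  # Dutch is "nl", not "du"
    "de": "de-DE",
    "hi": "hi-IN",
}

# ASSUMPTION: documented maximum number of candidate languages per identification mode
MAX_CANDIDATES = {"AtStart": 4, "Continuous": 10}


def build_candidate_languages(codes, mode="AtStart"):
    """Return a de-duplicated list of full locales for the given short codes."""
    if mode not in MAX_CANDIDATES:
        raise ValueError(f"unknown language ID mode: {mode!r}")
    locales = []
    for code in codes:
        locale = SHORT_TO_LOCALE.get(code.lower())
        if locale is None:
            raise ValueError(f"no locale configured for {code!r}")
        if locale not in locales:  # keep first occurrence, preserve order
            locales.append(locale)
    if len(locales) > MAX_CANDIDATES[mode]:
        raise ValueError(
            f"{mode} language ID accepts at most {MAX_CANDIDATES[mode]} "
            f"candidates, got {len(locales)}"
        )
    return locales


if __name__ == "__main__":
    print(build_candidate_languages(["en", "fr", "nl"]))
    # The list would then be passed to the SDK, e.g.
    # speechsdk.languageconfig.AutoDetectSourceLanguageConfig(languages=...)
```

    Keeping the candidate list short is deliberate: identification only chooses among the locales you supply, so a language missing from the list cannot be detected.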

    For more details, see the following Azure Speech documentation:

    Speech translation quickstart - Speech service - Azure AI services | Microsoft Learn

    Language identification - Speech service - Azure AI services | Microsoft Learn

    Along with the docs above, here is the sample code I used to reproduce real-time language detection and speech-to-text conversion. The code listens to the microphone, detects the language being spoken, transcribes it, and translates it into several languages. Modify it as needed.

    import azure.cognitiveservices.speech as speechsdk
    
    speech_key, service_region = "YOUR_SPEECH_KEY", "YOUR_SPEECH_REGION"
    
    def continuous_translation_from_microphone():
        # The Language identification docs use the "universal/v2" endpoint for speech
        # translation with language identification, so pass an endpoint instead of a region
        endpoint_string = f"wss://{service_region}.stt.speech.microsoft.com/speech/universal/v2"
        translation_config = speechsdk.translation.SpeechTranslationConfig(
            subscription=speech_key,
            endpoint=endpoint_string)
    
        # Add target languages to the translation configuration
        translation_config.add_target_language("de")
        translation_config.add_target_language("fr")
        translation_config.add_target_language("hi")
    
        # Candidate spoken languages (full locales); the service identifies which one is spoken
        auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
            languages=["en-US", "fr-FR", "nl-NL", "de-DE"])
    
        audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)  # Use the default microphone
    
        recognizer = speechsdk.translation.TranslationRecognizer(
            translation_config=translation_config,
            auto_detect_source_language_config=auto_detect_config,
            audio_config=audio_config)
    
        print("Speak something... (Press Ctrl+C to stop)")
    
        try:
            while True:
                # Recognize one utterance; the language is identified at the start of each call
                result = recognizer.recognize_once()
    
                # Check the result
                if result.reason == speechsdk.ResultReason.TranslatedSpeech:
                    # result.text is the transcript in the detected (spoken) language
                    print(f"Recognized: {result.text}")
                    print(f"German translation: {result.translations.get('de', '')}")
                    print(f"French translation: {result.translations.get('fr', '')}")
                    print(f"Hindi translation: {result.translations.get('hi', '')}")
                elif result.reason == speechsdk.ResultReason.RecognizedSpeech:
                    # Speech was recognized but not translated
                    print("Recognized: {}".format(result.text))
                    detected_src_lang = result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]
                    print("Detected Language: {}".format(detected_src_lang))
                elif result.reason == speechsdk.ResultReason.NoMatch:
                    print("No speech could be recognized: {}".format(result.no_match_details))
                elif result.reason == speechsdk.ResultReason.Canceled:
                    print("Translation canceled: {}".format(result.cancellation_details.reason))
                    if result.cancellation_details.reason == speechsdk.CancellationReason.Error:
                        print("Error details: {}".format(result.cancellation_details.error_details))
                    break  # stop instead of retrying forever (e.g. wrong key or region)
        except KeyboardInterrupt:
            print("Recognition stopped.")
    
    def main():
        continuous_translation_from_microphone()
    
    if __name__ == "__main__":
        main()
    

    Output

    Speak something... (Press Ctrl+C to stop)
    Recognized: How are you?
    German translation: Wie geht es dir?
    French translation: Comment vas-tu?
    Hindi translation: तुम कैसे हो?
    Recognized: I am fine.
    German translation: Es geht mir gut.
    French translation: Je vais bien.
    Hindi translation: मैं बढ़िया हूँ।
    Recognition stopped.
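
    When the spoken language is detected automatically, one of the translation targets may match it (for example, a French utterance would also get a "French translation"). The following is a small, hypothetical sketch, not part of the SDK and runnable without it, that keeps only the targets whose language differs from the detected locale. The helper names are illustrative.

```python
# Hypothetical helpers (not part of the Speech SDK): skip the translation target
# that matches the language the speaker actually used.


def base_language(tag):
    """Return the primary language subtag of a BCP-47 tag, e.g. 'fr-FR' -> 'fr'."""
    return tag.split("-", 1)[0].lower()


def useful_targets(detected_locale, targets):
    """Drop any target whose primary language equals the detected spoken language."""
    spoken = base_language(detected_locale)
    return [t for t in targets if base_language(t) != spoken]


if __name__ == "__main__":
    print(useful_targets("fr-FR", ["de", "fr", "hi"]))  # prints ['de', 'hi']
```

    When language identification is configured, the detected locale can be read from `result.properties[speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult]` and the translations from `result.translations`, so the helper's output could decide which translations to print.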
    

    Thank You!

    Hope this helps.

    If this answers your query, please click Accept Answer and answer Yes to "Was this answer helpful?"

    1 person found this answer helpful.