endpoint_silence_timeout_ms speech to text in python problem

Question

I am trying to extract text from speech using the Azure Speech to Text service in Python. I am using the endpoint_silence_timeout_ms parameter to set how long the service waits for silence before stopping recognition. However, the parameter does not seem to affect recognition at all: the service stops after a short pause regardless of the value I set. Here is the code I am using:


import azure.cognitiveservices.speech as speechsdk
import os

def extract_text_from_voice():
    speech_key = os.environ.get('SPEECH_KEY')
    service_region = os.environ.get('SPEECH_REGION')
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Set language to your preferred language code
    language = "en-US"
    speech_config.speech_recognition_language = language
    
    # Set the end silence timeout to 3.5 seconds
    speech_config.endpoint_silence_timeout_ms = 3500

    # Create an instance of a speech recognizer
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # Start microphone input
    print("Speak now...")
    result = speech_recognizer.recognize_once()

    # Return recognized text
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognition status: {}".format(result.reason))
        print("Recognized text: {}".format(result.text))
        return result.text
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("Recognition status: {}".format(result.reason))
        print("No speech could be recognized")
    elif result.reason == speechsdk.ResultReason.Canceled:
        print("Recognition status: {}".format(result.reason))
        cancellation_details = result.cancellation_details
        print("Error details: {}".format(cancellation_details.error_details))
        print("Error reason: {}".format(cancellation_details.reason))
    return ""

I have tried different values for the endpoint_silence_timeout_ms parameter (ranging from 500 to 3000), but recognition always stops after a short pause, and the pause seems to be the same length every time. Can someone please help me understand why the endpoint_silence_timeout_ms parameter is not working as expected, and how I can make the Speech to Text service wait for a longer pause before stopping recognition? Thank you in advance for your help.

Answer

@Jonas Hagberth In this case, the property you need to change is Speech_SegmentationSilenceTimeoutMs rather than SpeechServiceConnection_EndSilenceTimeoutMs.

You can find more details here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-recognize-speech?pivots=programming-language-csharp . This part is missing from the Python version of the page, so you will need to look at the C# version. The last paragraph explains how to change silence handling.

Code sample is shown below:

speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "3000")
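
For context, here is a minimal sketch of how that line might fit into the function from the question. The helper name apply_silence_timeout and the function name recognize_with_longer_pause are hypothetical, introduced only for illustration, and the positive-value check is my own addition rather than something the SDK requires. It assumes SPEECH_KEY and SPEECH_REGION are set in the environment, as in the question. set_property takes its value as a string, so the helper converts the integer.

```python
import os


def apply_silence_timeout(speech_config, property_id, milliseconds):
    """Set a timeout property on a SpeechConfig via set_property.

    Hypothetical helper: set_property expects the value as a string,
    so the integer number of milliseconds is converted here.
    """
    if milliseconds <= 0:
        # Assumption: a non-positive timeout is never intended.
        raise ValueError("timeout must be a positive number of milliseconds")
    speech_config.set_property(property_id, str(milliseconds))
    return speech_config


def recognize_with_longer_pause(segmentation_ms=3000):
    """Recognize one utterance from the default microphone."""
    # Imported here so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"],
    )
    speech_config.speech_recognition_language = "en-US"

    # The property named in the answer above.
    apply_silence_timeout(
        speech_config,
        speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs,
        segmentation_ms,
    )

    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    print("Speak now...")
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""
```

Calling recognize_with_longer_pause(3000) should then behave like the function in the question, but with the segmentation timeout applied. As far as I can tell, SpeechConfig in the Python SDK does not define an endpoint_silence_timeout_ms property, so the assignment in the question probably just created a new Python attribute that the SDK never reads, which would explain why changing the value had no effect.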
