I am trying to extract text from speech using Azure Speech to Text service in Python. I am using the endpoint_silence_timeout_ms parameter to set the time to wait for silence before stopping the recognition process. However, it seems that the parameter is not affecting the recognition process at all, as the service stops after a short pause regardless of the value of the parameter.
Here is the code I am using:
import os

import azure.cognitiveservices.speech as speechsdk


def extract_text_from_voice():
    speech_key = os.environ.get('SPEECH_KEY')
    service_region = os.environ.get('SPEECH_REGION')
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    # Set language to your preferred language code
    language = "en-US"
    speech_config.speech_recognition_language = language

    # Set the end silence timeout to 3.5 seconds
    speech_config.endpoint_silence_timeout_ms = 3500

    # Create an instance of a speech recognizer
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    # Start microphone input
    print("Speak now...")
    result = speech_recognizer.recognize_once()

    # Return recognized text
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognition status: {}".format(result.reason))
        print("Recognized text: {}".format(result.text))
        return result.text
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("Recognition status: {}".format(result.reason))
        print("No speech could be recognized")
    elif result.reason == speechsdk.ResultReason.Canceled:
        print("Recognition status: {}".format(result.reason))
        cancellation_details = result.cancellation_details
        print("Error details: {}".format(cancellation_details.error_details))
        print("Error reason: {}".format(cancellation_details.reason))
    return ""
I have tried different values for the endpoint_silence_timeout_ms parameter (ranging from 500 to 3000), but recognition always stops after a short pause, and the pause seems to be the same length regardless of the value.
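One possibility I have not been able to rule out is that endpoint_silence_timeout_ms is not an attribute that SpeechConfig actually defines. In that case Python would quietly create a new attribute on the object, no error would be raised, and the SDK would never read the value. The sketch below illustrates that failure mode only. StandInConfig and the "EndSilenceTimeoutMs" key are hypothetical names made up for the illustration and are not part of the Azure SDK.

```python
class StandInConfig:
    """Hypothetical stand-in for an SDK config object (NOT the Azure class)."""

    def __init__(self):
        # Settings that the imaginary service would actually read.
        self._properties = {}

    def set_property(self, name, value):
        # Explicit setter: the value is stored in the property bag.
        self._properties[name] = str(value)

    def get_property(self, name, default=""):
        # Read a value back from the property bag.
        return self._properties.get(name, default)


def demo():
    config = StandInConfig()

    # Plain attribute assignment: Python accepts it silently, but nothing
    # is written to the property bag.
    config.endpoint_silence_timeout_ms = 3500
    ignored = config.get_property("EndSilenceTimeoutMs")

    # Going through the setter stores the value where it is read from.
    config.set_property("EndSilenceTimeoutMs", 3500)
    applied = config.get_property("EndSilenceTimeoutMs")
    return ignored, applied


if __name__ == "__main__":
    print(demo())
```

If this is what is happening in my code, the equivalent call would presumably be speech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs, "3500"). I have not confirmed that this changes when recognize_once() stops, or whether a different property (such as a segmentation silence timeout) is the one that controls the pause.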
Can someone please help me understand why the endpoint_silence_timeout_ms parameter is not working as expected, and how I can make the Speech to Text service wait for a longer pause before stopping the recognition process?
Thank you in advance for your help.