Speech_SegmentationSilenceTimeoutMs and speech segmentation

Question

Domenico Zurlo 1 Reputation point

Dear Azure Technical Support,

I'm using the Azure Speech Service for continuous speech recognition and have encountered a behavior that I'd like to clarify. Historically, in continuous recognition mode, the service segmented the audio into 15-second chunks. However, after setting the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms, this 15-second segmentation behavior seems to have changed: when I produce a long segment without a silence pause, I receive the error "ServiceTimeout, Due to service inactivity the client buffer size exceeded".

Can you confirm if adjusting the Speech_SegmentationSilenceTimeoutMs parameter impacts the expected 15-second segmentation? If so, can you provide more details on how these settings interact, or point me to the relevant documentation?

I'd appreciate any insights or guidance on this matter. I would like to avoid the error.

Thank you for your assistance.

Best regards,

Domenico.

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2023-10-20T11:20:06.5366667+00:00
Hi @Domenico Zurlo ,

Thanks for reaching out to Microsoft Q&A forum!

As I understand your query, you set the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms and noticed that the 15-second segmentation behaviour seems to have changed, resulting in the error.

Speech_SegmentationSilenceTimeoutMs is a timeout that applies to silence between phrases, whereas SpeechServiceConnection_InitialSilenceTimeoutMs is a timeout that applies to silence before a phrase has even started. So setting the segmentation timeout does not affect the initial silence timeout, which still waits for its default of 15000 ms to elapse or for speech to be recognized.

When modifying these timeouts, you should only change the settings when you have a problem related to silence handling.

Example: Users speaking a serial number like "ABC-123-4567" might pause between character groups long enough for the serial number to be broken into multiple results. In this case, try a higher value like 2000 ms for the segmentation silence timeout:

speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "2000")

Example: A recorded presenter's speech might be fast enough that several sentences in a row get combined, with large recognition results only arriving once or twice per minute. In this case, set the segmentation silence timeout to a lower value like 300 ms:

speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "300")
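To make the effect of the segmentation silence timeout more concrete, here is a toy model in plain Python. It is only a sketch and not the service's actual algorithm: the function `segment_by_silence` and the word timings are hypothetical, and the model simply closes a phrase whenever the pause between two words reaches the timeout.

```python
def segment_by_silence(words, silence_timeout_ms):
    """Group (text, start_ms, end_ms) words into phrases.

    A new phrase starts whenever the pause since the previous word
    is at least silence_timeout_ms. Toy model only.
    """
    phrases = []
    current = []
    previous_end = None
    for text, start_ms, end_ms in words:
        pause_reached = (
            previous_end is not None
            and start_ms - previous_end >= silence_timeout_ms
        )
        if pause_reached:
            # The pause was long enough: close the current phrase
            phrases.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = end_ms
    if current:
        phrases.append(" ".join(current))
    return phrases


# Hypothetical timings: a speaker reads "ABC-123-4567" with 800 ms pauses
serial_number = [
    ("ABC", 0, 900),
    ("123", 1700, 2600),
    ("4567", 3400, 4500),
]

print(segment_by_silence(serial_number, 500))   # ['ABC', '123', '4567']
print(segment_by_silence(serial_number, 2000))  # ['ABC 123 4567']
```

In this model, a speaker who never pauses for as long as the timeout produces one phrase that keeps growing, which is the situation described in the question with a value of 3500 ms.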

Example: A single-shot recognition asking a speaker to find and read a serial number ends too quickly while the number is being found. In this case, try a longer initial silence timeout like 10,000 ms:

speech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000")

For more info: How to recognize speech - Speech service - Azure AI services | Microsoft Learn

I hope this helps. Thank you.

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2023-10-24T04:36:42.8333333+00:00

Hi @Domenico Zurlo ,

Following up to see if the given information was helpful. If you have any further questions, do let us know. Thank you.

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2023-10-25T07:37:14.6533333+00:00

Hi @Domenico Zurlo ,

We haven't heard from you since the last response and were just checking back to see whether you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Thank you.

Domenico Zurlo 1 Reputation point

2023-10-25T07:47:27.3+00:00

Hi @santoshkc ,

I didn't understand how I can solve the issue.

After setting the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms, which is needed to let the user speak a serial number, I receive the error "ServiceTimeout, Due to service inactivity the client buffer size exceeded" during a long speech. This does not happen when the Speech_SegmentationSilenceTimeoutMs parameter is not set. Could you give me more help?

Thanks,

Domenico.

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

Hi @Domenico Zurlo ,

Sorry for the trouble you are facing. Please try the code below. I ran it in my local Jupyter Notebook without any buffer error. It uses 3500 ms for Speech_SegmentationSilenceTimeoutMs:

import time

import azure.cognitiveservices.speech as speechsdk

# Set up the speech configuration
speech_key = "<SPEECH-KEY>"
service_region = "<SERVICE-REGION>"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Set the Speech_SegmentationSilenceTimeoutMs parameter to 3500ms
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "3500")

# Set the request timeout property. This is done before the recognizer is created,
# because the recognizer takes its settings from the config at construction time.
speech_config.set_service_property(name="requesttimeout", value="20000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)

# Set up the audio configuration
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Set up the speech recognizer
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Flag set by stop_cb once the session has stopped or been canceled
done = False

def stop_cb(evt):
    """Callback that stops continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    global done
    done = True

def recognized_cb(evt):
    """Print each final result and its duration (result.duration is in 100-nanosecond ticks)"""
    result = evt.result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
        print("Duration: {} seconds".format(result.duration / 10_000_000))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized")

def canceled_cb(evt):
    """Print why recognition was canceled, including the error details if any"""
    cancellation_details = evt.cancellation_details
    print("Speech recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

# Connect the callbacks to the recognizer events
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(recognized_cb)
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(canceled_cb)
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

# Start continuous recognition
speech_recognizer.start_continuous_recognition()

# Wait until the session stops or is canceled, sleeping instead of busy-waiting
while not done:
    time.sleep(0.5)

Output snippet sample: [image]

I hope this helps you resolve the issue. Thank you.

Answer 1

Hi @Domenico Zurlo ,

Following up to see whether my answer in the comments section of this thread helped. Do let us know if you have any questions.

To reiterate the resolution here, let me jot down the gist of that comment.

I understand that you set the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms and noticed the error "ServiceTimeout, Due to service inactivity the client buffer size exceeded" during a long speech.

I ran the code below in my local Jupyter Notebook without any buffer error. It uses 3500 ms for Speech_SegmentationSilenceTimeoutMs:

import time

import azure.cognitiveservices.speech as speechsdk

# Set up the speech configuration
speech_key = "<SPEECH-KEY>"
service_region = "<SERVICE-REGION>"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Set the Speech_SegmentationSilenceTimeoutMs parameter to 3500ms
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "3500")

# Set the request timeout property. This is done before the recognizer is created,
# because the recognizer takes its settings from the config at construction time.
speech_config.set_service_property(name="requesttimeout", value="20000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)

# Set up the audio configuration
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

# Set up the speech recognizer
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Flag set by stop_cb once the session has stopped or been canceled
done = False

def stop_cb(evt):
    """Callback that stops continuous recognition upon receiving an event `evt`"""
    print('CLOSING on {}'.format(evt))
    speech_recognizer.stop_continuous_recognition()
    global done
    done = True

def recognized_cb(evt):
    """Print each final result and its duration (result.duration is in 100-nanosecond ticks)"""
    result = evt.result
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized: {}".format(result.text))
        print("Duration: {} seconds".format(result.duration / 10_000_000))
    elif result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech could be recognized")

def canceled_cb(evt):
    """Print why recognition was canceled, including the error details if any"""
    cancellation_details = evt.cancellation_details
    print("Speech recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

# Connect the callbacks to the recognizer events
speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
speech_recognizer.recognized.connect(recognized_cb)
speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
speech_recognizer.canceled.connect(canceled_cb)
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)

# Start continuous recognition
speech_recognizer.start_continuous_recognition()

# Wait until the session stops or is canceled, sleeping instead of busy-waiting
while not done:
    time.sleep(0.5)

Please have a look at the sample implementation done in the above "comment".

Please 'Accept as answer' and ‘Upvote’ if it helped so that it can help others in the community looking for help on similar topics. Thank you!

Sam Byng 0 Reputation points Microsoft Employee

2024-04-24T15:18:54.6733333+00:00

Hi Santosh,

What is the requesttimeout property?

I cannot see this documented anywhere in the list at: https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech.propertyid?view=azure-dotnet

Do you have any recommendations for Continuous Recognition where we want to avoid the case in which a user could, in theory, talk continuously with silence intervals shorter than 300 ms, so that their speech would never get parsed?

Is there a config option to control the maximum length of speech, to avoid this issue?

Many thanks
santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2024-04-25T01:52:59.72+00:00

Hi @Sam Byng,

Since this thread is quite old, I would recommend creating a new thread on the same forum with as much detail about your issue as possible. That will give your issue better visibility in the community.

Answer 2

Domenico Zurlo 1 Reputation point

Hi @santoshkc ,

I use a .NET push stream, as you can see below. Could the stream be the source of the issue?

            using (var pushStream = AudioInputStream.CreatePushStream())
            {
                using (var audioConfig = AudioConfig.FromStreamInput(pushStream))
                {
                    using (var speechClient = new SpeechRecognizer(speechConfig, audioConfig))
                    {

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2023-10-27T14:12:05.6333333+00:00
Hi @Domenico Zurlo ,

Thank you for your response.

In your given code, you used the AudioInputStream.CreatePushStream(). The push stream allows you to push audio data to the speech service as it becomes available, making it suitable for continuous recognition with real-time audio input.
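As a rough illustration of pushing audio as it becomes available, here is a minimal Python sketch. It assumes raw 16 kHz, 16-bit mono PCM audio and a stream object with `write()` and `close()` methods, which is the interface the Python SDK's `speechsdk.audio.PushAudioInputStream` offers. The helper names `iter_pcm_chunks` and `push_audio` and the `RecordingStream` stand-in are hypothetical, introduced so the example runs without a Speech resource.

```python
SAMPLE_RATE_HZ = 16000  # assumed audio format: 16 kHz
BYTES_PER_SAMPLE = 2    # assumed audio format: 16-bit mono PCM


def iter_pcm_chunks(pcm_bytes, chunk_ms=100):
    """Yield consecutive chunks of raw PCM audio, each about chunk_ms long."""
    chunk_size = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for offset in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[offset:offset + chunk_size]


def push_audio(stream, pcm_bytes, chunk_ms=100):
    """Write audio chunk by chunk to any object with write() and close()."""
    chunks_written = 0
    for chunk in iter_pcm_chunks(pcm_bytes, chunk_ms):
        stream.write(chunk)
        chunks_written += 1
    stream.close()  # tells the consumer that no more audio will arrive
    return chunks_written


class RecordingStream:
    """Hypothetical stand-in for a push stream that records what it receives."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def write(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


# One second of digital silence in the assumed format (32000 zero bytes)
one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
fake_stream = RecordingStream()
print(push_audio(fake_stream, one_second_of_silence))  # 10
```

With a real recognizer, the `RecordingStream` would be replaced by a push stream passed to the audio configuration, and the audio would typically come from a live source rather than a prepared buffer.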

The error message "ServiceTimeout, due to service inactivity the client buffer size exceeded" suggests that the service expects some silence to occur within the Speech_SegmentationSilenceTimeoutMs duration, but the push stream might not be providing enough audio data with the required silence pauses.

To fulfill your requirement while using CreatePushStream, you may need to ensure that you are pushing audio data with appropriate silence pauses to avoid exceeding the client buffer size and encountering a service timeout.

If you want to work with larger chunks of audio data without needing to manage the silence pauses manually, you could use a continuous recognition mode without push streaming. Here's an alternative approach:

using (var audioConfig = AudioConfig.FromDefaultMicrophoneInput())
{
    using (var speechClient = new SpeechRecognizer(speechConfig, audioConfig))
    {
        // Your code for continuous recognition without CreatePushStream
    }
}

This code uses the default microphone input and should handle audio segmentation and silence pauses automatically.

Please try these steps and it might help to resolve the issue. Thank you.
Domenico Zurlo 1 Reputation point

2023-10-31T16:39:46.7633333+00:00

Hi @santoshkc ,

But the issue appears only when the Speech_SegmentationSilenceTimeoutMs parameter is set to 3500. When this parameter is set, silence is present, but the sentence is not segmented because of the parameter value. I expected that once 15 seconds had passed, the sentence would be segmented correctly.

Thanks,

Domenico.
