Speech_SegmentationSilenceTimeoutMs and speech segmentation

Domenico Zurlo 1 Reputation point
2023-10-19T15:24:23.02+00:00

Dear Azure Technical Support,

I'm using the Azure Speech Service for continuous speech recognition and have encountered behavior that I'd like to clarify. Historically, in continuous recognition mode, the service segmented the audio into 15-second chunks. However, after setting the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms, this 15-second segmentation seems to have changed: I now receive the error "ServiceTimeout, Due to service inactivity the client buffer size exceeded" when I produce a long segment without a silence pause.

Can you confirm if adjusting the Speech_SegmentationSilenceTimeoutMs parameter impacts the expected 15-second segmentation? If so, can you provide more details on how these settings interact, or point me to the relevant documentation?

I'd appreciate any insights or guidance on this matter. I would like to avoid the error.

Thank you for your assistance.

Best regards,

Domenico.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
1,440 questions

2 answers

  1. santoshkc 5,080 Reputation points Microsoft Vendor
    2023-10-26T14:12:37.2633333+00:00

    Hi @Domenico Zurlo ,

    Following up to check whether my earlier answer in the comments section of this thread helped. Do let us know if you have any further questions.

    To reiterate the resolution here, let me jot down the gist of my comment answer above.

    I understand that you set the Speech_SegmentationSilenceTimeoutMs parameter to 3500 ms and then saw the error "ServiceTimeout, Due to service inactivity the client buffer size exceeded" during a long stretch of speech without a pause.

    I ran the code below in my local Jupyter Notebook without any buffer error. It sets Speech_SegmentationSilenceTimeoutMs to 3500 ms:

    import time
    
    import azure.cognitiveservices.speech as speechsdk
    
    # Set up the speech configuration
    speech_key = "<SPEECH-KEY>"
    service_region = "<SERVICE-REGION>"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    
    # Set the Speech_SegmentationSilenceTimeoutMs parameter to 3500ms
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "3500")
    
    # Set the request timeout property. This is done before the recognizer is
    # created, because the recognizer reads the configuration at construction time.
    speech_config.set_service_property(name="requesttimeout", value="20000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)
    
    # Set up the audio configuration
    audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
    
    # Set up the speech recognizer
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
    done = False
    
    def stop_cb(evt):
        """Callback that stops continuous recognition upon receiving an event `evt`"""
        global done
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        done = True
    
    def recognized_cb(evt):
        """Callback that prints each final result together with its duration"""
        result = evt.result
        if result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(result.text))
            # result.duration is reported in ticks of 100 nanoseconds
            print("Duration: {:.2f} seconds".format(result.duration / 10_000_000))
        elif result.reason == speechsdk.ResultReason.NoMatch:
            print("No speech could be recognized")
    
    def canceled_cb(evt):
        """Callback that prints the cancellation reason and any error details"""
        cancellation_details = evt.result.cancellation_details
        print("Speech recognition canceled: {}".format(cancellation_details.reason))
        if cancellation_details.reason == speechsdk.CancellationReason.Error:
            print("Error details: {}".format(cancellation_details.error_details))
    
    # Connect the event handlers
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(recognized_cb)
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(canceled_cb)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    
    # Start continuous recognition and wait until the session stops or is canceled
    speech_recognizer.start_continuous_recognition()
    
    while not done:
        time.sleep(0.5)
    

    Please have a look at the sample implementation done in the above "comment".
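
    To reason about why a longer silence timeout can produce longer segments, a toy model may help. The sketch below is not the Speech service's actual segmentation algorithm; it only assumes that a phrase is closed once the trailing silence reaches the timeout. The function `segment_by_silence`, the constant `FRAME_MS`, and the per-frame speech flags are hypothetical names introduced for illustration.

```python
from typing import List, Tuple

FRAME_MS = 10  # hypothetical frame length, chosen only for this illustration


def segment_by_silence(
    frames: List[bool], silence_timeout_ms: int, frame_ms: int = FRAME_MS
) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) for each phrase in a list of speech flags.

    Toy rule: a phrase ends once the silence that follows it has lasted
    at least `silence_timeout_ms`. Shorter pauses stay inside the phrase.
    """
    segments: List[Tuple[int, int]] = []
    start = None        # index of the first speech frame of the open phrase
    last_speech = None  # index of the most recent speech frame
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
        elif start is not None:
            silence_ms = (i - last_speech) * frame_ms
            if silence_ms >= silence_timeout_ms:
                segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
                start = None
    # Close a phrase that is still open when the audio ends
    if start is not None:
        segments.append((start * frame_ms, (last_speech + 1) * frame_ms))
    return segments


# 2 s speech, 1 s pause, 2 s speech, 4 s pause, 1 s speech
demo_frames = [True] * 200 + [False] * 100 + [True] * 200 + [False] * 400 + [True] * 100

print(segment_by_silence(demo_frames, silence_timeout_ms=500))
print(segment_by_silence(demo_frames, silence_timeout_ms=3500))
```

    In this model, the 500 ms timeout splits the demo audio at both pauses, while the 3500 ms timeout ignores the 1-second pause and merges the first two phrases into one 5-second segment. With no pause longer than the timeout, the open phrase simply keeps growing, which matches the situation described in the question.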

    Please 'Accept as answer' and ‘Upvote’ if it helped so that it can help others in the community looking for help on similar topics. Thank you!


  2. Domenico Zurlo 1 Reputation point
    2023-10-27T08:46:01+00:00

    Hi @santoshkc ,

    I use a .NET push stream, as you can see below. Could the stream be the source of the issue?

                using (var pushStream = AudioInputStream.CreatePushStream())
                {
                    using (var audioConfig = AudioConfig.FromStreamInput(pushStream))
                    {
                        using (var speechClient = new SpeechRecognizer(speechConfig, audioConfig))
                        {