Azure speech to text with diarization receives a Session Stopped event when there is a long pause in the audio file

Harish A 55 Reputation points
2023-11-01T07:11:13.22+00:00

Hi

I am continuously reading a wav file and sending it in frames to the Azure Cognitive Services Speech API to get the transcribed text, with diarization enabled (speechsdk.transcription.ConversationTranscriber).

Everything works fine when the audio file has speech in it, that is, when people speak continuously. However, if there is a long pause, for example when nobody speaks for a long time or there is music playing, the Session Stopped event gets fired and my program terminates.

Example: take a scenario where there is a long, 4-hour meeting with short breaks of 15-30 minutes in between. If I use this audio, whenever there is a 15-minute break, the "Session Stopped" event gets triggered and my Python script ends.

Is there a way to handle this? What I am looking for is that, regardless of whether there is speech in the audio, I should not receive a Session Stopped or Canceled event.

Is there any such property that I can set?

I tried properties like "Conversation_Initial_Silence_Timeout", but after reading through the documentation, I don't think it serves my purpose.
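
For reference, here is roughly how my setup looks (a simplified sketch: the key, region, and file name are placeholders, and in my real code I push the audio frames through a stream instead of passing the file name directly):

    import time
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    audio_config = speechsdk.audio.AudioConfig(filename="meeting.wav")

    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)

    done = False

    def on_transcribed(evt):
        # With diarization enabled, each result carries the text and a speaker id.
        print(f"{evt.result.speaker_id}: {evt.result.text}")

    def on_stopped_or_canceled(evt):
        # This is what fires after a long stretch of silence or music and ends my script.
        global done
        done = True

    transcriber.transcribed.connect(on_transcribed)
    transcriber.session_stopped.connect(on_stopped_or_canceled)
    transcriber.canceled.connect(on_stopped_or_canceled)

    transcriber.start_transcribing_async().get()
    while not done:
        time.sleep(0.5)
    transcriber.stop_transcribing_async().get()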


Accepted answer
  1. dupammi 8,615 Reputation points Microsoft External Staff
    2023-11-01T12:05:20.7166667+00:00

    Hi @Harish A,

    Thank you for using the Microsoft Q&A.

    I understand that you are looking for a way to increase the maximum duration of silence allowed before the conversation is considered complete. I will be happy to assist you with this.

    You can set either of the following two properties on your speech_config:

    conversationEndSilenceTimeoutMs (a service property, passed as a URI query parameter via speech_config.set_service_property())

    (OR)

    Speech_SegmentationSilenceTimeoutMs (an SDK property id, set via speech_config.set_property() using speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs)

    By setting either of these two properties, you can allow for longer silences and prevent the "Session Stopped" event from being triggered before the conversation has actually ended. The snippets below set the conversation ending detection timeout accordingly.

    # Set conversation ending detection timeout (4 hours in seconds)
    conversation_ending_detection_timeout = 4 * 60 * 60
    speech_config.set_service_property("conversationEndSilenceTimeoutMs",
                                       str(conversation_ending_detection_timeout * 1000),  # milliseconds
                                       speechsdk.ServicePropertyChannel.UriQueryParameter)
    

    (OR)

    # Set the segmentation silence timeout (4 hours in seconds)
    conversation_ending_detection_timeout = 4 * 60 * 60
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs,
                               str(conversation_ending_detection_timeout * 1000))  # milliseconds
    

    Here is the link where you can find more details:
    How to recognize speech - Speech service - Azure AI services | Microsoft Learn
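
    For completeness, here is a short, illustrative sketch of where the call fits: the property has to be set on the SpeechConfig before the ConversationTranscriber is created (the key, region, and file name below are placeholders):

    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")

    # Apply the silence timeout before constructing the transcriber (4 hours, sent as milliseconds).
    conversation_ending_detection_timeout = 4 * 60 * 60
    speech_config.set_service_property(
        "conversationEndSilenceTimeoutMs",
        str(conversation_ending_detection_timeout * 1000),
        speechsdk.ServicePropertyChannel.UriQueryParameter)

    audio_config = speechsdk.audio.AudioConfig(filename="meeting.wav")
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)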

    Hope this helps.

    Thank you!


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

