How to save audio stream during continuous speech recognition

yme
2023-01-23T01:49:18.56+00:00

Hello, I want to perform continuous speech recognition (ASR) on the default microphone while simultaneously writing the audio to a file. In other words, how can I get the audio from continuous ASR and save it to a file? I found this thread, but it was not helpful. I am also aware of custom audio streams, but which subclass do I need to override to access the outgoing audio, and how can I save it to a file format such as WAV?

I am also wondering how to get the model confidence of a speech recognition result.

Thank you!

Azure AI Speech

2 answers

  1. romungi-MSFT (Microsoft Employee, Moderator)
    2023-01-23T11:20:46.9766667+00:00

    yme, the Speech service's speech-to-text (STT) feature does not let you download the input audio while it is being passed to the service. The closest option is audio logging, which does capture the speech, but it is not ideal in your case because the audio is not available in real time. To use it, set the SpeechServiceConnection_EnableAudioLogging property and then retrieve the logs with Get base model endpoint logs.

    speechConfig.SetProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging, "true");
    
    

    Alternatively, to capture the audio as a stream, there is a sample in the SDK repository that uses NAudio to read microphone input as an external source and streams the data to the Speech SDK in push mode. This sample is the same one referenced in the previously answered issue on the SDK repository, and it also writes the stream to a WAV file.
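    The same push-mode idea can be sketched in Python: every captured chunk is sent to the recognizer and also appended to a WAV file. This is a minimal sketch, not the SDK sample itself. It assumes 16 kHz, 16-bit, mono PCM (the default format of a push stream created without an explicit format) and that chunks come from a capture library of your choice. `TeeWavWriter` is a hypothetical helper, not part of the Speech SDK; in real use its sink would be the `write` method of a `speechsdk.audio.PushAudioInputStream`. The demo at the bottom uses a plain list as the sink so it runs without the SDK.

```python
import wave
from typing import Callable


class TeeWavWriter:
    """Hypothetical helper: forward PCM chunks to a sink and save them to a WAV file."""

    def __init__(self, path: str, sink: Callable[[bytes], None],
                 sample_rate: int = 16000, sample_width: int = 2, channels: int = 1):
        self._sink = sink
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        self._wav.setframerate(sample_rate)

    def write(self, chunk: bytes) -> None:
        # Send the audio to the recognizer first, then persist the same bytes.
        self._sink(chunk)
        self._wav.writeframes(chunk)

    def close(self) -> None:
        # Closing the wave writer finalizes the RIFF/data sizes in the header.
        self._wav.close()

    def __enter__(self) -> "TeeWavWriter":
        return self

    def __exit__(self, *exc) -> None:
        self.close()


if __name__ == "__main__":
    received = []  # stands in for push_stream.write in this demo
    silence = b"\x00\x00" * 1600  # 100 ms of 16 kHz, 16-bit mono silence
    with TeeWavWriter("demo.wav", received.append) as tee:
        for _ in range(5):
            tee.write(silence)
    with wave.open("demo.wav", "rb") as wav:
        print(wav.getnframes(), len(received))  # 8000 5
```

    With the Speech SDK installed, the wiring would be roughly: create `push_stream = speechsdk.audio.PushAudioInputStream()`, pass it to `speechsdk.audio.AudioConfig(stream=push_stream)`, build a `SpeechRecognizer` from that audio config, call `start_continuous_recognition()`, and feed every captured chunk through `TeeWavWriter("session.wav", push_stream.write)`. Closing the push stream signals end of audio. Microphone capture itself is not shown; a library such as PyAudio or sounddevice could supply the chunks.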

    I am not sure if I interpreted the second part of your question correctly. Could you please clarify with an example?


  2. yme
    2023-01-23T11:28:10.8466667+00:00

    Thank you so much! I will look into this solution.

    I was able to answer the second part myself: set a property on the speech config to request detailed output (below), and the recognizer's confidence score is then included in the returned result.

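            import azure.cognitiveservices.speech as speechsdk

            # speech_config: an existing speechsdk.SpeechConfig instance
            speech_config.set_property(
                property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestDetailedResultTrueFalse,
                value='true')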
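    With detailed output enabled, each result's JSON payload (exposed in the Python SDK as `result.json`) contains an `NBest` list of hypotheses, each with a `Confidence` value between 0 and 1. The sketch below pulls the best hypothesis and its confidence out of that payload. `top_confidence` is a hypothetical helper name, and `SAMPLE` is a hand-written illustration of the detailed format with made-up values, not actual service output.

```python
import json
from typing import Optional, Tuple


def top_confidence(result_json: str) -> Optional[Tuple[str, float]]:
    """Return (display_text, confidence) of the best NBest entry, or None if absent."""
    payload = json.loads(result_json)
    nbest = payload.get("NBest") or []
    if not nbest:
        # Simple (non-detailed) results carry no NBest list.
        return None
    # Pick the highest-confidence hypothesis rather than relying on list order.
    best = max(nbest, key=lambda entry: entry.get("Confidence", 0.0))
    return best.get("Display", ""), float(best.get("Confidence", 0.0))


# Hand-written sample in the detailed-output shape (illustrative values only).
SAMPLE = json.dumps({
    "RecognitionStatus": "Success",
    "DisplayText": "Hello world.",
    "NBest": [
        {"Confidence": 0.93, "Lexical": "hello world", "Display": "Hello world."},
        {"Confidence": 0.41, "Lexical": "hello word", "Display": "Hello word."},
    ],
})

if __name__ == "__main__":
    print(top_confidence(SAMPLE))  # ('Hello world.', 0.93)
```

    During continuous recognition, this would typically be called from a `recognized` event handler as `top_confidence(evt.result.json)`.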
