How to save audio stream during continuous speech recognition

Question

yme 10

Hello, I want to perform continuous speech recognition (ASR) on the default microphone while simultaneously writing the audio to a file. In other words, how can I get the audio from continuous ASR and save it to a file? I found this thread, but it was not helpful. I am also aware of custom audio streams, but which class do I need to subclass to access the outgoing audio, and how can I save it to, e.g., WAV?

I am also wondering how to get the model confidence of a speech recognition result.

Thank you!

2 answers

Answer 1

romungi-MSFT 48,911 Microsoft Employee Moderator

yme, the Speech service's speech-to-text (STT) does not offer a way to download the input audio while it is being passed to the service. Audio logging can capture the speech, but it is not ideal in your case because the audio is not available in real time. You can set the property SpeechServiceConnection_EnableAudioLogging and retrieve the logs with Get base model endpoint logs.

speechConfig.SetProperty(PropertyId.SpeechServiceConnection_EnableAudioLogging, "true");

Alternatively, to capture the audio as a stream yourself, there is a sample in the SDK repo that uses NAudio with microphone input as an external source and streams the data in push mode to the Speech SDK. It is the same sample referenced in a previously answered issue on the SDK repo, and it also writes the stream to a WAV file.
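Since the sample mentioned above is C# with NAudio, here is a rough Python sketch of the same idea: every audio chunk is written to a WAV file and also forwarded to a sink. The `WavTee` class and the `sink` parameter are hypothetical names made up for illustration, not part of the Speech SDK, and the sketch assumes raw 16 kHz, 16-bit, mono PCM chunks.

```python
import wave
from typing import Callable


class WavTee:
    """Hypothetical helper: write each PCM chunk to a WAV file and forward it to a sink."""

    def __init__(self, path: str, sink: Callable[[bytes], None],
                 sample_rate: int = 16000, sample_width: int = 2, channels: int = 1):
        # Assumption: chunks are raw PCM matching these parameters.
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)
        self._wav.setframerate(sample_rate)
        self._sink = sink

    def write(self, chunk: bytes) -> None:
        # Save the chunk locally, then pass the same bytes on unchanged.
        self._wav.writeframes(chunk)
        self._sink(chunk)

    def close(self) -> None:
        # Closing finalizes the WAV header with the real frame count.
        self._wav.close()
```

In real use, the sink would presumably be the `write` method of a `speechsdk.audio.PushAudioInputStream`, with that stream passed to `speechsdk.audio.AudioConfig(stream=...)` for the recognizer. The microphone chunks themselves would have to come from a separate capture library, which is not shown here.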

I am not sure if I interpreted the second part of your question correctly. Could you please clarify with an example?

yme 10 Reputation points

2023-01-23T11:30:47.8266667+00:00

One quick clarification: downloading the audio log means I can get the speech file after the ASR is complete (i.e. asynchronously), but I cannot write it in real time. Is that right?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2023-01-23T12:58:44.07+00:00

Yes, that is correct.

Answer 2

yme 10

Thank you so much! I will look into this solution.

I was able to answer the second part myself: you can set a property in the speech config requesting detailed output (below); the recognizer confidence will then be in the returned output.

speech_config.set_property(
    property_id=speechsdk.PropertyId.SpeechServiceResponse_RequestDetailedResultTrueFalse,
    value='true')
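For anyone wondering how to read the confidence once detailed output is enabled, here is a minimal sketch. It assumes the detailed result is a JSON string (in the Python SDK this should be available as `result.json`) containing an `NBest` list whose entries carry `Confidence` and `Display` fields. The helper name `best_confidence` is made up for illustration.

```python
import json
from typing import Optional, Tuple


def best_confidence(detailed_json: str) -> Optional[Tuple[str, float]]:
    """Return (display_text, confidence) of the most confident NBest entry, or None."""
    payload = json.loads(detailed_json)
    # Assumption: alternatives are listed under "NBest", each with a "Confidence" score.
    nbest = payload.get("NBest") or []
    if not nbest:
        return None
    top = max(nbest, key=lambda alt: alt.get("Confidence", 0.0))
    return top.get("Display", ""), top.get("Confidence", 0.0)
```

Inside a `recognized` event handler this could be called as `best_confidence(evt.result.json)`; it returns `None` when no alternatives are present.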
