How to stream audio output using the Python Speech SDK?

Gustavo Jakobi 0 Reputation points
2024-01-17T03:31:41.68+00:00

The GitHub repository for the SDK contains numerous samples, but they do not provide clear guidance on handling audio streams. Specifically, I am currently working with this Python sample, and while the examples print the buffer size, they do not demonstrate the proper usage. Here is an excerpt from the sample:

audio_buffer = bytes(32000)
total_size = 0
filled_size = pull_stream.read(audio_buffer)
while filled_size > 0:
    print("{} bytes received.".format(filled_size))
    total_size += filled_size
    filled_size = pull_stream.read(audio_buffer)
print("Totally {} bytes received.".format(total_size))

Ideally, I would like to receive my audio in chunks and play it as I receive them, rather than waiting for the entire audio to finish. Could you provide guidance on the correct approach for handling audio streams in real-time?
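For context, the loop in the sample can be adapted so that each chunk is handed to a player the moment it arrives, instead of only being counted. Below is a minimal sketch of that shape; `io.BytesIO` stands in for the SDK's pull stream (its `readinto` fills a caller-supplied buffer and returns the byte count, mirroring `pull_stream.read(audio_buffer)`), and `play_chunk` is a hypothetical placeholder for a real playback call:

```python
import io

playback_log = []

def play_chunk(chunk: bytes) -> None:
    # Hypothetical stand-in for a real playback call
    # (e.g. writing the chunk to a sound device).
    playback_log.append(len(chunk))

# io.BytesIO stands in for the SDK's pull stream here; its readinto()
# fills a caller-supplied buffer and returns the number of bytes written.
stream = io.BytesIO(b"\x00" * 70000)  # fake PCM audio
audio_buffer = bytearray(32000)       # reusable buffer, as in the sample

total_size = 0
filled_size = stream.readinto(audio_buffer)
while filled_size > 0:
    # Play (or enqueue) the chunk immediately instead of waiting for EOF.
    play_chunk(bytes(audio_buffer[:filled_size]))
    total_size += filled_size
    filled_size = stream.readinto(audio_buffer)

print("Totally {} bytes received.".format(total_size))
```

The buffer is reused across iterations, so only the first `filled_size` bytes are valid on each pass; copy that slice out before handing it to the player.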

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

Sort by: Most helpful
  1. navba-MSFT 27,540 Reputation points Microsoft Employee Moderator
    2024-01-23T05:16:49.4+00:00

    @Gustavo Jakobi Thanks for getting back. You can leverage the PushAudioInputStream.Write method. It writes the specified audio data by making an internal copy of it. Note: the data buffer must not contain an audio header.

    Sample code:

    import os
    import azure.cognitiveservices.speech as speechsdk
    import wave
    
    def recognize_from_wav_file(filename):
        # This example requires environment variables named "SPEECH_KEY" and "SPEECH_REGION"
        speech_config = speechsdk.SpeechConfig(subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"])
        speech_config.speech_recognition_language="en-US"
    
        # Open the .wav file
        wf = wave.open(filename, 'rb')
    
        # Set up the audio stream
        push_stream = speechsdk.audio.PushAudioInputStream()
        audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
        speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
        # Read the .wav file in chunks and feed them to the speech recognizer
        CHUNK = 1024
        data = wf.readframes(CHUNK)
        while len(data) > 0:
            push_stream.write(data)
            data = wf.readframes(CHUNK)
    
        # Close the stream to signal that all audio data has been written
        push_stream.close()
    
        # Recognize the speech from the .wav file
        speech_recognition_result = speech_recognizer.recognize_once_async().get()
    
        if speech_recognition_result.reason == speechsdk.ResultReason.RecognizedSpeech:
            print("Recognized: {}".format(speech_recognition_result.text))
        elif speech_recognition_result.reason == speechsdk.ResultReason.NoMatch:
            print("No speech could be recognized: {}".format(speech_recognition_result.no_match_details))
        elif speech_recognition_result.reason == speechsdk.ResultReason.Canceled:
            cancellation_details = speech_recognition_result.cancellation_details
            print("Speech Recognition canceled: {}".format(cancellation_details.reason))
            if cancellation_details.reason == speechsdk.CancellationReason.Error:
                print("Error details: {}".format(cancellation_details.error_details))
                print("Did you set the speech resource key and region values?")
    
    recognize_from_wav_file('MyAudioFile.wav')
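    The sample above covers the input side (pushing audio into the recognizer). For the output side of the original question (playing synthesized audio as it arrives), the SDK also offers a push model: subclass speechsdk.audio.PushAudioOutputStreamCallback, and the synthesizer invokes your write() once per audio chunk. Here is a sketch of such a callback; the class body runs standalone, while the SDK wiring (which needs a live Speech resource) is shown in comments:

    ```python
    class ChunkPlayerCallback:
        """Mirrors the speechsdk.audio.PushAudioOutputStreamCallback
        interface: the synthesizer calls write() once per audio chunk and
        close() when synthesis ends. In real code, inherit from
        speechsdk.audio.PushAudioOutputStreamCallback instead."""

        def __init__(self):
            self.bytes_played = 0
            self.closed = False

        def write(self, audio_buffer: memoryview) -> int:
            # Hand the chunk to a player here instead of buffering the
            # whole utterance before playback starts.
            self.bytes_played += audio_buffer.nbytes
            return audio_buffer.nbytes  # report how many bytes were consumed

        def close(self) -> None:
            self.closed = True

    # SDK wiring (requires a live Speech resource, so shown as comments):
    #   callback = ChunkPlayerCallback()  # subclassing the SDK callback class
    #   push_stream = speechsdk.audio.PushAudioOutputStream(callback)
    #   audio_config = speechsdk.audio.AudioOutputConfig(stream=push_stream)
    #   synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config,
    #                                             audio_config=audio_config)
    #   synthesizer.speak_text_async("Hello").get()

    # Standalone exercise of the callback with fake chunks:
    cb = ChunkPlayerCallback()
    for chunk in (b"\x00" * 3200, b"\x00" * 3200, b"\x00" * 1600):
        cb.write(memoryview(chunk))
    cb.close()
    print("played {} bytes".format(cb.bytes_played))
    ```

    The chunks the synthesizer delivers are raw audio without a header, so the callback can forward them to a sound device directly.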
    
    
