How to push a float32 array to an AudioInputStream using PullAudioInputStreamCallback

Anonymous
2023-11-10T05:43:06.83+00:00

I have captured real-time audio data as a float32 array. How do I convert it to an Azure AudioInputStream?

import numpy as np
import azure.cognitiveservices.speech as speechsdk

class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
    def __init__(self, audio_array):
        self.audio_array = audio_array
        self.position = 0

    def read(self, buffer, offset, count):
        remaining = len(self.audio_array) - self.position
        to_read = min(remaining, count)
        buffer[:to_read] = self.audio_array[self.position:self.position+to_read]
        self.position += to_read
        return to_read
I tried this, but I get the error "'NumpyAudioStream' object has no attribute '_handle'".

I have a float32 array; how do I create an Azure AudioInputStream from it?
Azure AI Speech

1 answer

  1. YutongTie-MSFT 53,126 Reputation points
    2023-11-11T09:51:28.2633333+00:00

    @Anonymous

    Thanks for reaching out to us. The "'NumpyAudioStream' object has no attribute '_handle'" error occurs because the subclass never calls the base class constructor, so the SDK's internal stream handle is never created. In addition, the Python SDK's PullAudioInputStreamCallback.read receives a single memoryview to fill (not offset/count arguments), and the pull stream expects raw 16-bit PCM bytes rather than float32 samples. To create an AudioInputStream from your float32 array, you can modify the NumpyAudioStream class as follows:

    import numpy as np
    import azure.cognitiveservices.speech as speechsdk

    class NumpyAudioStream(speechsdk.audio.PullAudioInputStreamCallback):
        def __init__(self, audio_array):
            # Call the base constructor so the SDK can set up its internal
            # handle; skipping this is what causes the "'NumpyAudioStream'
            # object has no attribute '_handle'" error.
            super().__init__()
            # The pull stream delivers raw 16-bit PCM bytes, so convert the
            # float32 samples (assumed to be in [-1.0, 1.0]) once, up front.
            self.audio_bytes = (audio_array * 32767).astype(np.int16).tobytes()
            self.position = 0

        def read(self, buffer: memoryview) -> int:
            # Fill the buffer the SDK passes in and return the number of bytes written.
            to_read = min(len(self.audio_bytes) - self.position, buffer.nbytes)
            buffer[:to_read] = self.audio_bytes[self.position:self.position + to_read]
            self.position += to_read
            return to_read

        def close(self):
            pass

    # Create a float32 array (one second of dummy audio at 16 kHz)
    audio_data = np.random.rand(16000).astype(np.float32)

    # Describe the PCM format the callback delivers: 16 kHz, 16-bit, mono
    stream_format = speechsdk.audio.AudioStreamFormat(samples_per_second=16000, bits_per_sample=16, channels=1)

    # Wrap the callback in a PullAudioInputStream and build the AudioConfig from it
    pull_stream = speechsdk.audio.PullAudioInputStream(pull_stream_callback=NumpyAudioStream(audio_data), stream_format=stream_format)
    audio_config = speechsdk.audio.AudioConfig(stream=pull_stream)
    

    In this example, we first create a float32 array called audio_data; the callback converts it to 16-bit PCM bytes once in its constructor. We then describe the stream format with an AudioStreamFormat, wrap the NumpyAudioStream callback in a PullAudioInputStream, and finally create an AudioConfig from that stream, which can be passed to a recognizer or transcriber.

    Note that the NumpyAudioStream class is a subclass of PullAudioInputStreamCallback, which the SDK invokes whenever the recognizer needs more audio. Its read method copies bytes from the converted audio into the buffer the SDK passes in and returns the number of bytes written; returning 0 signals the end of the stream.
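
    As a rough sketch of how the stream can then be consumed (YOUR_SPEECH_KEY and YOUR_REGION below are placeholders, not real values, and audio_config is the object created above):

    import azure.cognitiveservices.speech as speechsdk

    # Placeholder credentials -- substitute your own key and region
    speech_config = speechsdk.SpeechConfig(subscription="YOUR_SPEECH_KEY", region="YOUR_REGION")

    # audio_config is the AudioConfig built from the PullAudioInputStream above
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    # One-shot recognition of the buffered audio
    result = recognizer.recognize_once()
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("Recognized:", result.text)
    elif result.reason == speechsdk.ResultReason.Canceled:
        print("Canceled:", result.cancellation_details.reason)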

    As for your follow-up question, where the audio is not being pushed to the stream correctly, or the audio is not recognized and the start_transcription_async call is cancelled, it sounds like you are running into issues with real-time audio transcription using Azure Speech to Text. Here are a few things you can try to troubleshoot the issue (a short sketch for inspecting the cancellation reason follows this list):

    Check the audio quality: Make sure that the audio being input to the stream is of good quality and is not too noisy or distorted. Poor audio quality can affect the accuracy of the transcription.

    Check the audio format: Make sure that the audio being input to the stream is in a format that is supported by Azure Speech to Text. You can refer to the documentation to see the list of supported audio formats.

    Check the stream configuration: Make sure that the stream is configured correctly and is being pushed to Azure Speech to Text in the correct format. You can refer to the documentation or examples provided by Azure to ensure that your stream is configured correctly.

    Check the transcription settings: Make sure that the transcription settings are configured correctly and are appropriate for the audio being transcribed. You can refer to the documentation to see the list of available transcription settings and their descriptions.

    Contact Azure support: If you continue to experience issues, you may want to reach out to the support team for Azure Speech to Text for further assistance. They may be able to provide additional guidance or help you troubleshoot the issue.
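
    If the call is being cancelled, the cancellation details usually point at the cause, for example an unsupported audio format or a wrong key or region. Here is a minimal sketch that surfaces those details during continuous recognition, assuming the recognizer built from the pull-stream audio_config above:

    import time
    import azure.cognitiveservices.speech as speechsdk

    # 'recognizer' is the SpeechRecognizer created from the pull-stream AudioConfig above
    def on_canceled(evt):
        # Print why the service cancelled the session, e.g. a bad audio format or an auth error
        details = evt.cancellation_details
        print("Canceled:", details.reason)
        if details.reason == speechsdk.CancellationReason.Error:
            print("Error details:", details.error_details)

    recognizer.canceled.connect(on_canceled)
    recognizer.recognized.connect(lambda evt: print("Recognized:", evt.result.text))

    recognizer.start_continuous_recognition()
    time.sleep(5)  # give the buffered audio time to be consumed
    recognizer.stop_continuous_recognition()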

    I hope this helps! Let me know if you have any further questions or if there's anything else I can assist you with.

    Regards,

    Yutong

    -Please kindly accept the answer and vote 'Yes' if you find it helpful, to support the community. Thanks a lot.

