How to send an in-memory WAV file to Speech-to-Text

Larissa de Araújo Barros 0

I've been trying to get speech to text working in a .NET API. I receive a WAV file through an API as an IFormFile and then send it to Azure Speech using the Cognitive Services SDK. When I send the file, it returns this result:

Result = {ResultId:Reason:NoMatch Recognized text:<>.
Json:{"Id":"","RecognitionStatus":"InitialSilenceTimeout","DisplayText":"","Offset":15600000,"Duration":75100000,"Channel":0}}
The problem is that if I save this file to a local directory and then read it back, creating the audio config and recognizer as shown in the docs (https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-recognize-speech?pivots=programming-language-csharp), it works:

using var audioConfig = AudioConfig.FromWavFileInput(newFile);
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

But how can I make this work without saving the file locally?
I tried this:

private async Task ReadFromStream(SpeechConfig speechConfig, IFormFile audioFile)
{
    speechConfig.SetProperty(PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "10000");

    // Declares the pushed bytes as raw 16 kHz, 16-bit, mono PCM.
    using var audioFormat = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
    using var audioConfigStream = new PushAudioInputStream(audioFormat);
    using var audioConfig = AudioConfig.FromStreamInput(audioConfigStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    // Note: this clears the multipart form headers of the upload,
    // not the RIFF/WAV header contained in the file bytes.
    audioFile.Headers.Clear();

    var bytes = ConvertToByteArrayContent(audioFile);

    audioConfigStream.Write(bytes);
    audioConfigStream.Close(); // signals end of stream

    // Await instead of blocking on .Result inside an async method.
    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
    OutputSpeechRecognitionResult(speechRecognitionResult);
}

private byte[] ConvertToByteArrayContent(IFormFile audiofile)
{
    // Open the upload stream once; IFormFile.Length is the size in bytes.
    using var br = new BinaryReader(audiofile.OpenReadStream());
    return br.ReadBytes((int)audiofile.Length);
}

and many other approaches.
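The Offset and Duration fields in the NoMatch results quoted in this thread are expressed in 100-nanosecond ticks, which is also the unit of .NET's TimeSpan.Ticks, so they convert without any scaling. A minimal sketch using the values from the first result in the question:

```csharp
using System;

// Values copied from the NoMatch result in the question (100-nanosecond ticks).
long offsetTicks = 15600000;
long durationTicks = 75100000;

// TimeSpan also counts in 100 ns ticks, so FromTicks needs no conversion factor.
TimeSpan offset = TimeSpan.FromTicks(offsetTicks);
TimeSpan duration = TimeSpan.FromTicks(durationTicks);

Console.WriteLine($"Offset:   {offset}");   // Offset:   00:00:01.5600000
Console.WriteLine($"Duration: {duration}"); // Duration: 00:00:07.5100000
```

So the service reports an offset of about 1.56 s and a duration of about 7.51 s without recognizing any speech; the same conversion puts the Duration of 90700000 in the later result at about 9.07 s.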

Viorel 122.6K Reputation points

2023-09-05T19:19:20.0266667+00:00

Maybe ConvertToByteArrayContent has an issue. Did you check its output (writing the bytes to disk, for example)?
Larissa de Araújo Barros 0 Reputation points

2023-09-05T19:27:54.92+00:00

Sorry, I've added the ConvertToByteArrayContent code to the question. And yes, if I call it to get the bytes and write them to disk, it works.
Viorel 122.6K Reputation points

2023-09-05T19:44:47.4033333+00:00

Did you try the sample code with fewer changes? Probably the essential adjustment would be: var reader = new BinaryReader(audioFile.OpenReadStream());
Larissa de Araújo Barros 0 Reputation points

2023-09-05T20:02:35.08+00:00

Yes, I also tried the sample code from the "Recognize speech from an in-memory stream" section and got the same error:

ResultId:Reason:NoMatch Recognized text:<>.
Json:{"Id":"","RecognitionStatus":"InitialSilenceTimeout","Offset":0,"Duration":90700000,"Channel":0}}

I'm using the same audio file that works when I read it from disk.
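One possible explanation, offered as an assumption rather than a confirmed diagnosis: AudioConfig.FromWavFileInput reads the file's RIFF header to learn its sample rate, bit depth and channel count, whereas a push stream created with AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1) (or with the default push-stream format) treats every byte written to it as raw 16 kHz, 16-bit, mono PCM, header included. If the uploaded WAV declares a different format, the service would receive misinterpreted audio, which is consistent with, though not proof of, the NoMatch/InitialSilenceTimeout results above. The sketch below parses the header of an in-memory WAV file so the real format can be checked and the sample bytes located. ParseWavHeader and its tuple fields are hypothetical names, and it reads only the basic "fmt " fields (no WAVE_FORMAT_EXTENSIBLE details).

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical helper (not part of the Speech SDK): reads the "fmt " and "data"
// chunks of an in-memory RIFF/WAVE file. It reports the format the file declares
// and where the raw sample bytes start, so the header can be skipped.
static (int SampleRate, int BitsPerSample, int Channels, int DataOffset, int DataLength)
    ParseWavHeader(byte[] wav)
{
    if (wav.Length < 12
        || Encoding.ASCII.GetString(wav, 0, 4) != "RIFF"
        || Encoding.ASCII.GetString(wav, 8, 4) != "WAVE")
    {
        throw new FormatException("Not a RIFF/WAVE file.");
    }

    int sampleRate = 0, bitsPerSample = 0, channels = 0;
    bool haveFmt = false;
    long pos = 12; // first chunk follows "RIFF", the RIFF size and "WAVE"

    // Walk the chunk list: files may carry extra chunks (e.g. "LIST") before "data".
    while (pos + 8 <= wav.Length)
    {
        int start = (int)pos;
        string id = Encoding.ASCII.GetString(wav, start, 4);
        long size = BinaryPrimitives.ReadUInt32LittleEndian(wav.AsSpan(start + 4, 4));
        int body = start + 8;

        if (id == "fmt ")
        {
            if (size < 16 || body + 16 > wav.Length)
                throw new FormatException("Truncated \"fmt \" chunk.");
            // Layout: formatTag(2) channels(2) sampleRate(4) byteRate(4) blockAlign(2) bitsPerSample(2)
            channels = BinaryPrimitives.ReadUInt16LittleEndian(wav.AsSpan(body + 2, 2));
            sampleRate = BinaryPrimitives.ReadInt32LittleEndian(wav.AsSpan(body + 4, 4));
            bitsPerSample = BinaryPrimitives.ReadUInt16LittleEndian(wav.AsSpan(body + 14, 2));
            haveFmt = true;
        }
        else if (id == "data")
        {
            if (!haveFmt)
                throw new FormatException("\"data\" chunk appears before \"fmt \".");
            // Clamp in case the declared size runs past the end of the buffer.
            int length = (int)Math.Min(size, wav.Length - body);
            return (sampleRate, bitsPerSample, channels, body, length);
        }

        // Chunk bodies are padded to an even number of bytes.
        pos = body + size + (size & 1);
    }

    throw new FormatException("No \"data\" chunk found.");
}
```

If the parsed values are not 16000/16/1, one option (again an assumption, not verified in this thread) is to pass them, cast to uint and byte, to AudioStreamFormat.GetWaveFormatPCM instead of the hard-coded values, or to convert the audio to 16 kHz, 16-bit mono before it reaches the API. Either way, only the DataLength bytes starting at DataOffset would then be written to the push stream, for example a copy made with wav.AsSpan(info.DataOffset, info.DataLength).ToArray().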
