How to use Azure Speech to Text API to convert real-time audio from a web browser to text with low latency and high accuracy?

Question

CodeKidz 25

Is there a way to record audio from a user's web browser and stream it to Azure Speech to Text API through a backend server, then stream back the converted text in real-time? We currently use the default SDK in the browser for speech-to-text, but it produces inaccurate results. Are there any better solutions or resources available?

navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2023-11-15T06:44:03.76+00:00

@CodeKidz Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

I understand that you would like to know if Azure Speech to Text API can be used to convert real-time audio from a web browser to text with low latency and high accuracy.

This article talks about creating and running an application to recognize and transcribe speech to text in real time using a Node.js app for your browser.

Please note: Recognizing speech from a microphone is not supported in Node.js. It's supported only in a browser-based JavaScript environment. For more information, see the React sample and the implementation of speech to text from a microphone on GitHub.
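As a rough illustration of the browser-side approach described in the note, here is a minimal sketch of continuous recognition from the default microphone with the Speech SDK for JavaScript. The function name `startMicrophoneRecognition` and the `options` shape (`key`, `region`, `language`, `onPartial`, `onFinal`) are made up for this example; the SDK module is passed in as `sdk` so the sketch does not assume how you load it (npm import or script tag).

```javascript
// Hypothetical helper: `sdk` is the Speech SDK module
// (microsoft-cognitiveservices-speech-sdk), passed in by the caller.
function startMicrophoneRecognition(sdk, options) {
  const { key, region, language = "en-US", onPartial, onFinal } = options;

  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
  speechConfig.speechRecognitionLanguage = language;

  // Browser only: uses the default microphone (the browser asks for permission).
  const audioConfig = sdk.AudioConfig.fromDefaultMicrophoneInput();
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  // Interim hypotheses arrive while the user is still speaking.
  recognizer.recognizing = (_sender, event) => onPartial(event.result.text);

  // A final result arrives once per recognized utterance.
  recognizer.recognized = (_sender, event) => {
    if (event.result.reason === sdk.ResultReason.RecognizedSpeech) {
      onFinal(event.result.text);
    }
  };

  recognizer.startContinuousRecognitionAsync();
  return recognizer;
}
```

Showing the interim `recognizing` text immediately and replacing it when `recognized` fires is one way to keep perceived latency low. To avoid shipping a subscription key to the browser, `sdk.SpeechConfig.fromAuthorizationToken(token, region)` can be used instead of `fromSubscription`, with the token issued by your own backend.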

Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

CodeKidz 25 Reputation points

2023-11-15T06:53:32.53+00:00

Is there a demo page to experience the delay and accuracy effect? I couldn't find a place to try it out in the project.

Answer 1

@CodeKidz Thanks for getting back. Please try Speech Studio to see how a phrase list can improve recognition accuracy. Go to Real-time Speech to text in Speech Studio.

1. You can test speech recognition by uploading an audio file or recording audio with a microphone. For example, select record audio with a microphone and then say "Hi Rehaan, this is Jessie from Contoso bank." Then select the red button to stop recording.
2. You should see the transcription result in the Test results text box. If "Rehaan", "Jessie", or "Contoso" were recognized incorrectly, you can add the terms to a phrase list in the next step.
3. Select Show advanced options and turn on Phrase list.
4. Enter "Contoso;Jessie;Rehaan" in the phrase list text box. Note that multiple phrases need to be separated by a semicolon.
5. Use the microphone to test recognition again. Alternatively, you can select the retry arrow next to your audio file to re-run your audio. The terms "Rehaan", "Jessie", and "Contoso" should now be recognized.
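The same phrase list idea can be carried over from Speech Studio into application code. The sketch below is only an illustration: `parsePhraseList` and `applyPhraseList` are hypothetical helper names, and the `grammar` argument is assumed to be any object with an `addPhrase()` method, such as the one the Speech SDK for JavaScript returns from `sdk.PhraseListGrammar.fromRecognizer(recognizer)`.

```javascript
// Split a Speech Studio style phrase list ("A;B;C") into clean, unique phrases.
function parsePhraseList(raw) {
  const seen = new Set();
  const phrases = [];
  for (const part of raw.split(";")) {
    const phrase = part.trim();
    // Skip empty entries (e.g. from ";;") and repeated phrases.
    if (phrase && !seen.has(phrase)) {
      seen.add(phrase);
      phrases.push(phrase);
    }
  }
  return phrases;
}

// Add every phrase to a grammar object that exposes addPhrase().
// Returns the phrases that were added, in order.
function applyPhraseList(grammar, raw) {
  const phrases = parsePhraseList(raw);
  for (const phrase of phrases) {
    grammar.addPhrase(phrase);
  }
  return phrases;
}
```

In an app this would be called once before starting recognition, for example `applyPhraseList(sdk.PhraseListGrammar.fromRecognizer(recognizer), "Contoso;Jessie;Rehaan")`.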

On a side note, for improving the accuracy of the speech-to-text results, you might want to consider the following:

Improve speech-to-text accuracy with Azure Custom Speech: This blog post from Microsoft Azure discusses how to improve speech-to-text accuracy with Azure Custom Speech. It explains how to train a custom speech model on top of a base model to improve recognition of domain-specific vocabulary and specific audio conditions.
Improve recognition accuracy with phrase list - Azure AI services: This Microsoft Learn document explains how to use a phrase list to improve recognition accuracy. A phrase list is a list of words or phrases provided ahead of time to help improve their recognition.

Hope this helps.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

navba-MSFT 27,540 Reputation points Microsoft Employee Moderator

2023-11-16T04:24:40.07+00:00

@CodeKidz Just following up to check whether the answer above helped. If it answers your query, please click "Accept the answer", which might be beneficial to other community members reading this thread. If you have any further questions, do let us know. I would be happy to help.

pfgazure 0 Reputation points

2024-01-02T19:25:36.15+00:00

Is there an example of using the microphone in a web app for speech recognition? The console app is only so useful. Thanks.

CodeKidz 25 Reputation points

2024-01-03T01:44:16.7466667+00:00

Try this repo https://github.com/Azure-Samples/AzureSpeechReactSample

You can find everything you want there; it is much better than the samples provided in the official documentation.
