How to use Azure Speech to Text API to convert real-time audio from a web browser to text with low latency and high accuracy?

CodeKidz 25 Reputation points
2023-11-14T12:50:33.32+00:00

Is there a way to record audio from a user's web browser and stream it to Azure Speech to Text API through a backend server, then stream back the converted text in real-time? We currently use the default SDK in the browser for speech-to-text, but it produces inaccurate results. Are there any better solutions or resources available?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure
A cloud computing platform and infrastructure for building, deploying and managing applications and services through a worldwide network of Microsoft-managed datacenters.

Accepted answer
  1. navba-MSFT 27,540 Reputation points Microsoft Employee Moderator
    2023-11-15T07:02:35.4+00:00

    @CodeKidz Thanks for getting back. Please try Speech Studio to see how a phrase list can improve recognition accuracy. Go to Real-time speech to text in Speech Studio.

    1. Test speech recognition by uploading an audio file or recording audio with a microphone. For example, select the option to record audio with a microphone and then say "Hi Rehaan, this is Jessie from Contoso bank." Then select the red button to stop recording.
    2. You should see the transcription result in the Test results text box. If "Rehaan", "Jessie", or "Contoso" were recognized incorrectly, you can add the terms to a phrase list in the next step.
    3. Select Show advanced options and turn on Phrase list.
    4. Enter "Contoso;Jessie;Rehaan" in the phrase list text box. Separate multiple phrases with semicolons.

    Use the microphone to test recognition again, or select the retry arrow next to your audio file to re-run it. The terms "Rehaan", "Jessie", and "Contoso" should now be recognized correctly.
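The same phrase list can also be applied in code rather than in Speech Studio. The following is a minimal sketch, assuming the Python Speech SDK (`azure-cognitiveservices-speech`) is installed on the machine doing the recognition. The helper names `parse_phrase_list`, `apply_phrase_list`, and `recognize_once_with_phrases` are hypothetical and chosen for illustration, and the key and region arguments are placeholders for your own Speech resource. It is not necessarily how your application should be structured.

```python
from typing import Any, List


def parse_phrase_list(text: str) -> List[str]:
    """Split a semicolon-separated phrase list, dropping blanks and duplicates."""
    phrases: List[str] = []
    for raw in text.split(";"):
        phrase = raw.strip()
        if phrase and phrase not in phrases:
            phrases.append(phrase)
    return phrases


def apply_phrase_list(recognizer: Any, phrases: List[str]) -> None:
    """Attach each phrase to an existing Speech SDK recognizer."""
    # Imported lazily so parse_phrase_list works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    grammar = speechsdk.PhraseListGrammar.from_recognizer(recognizer)
    for phrase in phrases:
        grammar.addPhrase(phrase)


def recognize_once_with_phrases(key: str, region: str, phrase_text: str) -> str:
    """Recognize one utterance from the default microphone using a phrase list."""
    import azure.cognitiveservices.speech as speechsdk

    # key and region are placeholders for your own Speech resource (assumption).
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # With no audio_config, the recognizer reads from the default microphone.
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    apply_phrase_list(recognizer, parse_phrase_list(phrase_text))
    result = recognizer.recognize_once()
    return result.text
```

For example, `parse_phrase_list("Contoso;Jessie;Rehaan")` returns `["Contoso", "Jessie", "Rehaan"]`, which `apply_phrase_list` then adds to the recognizer one phrase at a time. Phrases added this way apply only to that recognizer instance, so they need to be added again for each new recognizer.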

    On a side note, to improve the accuracy of the speech-to-text results, you might want to consider the following resources:

    1. Improve speech-to-text accuracy with Azure Custom Speech: This Microsoft Azure blog post explains how to train a custom speech model on top of a base model to improve recognition of domain-specific vocabulary and specific audio conditions.
    2. Improve recognition accuracy with phrase list - Azure AI services: This Microsoft Learn document explains how to use a phrase list, which is a list of words or phrases provided ahead of time to help improve their recognition.

    Hope this helps.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

