Improving Accuracy of Azure Speech-to-Text with Continuous Language Identification

santoshkc 8,940 Reputation points Microsoft Vendor
2024-07-31T11:33:07.75+00:00

How can I improve the accuracy of language identification and speech-to-text (STT) capabilities in Azure Speech Service for my voice bot, which is experiencing issues with detecting English language and picking up background noise?

PS - Based on common issues that we have seen from customers and other sources, we are posting these questions to help the Azure community.

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. santoshkc 8,940 Reputation points Microsoft Vendor
    2024-07-31T11:37:04.93+00:00

    Greetings!

    To address the issues with Azure Speech-to-Text (STT) accuracy and language identification, follow these recommendations:

    1. Language Detection Issue: If the user's speech is being detected as Hindi when they are speaking English, make sure every language the user may speak is listed as a candidate locale in your language identification configuration. With continuous language identification, the service re-evaluates the language throughout the audio instead of deciding once at the start, so each part of the audio is recognized in the language actually spoken. Refer to the Azure documentation on Implementing Language Identification.
    2. Background Noise Issue: If background noise is being transcribed while the user is not speaking, consider using a custom speech model. A custom model trained on audio that is representative of your users' environment can reduce sensitivity to background noise and improve recognition accuracy for different speaking styles and accents. More information can be found in the Custom Speech Overview.
    3. Audio Format: Uncompressed audio (for example, PCM WAV) generally preserves more of the speech signal than compressed formats and tends to give more accurate recognition results. For best practices and limitations, refer to the Azure Speech Service Transparency Note.
    4. Asynchronous Processing: For asynchronous audio file transcription, Azure Speech-to-Text offers Batch Transcription, which is ideal for large volumes of audio data. Note that Batch Transcription is supported in the Speech CLI and the Speech-to-Text API, but not in the Speech Python SDK. Further details can be found in the Batch Transcription Overview.
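
As a rough illustration of recommendation 1, here is a minimal sketch of continuous language identification with the Speech SDK for Python (`azure-cognitiveservices-speech`). The helper `check_candidate_locales`, the function `build_recognizer`, and the example locales are hypothetical names chosen for this sketch, not part of the SDK. The candidate limits (4 for at-start, 10 for continuous) reflect the documented limits at the time of writing and should be verified against the current documentation. Older SDK versions may also need extra endpoint configuration for continuous mode.

```python
"""Sketch: continuous language identification with the Azure Speech SDK."""

# Assumption: candidate-language limits per language identification mode,
# as documented at the time of writing. Verify against the current docs.
MAX_CANDIDATES = {"AtStart": 4, "Continuous": 10}


def check_candidate_locales(locales, mode="Continuous"):
    """Hypothetical helper: de-duplicate locales and enforce the mode's limit."""
    if mode not in MAX_CANDIDATES:
        raise ValueError(f"unknown language identification mode: {mode!r}")
    unique = list(dict.fromkeys(locales))  # keeps order, drops duplicates
    if not unique:
        raise ValueError("at least one candidate locale is required")
    if len(unique) > MAX_CANDIDATES[mode]:
        raise ValueError(
            f"{mode} mode allows at most {MAX_CANDIDATES[mode]} candidate locales"
        )
    return unique


def build_recognizer(key, region, locales, wav_path):
    """Create a recognizer that identifies the language throughout the audio."""
    # Imported lazily so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    # Switch from the default at-start mode to continuous identification.
    speech_config.set_property(
        property_id=speechsdk.PropertyId.SpeechServiceConnection_LanguageIdMode,
        value="Continuous",
    )
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
        languages=check_candidate_locales(locales, "Continuous")
    )
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect,
        audio_config=audio_config,
    )

    def on_recognized(evt):
        # Report which candidate locale the service chose for this utterance.
        detected = speechsdk.AutoDetectSourceLanguageResult(evt.result).language
        print(f"[{detected}] {evt.result.text}")

    recognizer.recognized.connect(on_recognized)
    return recognizer
```

To try it, call `build_recognizer(key, region, ["en-US", "hi-IN"], "call.wav")` with your own key, region, and audio file, then `start_continuous_recognition()` on the result and `stop_continuous_recognition()` when done. Keeping the candidate list short and limited to languages your users actually speak gives the service fewer ways to guess wrong.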

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    Please do not forget to "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.
