Why does Azure Speech-to-Text detect French accurately in a standalone Python script but perform poorly in a real-time video call integration?

Suha Mansuri 0 Reputation points
2024-11-14T18:29:27.4866667+00:00

I'm working on a real-time translation project using Azure Speech Services. When I run my translation code in a standalone Python script, it accurately recognizes and translates French and English speech. However, when the same Speech-to-Text functionality is integrated into a video call (using WebSocket connections), the recognition of French is significantly less accurate.

Here’s a summary of my setup:

  • Python Script: I use Azure's Cognitive Services Speech SDK for real-time speech recognition, and language detection works very well, especially for French.
  • Video Call Integration: In a Node.js application I use Azure Speech Services with the same language configuration, capturing and processing audio from live video calls over a WebSocket, but French detection is consistently inaccurate. (A rough sketch of both paths follows this list.)
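
For context, here is a minimal Python sketch of roughly what the working script does; the key, region, exact language lists, and the fixed listen window are placeholders rather than my exact code. The commented-out lines at the bottom show the push-stream variant that a streamed/WebSocket audio path needs, where the audio format has to be declared explicitly.

```python
import time

import azure.cognitiveservices.speech as speechsdk

# Placeholders -- not my real key/region.
SPEECH_KEY = "YOUR_SPEECH_KEY"
SPEECH_REGION = "westeurope"

# Translation config: translate recognized speech into English and French.
translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription=SPEECH_KEY, region=SPEECH_REGION)
for target in ("en", "fr"):
    translation_config.add_target_language(target)

# Candidate source languages for automatic language detection (full locales).
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=["en-US", "fr-FR"])

# Standalone-script path: audio straight from the default microphone.
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config,
    auto_detect_source_language_config=auto_detect_config,
    audio_config=audio_config)

def on_recognized(evt):
    # Print the detected source language, the recognized text, and the translations.
    if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
        detected = evt.result.properties.get(
            speechsdk.PropertyId.SpeechServiceConnection_AutoDetectSourceLanguageResult)
        print(f"[{detected}] {evt.result.text}")
        for lang, text in evt.result.translations.items():
            print(f"  -> {lang}: {text}")

recognizer.recognized.connect(on_recognized)
recognizer.start_continuous_recognition()
time.sleep(30)  # listen for a fixed window in this sketch
recognizer.stop_continuous_recognition()

# Streamed-audio variant (what the WebSocket path has to do instead of a mic):
# the SDK cannot probe a raw stream, so the format must be declared explicitly.
# The service expects 16 kHz, 16-bit, mono PCM unless told otherwise.
#
# stream_format = speechsdk.audio.AudioStreamFormat(
#     samples_per_second=16000, bits_per_sample=16, channels=1)
# push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
# audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
# ...then each audio chunk received from the call is fed with:
# push_stream.write(pcm_chunk_bytes)
```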

I’ve ensured that the audio quality is similar in both cases and that the language configurations match. Unless there is an underlying issue somewhere else, the model recognizes English (not amazingly, but the synthesized output is there) but barely processes French at all. I have also tried Italian and Spanish, and they are not great either. Could there be a language code issue, since I am using speech-to-text translation together with text-to-speech?
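
In case it helps pin down the language-code question, this is the shape of the codes involved, sketched in simplified Python (the key/region are placeholders and the voice name is just one example):

```python
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_SPEECH_KEY", region="westeurope")  # placeholders

# 1. Recognition / language-detection candidates are full locales.
candidate_locales = ["en-US", "fr-FR", "it-IT", "es-ES"]
auto_detect_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig(
    languages=candidate_locales)

# 2. Translation targets are bare language codes, not locales.
for target in ("en", "fr", "it", "es"):
    translation_config.add_target_language(target)

# 3. Text-to-speech of the translated text uses a full neural voice name.
translation_config.voice_name = "fr-FR-DeniseNeural"  # example voice
```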

Azure AI Speech
Azure Translator