Issue with gpt-4o-transcribe detecting wrong language (Chinese/Malay/Tamil/English mix-up)

Rosemary Raphael 5 Reputation points
2025-10-13T05:10:25.12+00:00

We are using gpt-4o-transcribe for our voice project. It was working fine previously, but for the past week our customers have been facing issues with the voice model. When they speak in English, it’s being transcribed as Chinese and when they speak in Chinese, it’s being interpreted as Malay and so on. We haven’t made any changes on our end, so we would like to know if there have been any recent updates or changes to the gpt-4o-transcribe model that could explain this behavior.

Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. Sridhar M 2,690 Reputation points Microsoft External Staff Moderator
    2025-10-13T11:17:58.57+00:00

    Hi Rosemary Raphael,

    Thank you for reaching out on the Microsoft Q&A.

    1. Model Version Retirement & Transition: The version of gpt-4o-transcribe released in March 2025 (2025-03-20) was scheduled for retirement on October 15, 2025, so customers on that version are being migrated to a newer build. This transition appears to have introduced instability in language detection.
    2. Known Bug with Language Enforcement: There is an acknowledged bug where gpt-4o-transcribe ignores or inconsistently applies the language parameter. Even when you specify "language": "en", the model sometimes switches to other languages (Chinese, Malay, etc.). This is because the language hint is treated as a “soft preference,” not a strict rule.
    3. Community Reports of Wrong Language Output: Multiple developers have reported that since early October, English audio is being transcribed as Chinese or Malay, and Chinese as other languages. This aligns with your timeline.
    4. Realtime API Changes: A new variant, gpt-4o-transcribe-latest, was introduced in the Realtime API, but it is not fully stable yet. Some users report missing or misaligned transcription events when using the new GA session: https://community.openai.com/t/realtime-transcription-mismatch-and-gpt-4o-transcribe-latest/1358789
      The retirement of the old version and rollout of a new backend likely changed the language detection pipeline.
    5. Multilingual by Design: The model is multilingual by design, and without strict enforcement it guesses the language from acoustic cues. If your audio has background noise or mixed-language phrases, the bug amplifies the misclassification.

    What you can do now:
      1. Check Your Deployment Version: In Azure Portal → OpenAI Deployments, confirm whether you are still on 2025-03-20. If so, update to the latest version once it appears in the dropdown.
      2. Force Language via Prompting: Add a strong instruction in the prompt field:
              "prompt": "The audio is entirely in English. Transcribe only in English."
        
        This helps, but does not fully fix the bug.
      3. Set Temperature Low: Use temperature: 0.2 and avoid chunking. Some developers report better consistency with these settings. [community.openai.com]
      4. Fallback Option: If accuracy is critical, consider temporarily switching to whisper-1 for stricter language handling until the new gpt-4o-transcribe build stabilizes.
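
    Steps 1–3 above can be sketched together with the openai Python SDK. This is a sketch only: the deployment name "gpt-4o-transcribe", the API version string, and the helper function name are assumptions — substitute your own values.

```python
# Sketch only: pins the language hint, adds a strong single-language prompt,
# and lowers temperature, per the mitigation steps above. The deployment
# name "gpt-4o-transcribe" and the API version are assumptions; use yours.

def build_transcription_request(deployment="gpt-4o-transcribe",
                                language="en",
                                language_name="English",
                                temperature=0.2):
    """Bundle the workaround settings into kwargs for the transcription call."""
    return {
        "model": deployment,         # Azure OpenAI deployment name
        "language": language,        # soft hint -- may still be ignored by the bug
        "prompt": (f"The audio is entirely in {language_name}. "
                   f"Transcribe only in {language_name}."),
        "temperature": temperature,  # lower = more deterministic output
    }

# Example call (requires AZURE_OPENAI_ENDPOINT / AZURE_OPENAI_API_KEY set):
# from openai import AzureOpenAI
# client = AzureOpenAI(api_version="2025-03-01-preview")
# with open("call.wav", "rb") as audio:
#     result = client.audio.transcriptions.create(
#         file=audio, **build_transcription_request())
# print(result.text)
```

    Keeping the settings in one helper makes it easy to swap the model to whisper-1 (step 4) without touching the rest of the call site.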

    References:
    https://learn.microsoft.com/en-us/azure/ai-foundry/openai/whats-new
    https://learn.microsoft.com/en-us/azure/ai-services/translator/reference/known-issues
    https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-identification?tabs=once&pivots=programming-language-csharp

    If this resolves your issue, please feel free to accept this as the answer.

    Thank you

