Multiple locales error, REST, Speech to Text, fast transcription API

It is VMS 100 Reputation points
2025-06-03T07:44:39.9433333+00:00

Here's the issue

I use multiple locales as described at https://learn.microsoft.com/en-us/azure/ai-services/speech-service/fast-transcription-create?tabs=multilingual-transcription-on#request-configuration-options

Locales given were "hi-IN, en-IN"

            'locales' => ['hi-IN', 'en-IN'],

Then, in part of the transcript, I get something like: कर रही थी एंड इट वास् सो इंटेंस

I was expecting "एंड इट वास् सो इंटेंस" to come back as "and it was so intense".

I guess this behaviour is normal since the feature is in preview. Or am I missing something here?

Thanks in advance!

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Amira Bedhiafi 33,631 Reputation points Volunteer Moderator
    2025-06-03T08:24:53.9833333+00:00

    Hello!

    Thank you for posting on Microsoft Learn.

    What you are seeing is a known limitation of the Azure Speech-to-Text Fast Transcription API when using multiple locales (multilingual transcription).

    What you did configures Azure to expect speech in both Hindi (India) and English (India). The API attempts to auto-detect and transcribe speech from either language within the same audio stream.

    However, in preview mode, this multilingual support can sometimes:

    • Fail to switch languages accurately mid-sentence
    • Phonetically transcribe English words in Devanagari (Hindi script), especially when speakers switch languages rapidly or with an accent

    So, instead of "and it was so intense", you're getting the text you showed.

    This is a phonetic transliteration of English words into Hindi script, not a true language detection and switch.

    If most of the audio is in one language, I recommend setting that language explicitly as the only locale. That at least avoids unexpected transliteration of English into Hindi script.

    "locales": ["en-IN"]
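
    As a rough illustration of the difference between the two configurations, here is a minimal Python sketch that builds the request definition as JSON. It assumes the definition uses the same `locales` list shown in the question; `build_definition` is a hypothetical helper name, not part of any Azure SDK, and the sketch does not send the request.

    ```python
    import json


    def build_definition(locales):
        """Serialize a transcription definition for the given locale codes.

        Assumption: the request definition carries a "locales" list, as in
        the question's `'locales' => ['hi-IN', 'en-IN']` configuration.
        """
        return json.dumps({"locales": list(locales)})


    # Multilingual configuration from the question (may transliterate English).
    multilingual = build_definition(["hi-IN", "en-IN"])

    # Single-locale configuration suggested above.
    single_locale = build_definition(["en-IN"])

    print(multilingual)
    print(single_locale)
    ```

    With a single locale the service has no second language to switch to, so English speech should not be rendered in Devanagari, at the cost of poorer results on the Hindi portions.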
    

    If you need to handle code-switching, consider segmenting the audio into clearer monolingual chunks where possible, or use single-locale transcription and then apply post-processing.
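
    One possible post-processing step, sketched below in Python, is to measure how much of each transcribed phrase is written in Devanagari and flag phrases for review when English was expected. This is only a heuristic under stated assumptions: the phrases are plain strings you have already extracted from the response, and `devanagari_ratio`, `flag_phrases`, and the 0.5 threshold are hypothetical choices, not part of the Speech service.

    ```python
    import re

    # Unicode block for Devanagari (U+0900 to U+097F), used for Hindi script.
    DEVANAGARI = re.compile(r"[\u0900-\u097F]")


    def devanagari_ratio(text):
        """Return the fraction of letter-like characters that are Devanagari."""
        # Devanagari vowel signs are combining marks, so str.isalpha() alone
        # would miss them; count them explicitly via the block regex.
        letters = [ch for ch in text if ch.isalpha() or DEVANAGARI.match(ch)]
        if not letters:
            return 0.0
        hits = sum(1 for ch in letters if DEVANAGARI.match(ch))
        return hits / len(letters)


    def flag_phrases(phrases, threshold=0.5):
        """Return indexes of phrases that are mostly Devanagari script."""
        return [i for i, p in enumerate(phrases) if devanagari_ratio(p) >= threshold]


    phrases = ["and it was so intense", "एंड इट"]
    print(flag_phrases(phrases))
    ```

    A script check cannot tell genuine Hindi from transliterated English, so flagged phrases still need a second pass, for example re-transcribing that segment with only `en-IN`.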

