Hello!
Thank you for posting on Microsoft Learn.
What you are seeing is a known limitation of the Azure Speech-to-Text Fast Transcription API when multiple locales are specified (multilingual transcription).
Your configuration tells Azure to expect speech in both Hindi (India) and English (India), so the API tries to auto-detect and transcribe either language within the same audio stream.
However, in preview mode, this multilingual support can sometimes:
- Fail to switch languages accurately mid-sentence
- Phonetically transcribe English words in Devanagari (Hindi script), especially when speakers switch languages rapidly or with an accent
So, instead of "and it was so intense", you're getting the text you showed.
This is a phonetic transliteration of English words into Hindi script, not a true language detection and switch.
I recommend setting the primary language explicitly if most of the audio is in one language. That way you at least avoid unexpected transliteration of English into Hindi script. For example, for mostly English audio:
"locales": ["en-IN"]
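To make the difference between the two setups concrete, here is a minimal Python sketch that builds the JSON for the request's `definition` field. It assumes the Fast Transcription request accepts a `locales` array in that field; the helper name `build_definition` is made up for illustration, so please check the field name against the API version you are calling.

```python
import json


def build_definition(locales):
    """Build the JSON string for the request's `definition` field.

    Assumption: the API reads candidate languages from a `locales` array.
    `build_definition` is a hypothetical helper, not part of any SDK.
    """
    if not locales:
        raise ValueError("at least one locale is required")
    return json.dumps({"locales": list(locales)})


# Multilingual setup: the service may pick either language per segment.
multilingual = build_definition(["hi-IN", "en-IN"])

# Single-locale setup: the service is pinned to English (India).
single_locale = build_definition(["en-IN"])

print(single_locale)
```

Sending only one locale is what pins the output script, so the only change needed on your side should be the contents of that array.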
If you need to handle code-switching, consider segmenting the audio into clearer monolingual chunks where possible, or simply use single-locale transcription and then apply post-processing.
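As one possible post-processing step, you could flag transcript segments that came back mostly in Devanagari so they can be reviewed or re-transcribed with a different locale. The sketch below is only an illustration: it assumes you have already extracted the segment texts into a list of strings, and the function names and the 0.5 threshold are my own choices, not anything provided by the API.

```python
def devanagari_ratio(text):
    """Return the share of non-whitespace characters in the Devanagari block.

    The Devanagari Unicode block spans U+0900 to U+097F and includes the
    vowel signs and nukta, so combining marks are counted as well.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(1 for c in chars if 0x0900 <= ord(c) <= 0x097F)
    return hits / len(chars)


def flag_segments(segments, threshold=0.5):
    """Return indices of segments whose Devanagari share exceeds the threshold.

    `segments` is assumed to be a list of plain strings; the threshold
    is an arbitrary starting point and should be tuned on real data.
    """
    return [i for i, s in enumerate(segments) if devanagari_ratio(s) > threshold]


segments = [
    "and it was so intense",
    "एंड इट वाज़ सो इंटेंस",
]
print(flag_segments(segments))
```

Note that this only tells you which script a segment is written in. It cannot tell genuine Hindi apart from English that was transliterated into Devanagari, so treat flagged segments as candidates for a second pass rather than as confirmed errors.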