Seeking Optimal Speech Transcription Service for Mixed Chinese and English Scenarios

hexarrior 40 Reputation points
2024-07-10T10:19:21.5833333+00:00

Our speech recognition scenario mainly involves a mix of Chinese and English. Currently, we have chosen the Chinese language recognition type (as there is no specific type for mixed Chinese and English). Besides manually adding hotwords and conducting plain-text training, is there a more suitable speech transcription service for a mixed Chinese and English scenario?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. navba-MSFT 24,175 Reputation points Microsoft Employee
    2024-07-10T10:54:56.56+00:00

    @hexarrior Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    Automatic multi-lingual speech translation is available in public preview. It is designed for audio in which the spoken language is not known in advance or changes during a session.

    More info is available here.

    Key Highlights

    • Unspecified input language: Multi-lingual speech translation accepts audio in a wide range of languages, and you don't need to specify the expected input language in advance.
    • Language switching: Multiple languages can be spoken during the same session, and all of them are translated into the same target language. You don't need to restart the session or take any other action when the input language changes.

    How to access?

    Refer to the code samples in the "how to translate speech" documentation. This feature is supported in Speech SDK version 1.37.0 and later.
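
As a rough illustration of what those samples look like, here is a minimal Python sketch for translating a WAV file that mixes Chinese and English. It assumes the `azure-cognitiveservices-speech` package (1.37.0 or later) is installed, that the feature is reached through the v2 universal endpoint, and that constructing `AutoDetectSourceLanguageConfig` with no arguments requests open-range detection. The function names, the key, the region, and the file path are placeholders of mine, not part of the SDK, so please verify the details against the official samples before relying on this.

```python
import threading


def v2_endpoint(region: str) -> str:
    # Assumption: multi-lingual translation is served from the v2 universal endpoint.
    return f"wss://{region}.stt.speech.microsoft.com/speech/universal/v2"


def translate_mixed_audio(key: str, region: str, wav_path: str, target: str = "en") -> list:
    # Imported lazily so v2_endpoint() can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.translation.SpeechTranslationConfig(
        subscription=key,
        endpoint=v2_endpoint(region),
        target_languages=[target],
    )
    audio = speechsdk.audio.AudioConfig(filename=wav_path)

    # Assumption: no candidate languages means "detect from the open range",
    # so the input language does not have to be preset.
    auto_detect = speechsdk.languageconfig.AutoDetectSourceLanguageConfig()

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=config,
        audio_config=audio,
        auto_detect_source_language_config=auto_detect,
    )

    translations = []
    done = threading.Event()

    def on_recognized(evt):
        # Collect each finalized phrase, whatever language it was spoken in.
        if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
            translations.append(evt.result.translations[target])

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    # Continuous recognition keeps one session open across language switches.
    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return translations
```

A call such as `translate_mixed_audio("<your-key>", "<your-region>", "meeting.wav", target="en")` would return the translated phrases in order. Note that this produces a translation into one target language, not a mixed-language transcript.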

    Batch transcription provides models with new architecture for these locales: es-ES, es-MX, fr-FR, it-IT, ja-JP, ko-KR, pt-BR, and zh-CN. These models significantly enhance readability and entity recognition.
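
If you want to try batch transcription with the zh-CN locale, the request could be built roughly as follows. This is only a sketch: it assumes the Speech to text REST API v3.2 `transcriptions` endpoint and key-based authentication, and `build_batch_request`, the display name, and the audio URL are hypothetical placeholders. The code only constructs the request and does not send it.

```python
import json
import urllib.request


def build_batch_request(region: str, key: str, audio_urls: list, locale: str = "zh-CN"):
    # Assumption: v3.2 of the Speech to text REST API is used.
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.2/transcriptions"
    body = {
        "contentUrls": list(audio_urls),   # publicly reachable or SAS-signed audio URLs
        "locale": locale,                  # one of the locales listed above
        "displayName": f"Batch transcription ({locale})",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job. The response then describes the created transcription, which you poll until it completes.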

    For multi-lingual speech translation, these are the languages the Speech service can automatically detect and switch between from the input: Arabic (ar), Basque (eu), Bosnian (bs), Bulgarian (bg), Chinese Simplified (zh), Chinese Traditional (zhh), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), Galician (gl), German (de), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Norwegian (nb), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi), and Welsh (cy).

    For a list of the supported output (target) languages, see the Translate to text language table in the language and voice support documentation.

    Hope this helps. If you have any follow-up questions, please let me know. I would be happy to help.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.
