Types of speech API services

Article
06/01/2023

You can use the Azure Cognitive Services Speech service to perform spoken language transformations, including speech-to-text, text-to-speech, speech translation, and speaker recognition.

Note

Use Azure Cognitive Service for Language if you want to gather insights on terms or phrases or get detailed contextual analysis of spoken or written language.

Services

Speech-to-text can convert audio streams to text in real time or in batch.
Text-to-speech enables applications to convert text to human-like speech.
Speech translation provides multi-language speech-to-speech and speech-to-text translation of audio streams.

How to choose a speech service

This flow chart can help you choose the speech service that suits your needs:

Diagram that shows how to choose a speech service.

The left side of the diagram illustrates audio-to-audio or audio-to-text processes.

Speech-to-text is used to convert speech from an audio source to a text format.
Speech-to-speech translation, provided by the Speech translation service, is used to translate speech in one language to speech in another language.

The right side of the diagram illustrates text-to-audio processes.

Text-to-speech is used to generate spoken audio from a text source.
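The selection logic described above can be sketched in code. The following Python example is only an illustration of the decision flow: the `Modality` enum, the `choose_speech_service` function, and the `translate` flag are hypothetical names introduced here and are not part of any Azure SDK. It assumes that the only inputs to the decision are the source format, the target format, and whether translation between languages is needed.

```python
from enum import Enum


class Modality(Enum):
    """Hypothetical labels for the input and output formats in the diagram."""
    AUDIO = "audio"
    TEXT = "text"


def choose_speech_service(source: Modality, target: Modality,
                          translate: bool = False) -> str:
    """Return the name of the service that the decision flow suggests.

    This is an illustrative helper, not an Azure API.
    """
    if source is Modality.AUDIO and target is Modality.TEXT:
        # Audio to text: plain transcription, or translation when the
        # output must be in another language.
        return "Speech translation" if translate else "Speech-to-text"
    if source is Modality.AUDIO and target is Modality.AUDIO:
        # Audio to audio in this article means speech-to-speech translation.
        return "Speech translation"
    if source is Modality.TEXT and target is Modality.AUDIO:
        # Text to audio: generate spoken output from text.
        return "Text-to-speech"
    # Text to text is outside the Speech service; the note earlier in the
    # article points to Azure Cognitive Service for Language for analysis.
    raise ValueError("No speech service applies to text-to-text processing")


if __name__ == "__main__":
    print(choose_speech_service(Modality.AUDIO, Modality.TEXT))
    print(choose_speech_service(Modality.TEXT, Modality.AUDIO))
```

A helper like this only mirrors the diagram. A real application would then call the corresponding service through the Speech SDK or REST API.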

Common use cases

The following table recommends services for some common use cases.

Use case	Service to use
Provide closed captions for recorded or live videos	Speech-to-text
Create a transcript of a phone call or meeting	Speech-to-text
Implement automated note dictation	Speech-to-text
Determine intended user input for further processing	Speech-to-text
Generate spoken responses to user input	Text-to-speech
Create voice menus for telephone systems	Text-to-speech
Read email or text messages aloud in hands-free scenarios	Text-to-speech
Broadcast announcements in public locations, like railway stations or airports	Text-to-speech
Produce real-time closed captioning for a speech or simultaneous two-way translation of a spoken conversation	Speech translation

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Kruti Mehta | Azure Senior Fast-Track Engineer
Oscar Shimabukuro | Senior Cloud Solution Architect

Other contributors:

Mick Alberts | Technical Writer
Ashish Chahuan | Senior Cloud Solution Architect
Brandon Cowen | Senior Cloud Solution Architect
Manjit Singh | Software Engineer
Christina Skarpathiotaki | Senior Cloud Solution Architect
Nathan Widdup | Azure Senior Fast-Track Engineer

