Do any of the Azure text to speech transcription services support multiple speaker identification from the same mobile phone device

Question

Guillermo Proano 0 Microsoft Employee

I have a customer who wants to transcribe a conversation among several people in the same room using a single mobile phone. Also, what will replace the Speaker Recognition feature, which is due to be deprecated in the Speech API in 2025?

Avinash Devarakonda 610 Reputation points Microsoft External Staff

2024-11-28T00:44:44.4233333+00:00

Hi Guillermo Proano,

Following up to see if the given response was helpful.

Thank You.

Avinash Devarakonda 610 Reputation points Microsoft External Staff

2024-11-29T08:49:25.7566667+00:00

Hi Guillermo Proano,

We haven’t heard from you since the last response and are just checking back to see whether the given response was helpful.

Thank You.

3 answers

Answer 1

Hi Guillermo Proano,

Thanks for reaching out to Microsoft Q&A.

Azure Text-to-Speech converts text into synthesized audio rather than transcribing speech, so it does not support multiple speaker identification.

However, Azure Speech-to-Text provides a Real-Time Diarization feature that can distinguish speakers' voices in single-channel audio in streaming mode. This means it can produce a live (real-time) transcript that attributes each utterance to a different speaker as people talk. This is particularly useful for live conversations or meetings, where each speaker's contribution can be tagged as it happens.

Kindly go through below documents for reference.

Real-time diarization quickstart - Speech service - Azure AI services | Microsoft Learn
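
For illustration, here is a minimal sketch of real-time diarization with the Speech SDK for Python, loosely following the quickstart above. Some assumptions that are not from the original answer: the key and region are read from hypothetical `SPEECH_KEY` and `SPEECH_REGION` environment variables, the audio comes from a single-channel WAV file (a mobile app would feed microphone audio instead), and `merge_turns` is a helper introduced only for this example. The service labels speakers generically (for example `Guest-1`, `Guest-2`), so this separates voices but does not attach names to them.

```python
import os
import threading
from typing import List, Tuple


def merge_turns(segments: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive segments from the same speaker into one turn.

    Hypothetical helper for this example; not part of the Speech SDK.
    """
    turns: List[Tuple[str, str]] = []
    for speaker, text in segments:
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def transcribe_file(wav_path: str) -> List[Tuple[str, str]]:
    """Transcribe a single-channel WAV file with speaker diarization."""
    # Imported lazily so merge_turns can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: credentials are supplied through these environment variables.
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"],
    )
    speech_config.speech_recognition_language = "en-US"
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config
    )

    segments: List[Tuple[str, str]] = []
    done = threading.Event()

    def on_transcribed(evt):
        # Final results carry the recognized text and a generic speaker label.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            segments.append((evt.result.speaker_id, evt.result.text))

    transcriber.transcribed.connect(on_transcribed)
    transcriber.session_stopped.connect(lambda evt: done.set())
    transcriber.canceled.connect(lambda evt: done.set())

    transcriber.start_transcribing_async().get()
    done.wait()  # Block until the file is fully processed or the session is canceled.
    transcriber.stop_transcribing_async().get()
    return merge_turns(segments)
```

Calling `transcribe_file("meeting.wav")` (the file name is a placeholder) would return a list of `(speaker_label, text)` turns. Merging consecutive segments is only a readability choice for the printed transcript; the raw per-utterance results are equally usable.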

Regarding the deprecation of the Speaker Recognition feature in the Speech API, no direct replacement has been announced so far. In the meantime, you can consider other Azure AI Speech capabilities, depending on your needs.

Thank You.

Answer 2

Hi @Guillermo Proano,

Thanks for reaching out to Microsoft Q&A.

Yes, Azure Speech Service provides a feature called Conversation Transcription that allows you to transcribe meetings and other conversations with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. However, it requires a 7-mic circular multi-microphone array and is only available in the following subscription regions: centralus, eastasia, eastus, westeurope.

If you want to transcribe a conversation among several people in the same room using a single mobile phone, you can use the Speech SDK to transcribe conversations. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe conversations.

Regarding the Speaker Recognition feature, it will be deprecated in the Speech API on October 15, 2025. However, Azure Speech Service provides Speaker Recognition APIs that you can use to identify and verify the speaker's identity in a conversation. You can use the Speaker Recognition APIs to enroll speakers, identify speakers, and verify the speaker's identity.

Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

Answer 3

Deleted

This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.
