Do any of the Azure speech transcription services support identifying multiple speakers from a single mobile phone?

Guillermo Proano 0 Reputation points Microsoft Employee
2024-11-26T19:50:17.1666667+00:00

I have a customer who wants to transcribe a conversation among several people in the same room using a single mobile phone. Also, what will replace the Speaker Recognition feature, which is due to be deprecated in the Speech API in 2025?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

3 answers

  1. Avinash Devarakonda 610 Reputation points Microsoft External Staff
    2024-11-27T07:52:19.96+00:00

    Hi Guillermo Proano,

    Thanks for reaching out to Microsoft Q&A.

    Azure Text-to-Speech synthesizes audio from text, so it does not transcribe conversations or identify speakers. Transcription is handled by Azure Speech-to-Text.

    Azure Speech-to-Text provides a Real-Time Diarization feature that can distinguish speakers' voices in single-channel audio in streaming mode. This means it can produce live (real-time) transcription while tagging which speaker said what. The feature is particularly useful for live conversations or meetings, where each speaker's contribution is labeled as they talk.

    Please see the following document for reference:

    Real-time diarization quickstart - Speech service - Azure AI services | Microsoft Learn 
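
    The diarization flow described above can be sketched in Python along the lines of that quickstart. This is only a minimal sketch under stated assumptions: it assumes the `azure-cognitiveservices-speech` package is installed, that the key and region are stored in environment variables named `SPEECH_KEY` and `SPEECH_REGION`, and that the phone recording is available as a single-channel WAV file. The helper names `merge_turns` and `transcribe_file` are illustrative and are not part of the SDK.

```python
import os
import time
from typing import List, Tuple

Segment = Tuple[str, str]  # (speaker_id, text)


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Join consecutive segments from the same speaker into one turn."""
    turns: List[Segment] = []
    for speaker, text in segments:
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns


def transcribe_file(wav_path: str) -> List[Segment]:
    """Run diarized transcription on a WAV file and return speaker turns."""
    # Imported lazily so merge_turns can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],  # assumed variable name
        region=os.environ["SPEECH_REGION"],     # assumed variable name
    )
    speech_config.speech_recognition_language = "en-US"
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config
    )

    segments: List[Segment] = []
    done = False

    def on_transcribed(evt):
        # Each final result carries the recognized text and a speaker label.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            segments.append((evt.result.speaker_id, evt.result.text))

    def on_stop(evt):
        # Fired when the session ends or is canceled (e.g. end of file).
        nonlocal done
        done = True

    transcriber.transcribed.connect(on_transcribed)
    transcriber.session_stopped.connect(on_stop)
    transcriber.canceled.connect(on_stop)

    transcriber.start_transcribing_async()
    while not done:
        time.sleep(0.5)
    transcriber.stop_transcribing_async()
    return merge_turns(segments)
```

    Calling `transcribe_file("meeting.wav")` (a hypothetical file name) would return a list of `(speaker_id, text)` turns. The speaker labels are generic identifiers assigned by the service, so mapping them to named participants would still have to be done by the application.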

    Regarding the deprecation of the Speaker Recognition feature in the Speech API, no direct replacement has been announced so far. In the meantime, you can use other Azure AI Speech capabilities, depending on your needs.

    Thank You.


  2. Shikha Ghildiyal 6,630 Reputation points Microsoft Employee Moderator
    2024-11-27T10:00:11.92+00:00

    Hi @Guillermo Proano,

    Thanks for reaching out to Microsoft Q&A.

    Yes, Azure Speech Service provides a feature called Conversation Transcription that lets you transcribe meetings and other conversations, with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. However, it requires a 7-mic circular multi-microphone array and is only available in the following subscription regions: centralus, eastasia, eastus, westeurope.

    If you want to transcribe a conversation among several people in the same room using a single mobile phone, you can use the Speech SDK. You first create a voice signature for each participant using the REST API, and then pass those voice signatures to the Speech SDK when transcribing the conversation.

    Regarding the Speaker Recognition feature, it will be deprecated in the Speech API on October 15, 2025. Until then, Azure Speech Service provides Speaker Recognition APIs that you can use to enroll speakers, identify speakers, and verify a speaker's identity in a conversation.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you. This can be beneficial to other community members.


  3. Deleted

    This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.

