Can I set a maximum number of participants for real-time diarization?

RES 0 Reputation points
2024-10-10T08:27:10.33+00:00

Hi,

I followed the document below and succeeded in distinguishing speakers in streamed audio with the ConversationTranscriber class. (I don't use voice signatures, so speakers appear as Guest-1, Guest-2, and so on.)

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=windows&pivots=programming-language-javascript

However, I'd like to know whether I can set a maximum number of participants in the SDK (Standard plan) to improve the accuracy of speaker separation.

Here are my questions.

  1. Does diarization have a maximum number of participants?
  2. Does the number of participants affect the accuracy of diarization?
  3. If the number of participants affects accuracy, can I set a maximum on ConversationTranscriber? (This seems to be available in batch transcription.)
Azure AI Speech
An Azure service that integrates speech processing into apps and services.

1 answer

  1. santoshkc 15,355 Reputation points Microsoft External Staff Moderator
    2024-10-10T12:37:35.25+00:00

    Hi @RES,

    Thank you for reaching out to Microsoft Q&A forum!

    When you use the Azure Speech SDK for speaker diarization, speakers are identified automatically, and there is no strict limit on the number of speakers. The SDK handles the number of speakers dynamically based on the conversation.
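
    Because the speaker labels are assigned dynamically, it can help to observe how many distinct labels the service actually emits for your audio. The following is a minimal sketch, not part of the SDK: `createSpeakerTracker` is a hypothetical helper, and it assumes each transcribed event exposes `result.speakerId` and `result.text`, as in the JavaScript quickstart. The events below are simulated so the snippet runs without the SDK or a subscription.

```javascript
// Hypothetical helper (not an SDK API): counts utterances per speaker label.
// Assumed event shape: { result: { speakerId: string, text: string } }.
function createSpeakerTracker() {
  const counts = new Map();
  return {
    // Record one transcribed event; events with no label or no text are ignored.
    record(event) {
      const result = event && event.result;
      if (!result || !result.speakerId || !result.text) return;
      counts.set(result.speakerId, (counts.get(result.speakerId) || 0) + 1);
    },
    // Number of distinct speaker labels seen so far.
    speakerCount() {
      return counts.size;
    },
    // Plain object mapping each label to its utterance count.
    utterancesBySpeaker() {
      return Object.fromEntries(counts);
    },
  };
}

// Simulated events standing in for live results.
const tracker = createSpeakerTracker();
[
  { result: { speakerId: "Guest-1", text: "Hello." } },
  { result: { speakerId: "Guest-2", text: "Hi there." } },
  { result: { speakerId: "Guest-1", text: "How are you?" } },
  { result: { speakerId: "Guest-2", text: "" } }, // empty text, ignored
].forEach((e) => tracker.record(e));

console.log(tracker.speakerCount());
console.log(tracker.utterancesBySpeaker());
```

    With a live transcriber, the same helper could be fed from the callback, for example `transcriber.transcribed = (s, e) => tracker.record(e);`, assuming `transcriber` is a ConversationTranscriber instance. This only reports what the service decided; it does not cap the number of speakers.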

    The number of participants can affect the accuracy of diarization. If there are too many speakers, the system might struggle to accurately distinguish between them, especially if voices are similar or overlap.

    In real-time transcription with ConversationTranscriber, there is no fixed limit on the number of speakers; the system identifies and labels speakers dynamically. Setting a maximum number of speakers is only possible in batch transcription; that option is not available in real-time transcription.
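
    For the batch route, the speaker range is passed in the request body when the transcription is created. The sketch below only builds that body; `buildBatchDiarizationRequest` is a hypothetical helper, and the field names (`diarizationEnabled`, `diarization.speakers.minCount`, `diarization.speakers.maxCount`) are an assumption based on the v3.1 batch transcription REST schema, so check them against the API version you use. The audio URL is a placeholder, and the endpoint and authentication are not shown.

```javascript
// Hypothetical helper: builds the JSON body for creating a batch transcription
// with a bounded speaker count. Field names are assumed from the v3.1 REST schema.
function buildBatchDiarizationRequest({
  contentUrls,
  locale,
  displayName,
  minSpeakers = 1,
  maxSpeakers,
}) {
  if (!Array.isArray(contentUrls) || contentUrls.length === 0) {
    throw new Error("contentUrls must be a non-empty array");
  }
  if (
    !Number.isInteger(minSpeakers) ||
    !Number.isInteger(maxSpeakers) ||
    minSpeakers < 1 ||
    maxSpeakers < minSpeakers
  ) {
    throw new RangeError("expected integers with 1 <= minSpeakers <= maxSpeakers");
  }
  return {
    contentUrls,
    locale,
    displayName,
    properties: {
      diarizationEnabled: true,
      diarization: {
        speakers: { minCount: minSpeakers, maxCount: maxSpeakers },
      },
    },
  };
}

const body = buildBatchDiarizationRequest({
  contentUrls: ["https://example.com/meeting.wav"], // placeholder URL
  locale: "en-US",
  displayName: "Meeting with at most 4 speakers",
  maxSpeakers: 4,
});

console.log(JSON.stringify(body, null, 2));
```

    The resulting object would be sent as the JSON payload of the create-transcription request for your Speech resource.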

    In real-time conversation transcription multichannel diarization (preview), there is also no strict limit on the number of participants. You can manage participants by adding voice signatures for better speaker identification. If voice signatures are not provided, you can use the DifferentiateGuestSpeakers option to tell unknown speakers apart.

    See: Real-time conversation transcription multichannel diarization (preview).

    I hope this helps. Thank you.

