Can I set a maximum number of participants for real-time diarization?

Question

RES 0

Hi,

I followed the document below and succeeded in distinguishing speakers in streamed audio with the ConversationTranscriber class. (I don't use voice signatures, so speakers are shown as Guest-1, Guest-2, and so on.)

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-stt-diarization?tabs=windows&pivots=programming-language-javascript

However, I'm curious whether I can set a maximum number of participants in the SDK (Standard plan) to improve the accuracy of speaker separation.

Here are my questions.

Does diarization have a maximum number of participants?
Does the number of participants affect the accuracy of diarization?
If the number of participants affects accuracy, can I set a maximum on ConversationTranscriber? (It seems to be available in batch transcription.)

RES 0 Reputation points

2024-10-11T02:19:40.1566667+00:00

Hi @santoshkc

Thank you for the prompt reply and clearing that up!

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2024-10-14T06:45:11.5933333+00:00

Hi @RES,

I'm glad the explanation helped clarify things for you, and thanks for sharing the feedback, which may help other community members reading this thread. Because the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll convert the previous response to an answer in case you'd like to accept it. This will help other users with a similar query find the solution more easily.

If you have any more questions or need further assistance, feel free to reach out!

santoshkc 15,355 Reputation points Microsoft External Staff Moderator

2024-10-15T09:34:19.4033333+00:00

Hi @RES,

Did you get a chance to check the above response?

Answer 1

Hi @RES,

Thank you for reaching out to the Microsoft Q&A forum!

With speaker diarization in the Azure Speech SDK, speakers are identified automatically, and there is no strict limit on the number of speakers. The SDK handles the number of speakers dynamically based on the conversation.

The number of participants can affect the accuracy of diarization. If there are too many speakers, the system might struggle to accurately distinguish between them, especially if voices are similar or overlap.
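
Because real-time transcription has no speaker cap, one client-side way to notice when diarization produces more labels than you expect is to tally the speaker IDs yourself. The following is only a minimal sketch. It assumes each transcribed result carries `speakerId` and `text` fields, and that a placeholder ID of "Unknown" should be ignored. `createSpeakerTally` and `expectedMaxSpeakers` are hypothetical names, not SDK APIs, and the helper only detects extra labels; it does not change how the service separates speakers.

```javascript
// Hypothetical helper, not part of the Speech SDK.
// Counts utterances per speaker label and flags when more labels
// appear than the caller expected.
function createSpeakerTally(expectedMaxSpeakers) {
  const counts = new Map();
  return {
    // `result` is assumed to look like { speakerId: "Guest-1", text: "..." }.
    add(result) {
      const id = result && result.speakerId;
      // Assumption: skip results with no ID or the placeholder "Unknown".
      if (!id || id === "Unknown") return;
      counts.set(id, (counts.get(id) || 0) + 1);
    },
    speakerCount() {
      return counts.size;
    },
    exceedsExpected() {
      return counts.size > expectedMaxSpeakers;
    },
    snapshot() {
      return Object.fromEntries(counts);
    },
  };
}

// Example wiring (commented out because it needs the SDK and a live session):
// const tally = createSpeakerTally(3);
// transcriber.transcribed = (s, e) => {
//   tally.add(e.result);
//   if (tally.exceedsExpected()) console.warn("More speakers than expected");
// };
```

If the tally regularly exceeds the real headcount, that suggests one person is being split across several Guest labels, which is the case where batch transcription with a speaker limit may be worth trying.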

In real-time transcription using ConversationTranscriber, there is no fixed limit on the number of speakers. The system identifies and labels speakers dynamically. Setting a maximum number of speakers is only possible in batch transcription; that option is not available in real-time transcription.
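
To illustrate the batch transcription option, here is a sketch of a request body that caps the speaker count. The field names (`diarizationEnabled`, `diarization.speakers.minCount`, `diarization.speakers.maxCount`) are taken from my understanding of version 3.1 of the batch transcription REST API and may differ in other API versions, so please check them against the current reference. `buildBatchTranscriptionBody` is a hypothetical helper, and the audio URL in the commented usage is a placeholder.

```javascript
// Hypothetical helper: builds a batch transcription request body with a
// speaker limit. Field names are assumed from the v3.1 REST API.
function buildBatchTranscriptionBody(options) {
  const { contentUrls, locale, displayName, maxSpeakers } = options;
  const minSpeakers = options.minSpeakers === undefined ? 1 : options.minSpeakers;

  if (!Array.isArray(contentUrls) || contentUrls.length === 0) {
    throw new TypeError("contentUrls must be a non-empty array");
  }
  if (!Number.isInteger(minSpeakers) || minSpeakers < 1) {
    throw new RangeError("minSpeakers must be a positive integer");
  }
  if (!Number.isInteger(maxSpeakers) || maxSpeakers < minSpeakers) {
    throw new RangeError("maxSpeakers must be an integer >= minSpeakers");
  }

  return {
    contentUrls,
    locale,
    displayName,
    properties: {
      diarizationEnabled: true,
      diarization: {
        speakers: { minCount: minSpeakers, maxCount: maxSpeakers },
      },
    },
  };
}

// Usage (commented out; region, key, and audio URL are placeholders):
// const body = buildBatchTranscriptionBody({
//   contentUrls: ["https://example.com/meeting.wav"],
//   locale: "en-US",
//   displayName: "Meeting with speaker limit",
//   maxSpeakers: 4,
// });
// await fetch(`https://${region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions`, {
//   method: "POST",
//   headers: { "Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json" },
//   body: JSON.stringify(body),
// });
```

The helper only validates and assembles the JSON; submitting it and polling for results is left out here.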

In real-time conversation transcription multichannel diarization (preview), there is no strict limit on the number of participants. You can manage participants by adding voice signatures for better speaker identification. If voice signatures are not provided, you can also use the DifferentiateGuestSpeakers option for unknown speakers.

See: Real-time conversation transcription multichannel diarization (preview).

I hope this helps. Thank you.
