How to perform speaker diarization on audio files longer than 240 minutes with the Azure Speech Service?

Question

I have several very long audio files that represent entire meetings. I need to transcribe them with speaker diarization, but I seem to be hitting the limits of the Speech resource - namely 240 minutes per audio file. In fact, the transcription often breaks off before the 240 minutes are reached.
I have no problem splitting the audio files into shorter ones; however, the diarization will no longer work correctly for audio files after the first one, since the speaker diarization has to begin all over again from speaker 1.

Is there a way to run the diarization on the subsequent audio files while reusing the same database/vector store created for the first audio file?

Accepted Answer

Hello @Lyubomira Dimitrova, thank you again for your time and patience throughout this issue.

This looks like a current limitation: only 240 minutes per audio file is supported. See https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-quotas-and-limits

You are correct: splitting the files loses track of who each speaker is, because the speaker numbering starts over in every file, so the diarization result may not be accurate across the parts.
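
If you do split the files anyway, the following is only a rough Python sketch of the bookkeeping involved, not an official workaround. It does not call the Speech service at all. The names `plan_chunks` and `qualify_speaker` are hypothetical helpers made up for illustration, and the "Speaker 1" label format is an assumption. It plans equal-length parts that stay under the limit and tags each speaker label with the part it came from, so that labels from different parts are not mistaken for the same person.

```python
import math

# Per-file limit quoted above; pass a smaller value to leave a safety margin.
LIMIT_MINUTES = 240


def plan_chunks(total_minutes, max_chunk_minutes=LIMIT_MINUTES):
    """Split a recording into equal-length (start, end) ranges, in minutes,
    each no longer than max_chunk_minutes."""
    if total_minutes <= 0 or max_chunk_minutes <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_minutes / max_chunk_minutes)
    # Evenly spaced boundaries from 0 to total_minutes.
    boundaries = [total_minutes * i / count for i in range(count + 1)]
    return list(zip(boundaries, boundaries[1:]))


def qualify_speaker(chunk_index, speaker_label):
    """Prefix a per-file speaker label with its chunk index (hypothetical helper).

    Labels restart in every file, so "Speaker 1" in chunk 0 and "Speaker 1"
    in chunk 1 are not necessarily the same person.
    """
    return f"chunk{chunk_index}/{speaker_label}"


if __name__ == "__main__":
    # Example: a 600-minute meeting split into parts under the limit.
    for index, (start, end) in enumerate(plan_chunks(600)):
        print(index, start, end, qualify_speaker(index, "Speaker 1"))
```

Since the transcription reportedly breaks off before 240 minutes, a smaller `max_chunk_minutes` may be the safer choice. Matching the speakers between parts would still have to be done outside the service, for example by hand.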

We have shared your feedback with the product team, and they are aware of this. However, we don't have an ETA for raising this limit at this time.

I hope you understand.

Regards,

Vasavi
