Asynchronous Conversation Transcription for mono audio files

Question

Marcus Ma 1

This is similar to an earlier post, but that one dates back to 2020, so I decided to make a new post. Here's my scenario:

I have a collection of mono wav files of research interviews, and I have been using Azure's speech-to-text to transcribe them through asynchronous file uploads. Azure's conversation transcription feature seems useful to me, mainly for its speaker diarization ability. The Azure docs still indicate that 8-channel audio is needed for async conversation transcription, but is there a workaround for mono wav files?

The previous post mentioned a private workaround, but unfortunately this will not work for me: I will be working with my clients' individual Azure accounts, so any functionality would need to be publicly accessible. I am working in JavaScript, and if there are any code demos, that would be greatly appreciated as well.

Also, the pricing website mentions that multichannel conversation transcription is a higher price than standard transcription ($2.10 vs $1.00 per hour), so if there is a single channel conversation transcription feature, which price categorization would this fall under?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2022-03-17T06:52:25.677+00:00

@Marcus Ma Are you using batch transcription with speaker separation (diarization) for your mono audio files?
If yes, then this is the only GA option available for transcribing mono audio files.

Conversation transcription uses mono files only to create speakers' voice signatures; its audio input does not support mono files.
Marcus Ma 1 Reputation point

2022-03-17T15:09:34.967+00:00

Ok, the only thing I actually want to do is speaker diarization. I haven't heard about batch transcription with speaker separation. Could you tell me more about that or show me a code demo? As long as it can work asynchronously on uploaded mono wav files, that's all I need.
Marcus Ma 1 Reputation point

2022-03-21T00:18:06.22+00:00

@romungi-MSFT ^^

Answer 1

romungi-MSFT 48,911 Microsoft Employee Moderator

@Marcus Ma Please refer to the documentation for the batch transcription API.
Samples that call it through the REST API are available in the SDK GitHub repo.
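
Since a JavaScript demo was requested, here is a minimal sketch of how such a batch job might be submitted with diarization turned on. It is not taken from the linked samples. It assumes the Speech-to-text REST API v3.0 (the `/speechtotext/v3.0/transcriptions` endpoint with the `diarizationEnabled` and `wordLevelTimestampsEnabled` properties) and Node.js 18+ for the global `fetch`. The function names, the display name, and the `region`, `subscriptionKey`, and audio URL arguments are hypothetical placeholders. The field names read in `toSpeakerLines` are assumed from the v3.0 result format.

```javascript
// Minimal sketch: batch transcription with diarization for mono audio.
// Assumes Speech-to-text REST API v3.0 and Node.js 18+ (global fetch).
// Region, key, and audio URLs are placeholders supplied by the caller.

// Build the JSON body for a "create transcription" request.
function buildTranscriptionRequest(contentUrls, locale = "en-US") {
  return {
    contentUrls, // URLs the service can read, e.g. blob URLs with a SAS token
    locale,
    displayName: "Mono interview diarization", // hypothetical label
    properties: {
      diarizationEnabled: true, // speaker separation on mono input
      // Assumption: the v3.0 docs list this as required for diarization.
      wordLevelTimestampsEnabled: true
    }
  };
}

// Submit the job and return the URL of the created transcription.
async function submitTranscription(region, subscriptionKey, contentUrls) {
  const endpoint =
    `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions`;
  const response = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Ocp-Apim-Subscription-Key": subscriptionKey,
      "Content-Type": "application/json"
    },
    body: JSON.stringify(buildTranscriptionRequest(contentUrls))
  });
  if (!response.ok) {
    throw new Error(`Create transcription failed: ${response.status}`);
  }
  const job = await response.json();
  return job.self; // assumed: "self" holds the transcription resource URL
}

// Reduce a downloaded result file to speaker-labelled lines.
// Assumes each entry in recognizedPhrases has a speaker number
// and an nBest list whose first item carries the display text.
function toSpeakerLines(result) {
  return (result.recognizedPhrases || []).map((phrase) => {
    const best = (phrase.nBest && phrase.nBest[0]) || {};
    return `Speaker ${phrase.speaker}: ${best.display || ""}`;
  });
}
```

After submitting, the returned URL would be polled until the job reports a finished status, and the result files listed under that transcription would then be downloaded and passed to `toSpeakerLines`. As far as I understand the v3.0 documentation, diarization in this API expects mono audio with two speakers, so it is worth checking the current docs before relying on it for larger groups.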
