Asynchronous Conversation Transcription for mono audio files

Marcus Ma 1 Reputation point

This is similar to an earlier post, but that one dates back to 2020, so I decided to open a new one. Here's my scenario:

I have a collection of mono WAV files of research interviews, and I have been using Azure's speech-to-text to transcribe them through asynchronous file uploads. Azure's conversation transcription feature looks useful to me, mainly for its speaker diarization. The Azure docs still indicate that 8-channel audio is needed for async conversation transcription, but is there a workaround for mono WAV files?

The previous post mentioned a private workaround, but unfortunately that will not work for me: I will be working with my clients' individual Azure accounts, so any functionality needs to be publicly accessible. I am working in JavaScript, so if there are any code demos, those would be greatly appreciated as well.

Also, the pricing page lists multichannel conversation transcription at a higher price than standard transcription ($2.10 vs $1.00 per hour). If a single-channel conversation transcription feature exists, which pricing category would it fall under?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. romungi-MSFT 42,761 Reputation points Microsoft Employee

    @Marcus Ma Please see the documentation for the batch transcription API.
    Samples that call it through the REST API are available in the Speech SDK GitHub repo.


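Since the question asks for JavaScript, here is a minimal sketch of what a batch transcription request for mono audio with speaker diarization might look like over the REST API that the answer points to. It is not taken from the linked samples. The endpoint path, the `v3.1` API version, and the `diarizationEnabled` and `wordLevelTimestampsEnabled` property names are assumptions based on the public Speech-to-text REST reference and should be checked against the current documentation. `buildTranscriptionRequest` and `submitTranscription` are hypothetical helper names, and the region, key, and audio URL are placeholders.

```javascript
// Sketch only: the endpoint path, API version, and property names below are
// assumptions to verify against the current Speech-to-text REST API reference.

// Builds (but does not send) a batch transcription request with diarization on.
function buildTranscriptionRequest({ region, subscriptionKey, audioUrls, locale = "en-US" }) {
  if (!Array.isArray(audioUrls) || audioUrls.length === 0) {
    throw new Error("audioUrls must contain at least one URL");
  }
  return {
    url: `https://${region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions`,
    options: {
      method: "POST",
      headers: {
        "Ocp-Apim-Subscription-Key": subscriptionKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        contentUrls: audioUrls, // URLs the service can read, e.g. blob SAS URLs
        locale,
        displayName: "Mono interview transcription",
        properties: {
          diarizationEnabled: true, // request speaker separation on mono audio
          wordLevelTimestampsEnabled: true,
        },
      }),
    },
  };
}

// Sends the request. Assumes a runtime with a global fetch (e.g. Node 18+).
async function submitTranscription(params) {
  const { url, options } = buildTranscriptionRequest(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${await response.text()}`);
  }
  return response.json(); // describes the created transcription job
}
```

Batch transcription is asynchronous, so the response describes a job that has to be polled until it finishes before the result files can be downloaded. Keeping the request builder separate from the network call makes the payload easy to inspect or adapt per client account. This sketch does not settle the pricing question, which still needs an answer from the pricing page or support.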