Conversation Transcription support for mono audio streams

Question

I know that the Conversation Transcription feature of the Speech Service is still in preview (https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/conversation-transcription), but I would like to know whether any support for mono audio streams is planned, even partial support. Currently, this feature appears to work only with 8-channel audio streams provided by Speech Services SDK microphone arrays. I'd like to get this feature working with voice streams from end-user devices such as phones, tablets, and laptops.

My end goal is to provide both real-time speech transcription and speaker identification. I know this can currently be achieved by combining Continuous Recognition during the conversation with Batch Transcription after the recording is complete. However, this approach has two drawbacks: Batch Transcription isn't supported on free-tier instances, and running both Continuous Recognition and Batch Transcription would double the cost of each transcription session.

The real-time + async Conversation Transcription feature seems to have everything I'm looking for, but the lack of support for anything other than 8-channel mic array audio is really limiting.

So to summarize, I really have only 2 questions:

Does Conversation Transcription have any planned support for mono audio streams in the future?
Will the Batch Transcription API ever be available for free-tier Speech Services?

Thanks!

Accepted Answer

Please review my response in the private message via the comments above. Once you provide your contact details, I will connect you with the product team. Thanks.
