Conversation Transcription support for mono audio streams

Brett Davis 36 Reputation points
2020-12-01T04:18:44.807+00:00

I know that the Conversation Transcription feature of the Speech Service is still in preview (https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/conversation-transcription), but I would like to know whether there is any planned support, even if only partial, for mono audio streams. Currently, it appears that this feature is only supported with 8-channel audio streams provided by speech services SDK microphone arrays. I'd like to get this feature working with voice streams provided by end-user devices, such as phones, tablets, and laptops.

My end goal is to provide both real-time speech transcription and speaker identification. I know that this can currently be achieved by combining Continuous Recognition during the conversation with Batch Transcription after the recording is completed. The drawbacks are that Batch Transcription isn't supported on free tier instances, and that running Continuous Recognition plus Batch Transcription would double the cost of each transcription session.

The real-time + async Conversation Transcription feature seems to have everything I'm looking for, but the lack of support for anything other than 8-channel mic array audio is really limiting.

To summarize, I have two questions:

  • Does Conversation Transcription have any planned support for mono audio streams in the future?
  • Will the Batch Transcription API ever be available for free tier speech services?

Thanks!

Azure AI Speech
Azure AI services
