Speech to text on two streams

assuage 1 Reputation point
2021-06-21T18:14:20.12+00:00

I'm trying to run real-time STT on two streams (one from the mic, one from the speaker). These are the options I'm considering:

  1. combine both streams into one and use the native diarization capability
  2. use the multichannel capability
  3. create two separate sessions

Option 1: I'm considering using PullAudioInputStream and combining both streams. But I'm using the JavaScript SDK and can't figure out how to set the diarization option. Additionally, it seems diarization is not that accurate yet.
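
For reference, a minimal sketch of the mixing step that Option 1 would need, assuming both streams are mono 16-bit PCM at the same sample rate and arrive in roughly aligned chunks. `mixPcm16` is a hypothetical helper, not part of the Speech SDK; it only shows one way to sum two chunks sample by sample before handing the result to whatever feeds the pull stream.

```typescript
// Hypothetical helper (not an SDK function): mix two mono 16-bit PCM chunks
// into one mono chunk. Assumes both chunks use the same sample rate.
function mixPcm16(mic: Int16Array, speaker: Int16Array): Int16Array {
  const length = Math.max(mic.length, speaker.length);
  const mixed = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    // Treat samples past the end of the shorter chunk as silence.
    const a = i < mic.length ? mic[i] : 0;
    const b = i < speaker.length ? speaker[i] : 0;
    // Sum and clamp to the signed 16-bit range to avoid wrap-around.
    mixed[i] = Math.max(-32768, Math.min(32767, a + b));
  }
  return mixed;
}
```

Summing with clamping is the simplest choice; it can clip when both sources are loud, so averaging the two samples instead may be preferable at the cost of lower volume.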

Option 2: This seems to be limited to the Conversation Transcription API, but that requires 7 mics etc., so it's not viable for my use case.

Option 3: Create two separate sessions, one per stream. This would double the cost, and I'd lose synchronization between the two streams.

Any thoughts?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. GiftA-MSFT 11,176 Reputation points
    2021-06-22T17:23:25.72+00:00

    Hi, thanks for reaching out. Please see my responses to each option below:

    1. Can you elaborate on what you're trying here?
    2. Yes, for now no mono-conversation transcriber is available, although we may be able to put you in touch with our product team to find out whether there's a private workaround.
    3. If the streams start at the same time, the offsets could be used to reconcile the streams.
    1 person found this answer helpful.
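
The offset-based reconciliation suggested in point 3 could look roughly like the sketch below. It assumes both sessions start at the same moment and that each recognized result carries an offset in 100-nanosecond ticks, which is how the Speech SDK documents result offsets. The `Segment` type and the `mergeByOffset` and `formatSegment` helpers are hypothetical names used for illustration, not SDK APIs.

```typescript
// Hypothetical shape for a recognized phrase collected from one session.
interface Segment {
  source: "mic" | "speaker";
  text: string;
  offsetTicks: number; // start of the phrase, in 100 ns ticks from stream start
}

const TICKS_PER_MS = 10000; // 1 ms = 10,000 ticks of 100 ns

// Interleave the results of both sessions into one timeline ordered by offset.
// This is only meaningful if both streams began at the same time.
function mergeByOffset(mic: Segment[], speaker: Segment[]): Segment[] {
  return [...mic, ...speaker].sort((a, b) => a.offsetTicks - b.offsetTicks);
}

// Render one segment with its start time in milliseconds.
function formatSegment(s: Segment): string {
  const ms = Math.round(s.offsetTicks / TICKS_PER_MS);
  return `[${ms} ms] ${s.source}: ${s.text}`;
}
```

If the two sessions cannot be started together, recording the start-time difference and adding it to one stream's offsets before merging would be one way to compensate.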
