Speech to text on two streams

Question

assuage 1

I'm trying to run real-time speech-to-text (STT) on two streams: one from the microphone and one from the speaker output. These are the options I'm considering:

1. Combine both streams into one and use the native diarization capability.
2. Use the multichannel capability.
3. Create two separate sessions.

Option 1: I'm considering using PullAudioInputStream and combining both streams. However, I'm using the JavaScript SDK and can't figure out how to set the diarization option. Additionally, diarization doesn't seem to be very accurate yet.
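Combining the two streams amounts to mixing the two PCM signals sample by sample before handing the result to the SDK (for example through a PullAudioInputStream). A minimal sketch, assuming both sources deliver 16-bit mono PCM at the same sample rate (such as 16 kHz, the SDK's default input format); `mixPcm16` is a hypothetical helper, not part of the Speech SDK:

```typescript
// Hypothetical helper (not an SDK API): mixes one frame of mic audio and one
// frame of speaker audio into a single mono frame.
// Assumes both frames are 16-bit mono PCM at the same sample rate.
function mixPcm16(mic: Int16Array, speaker: Int16Array): Int16Array {
  const length = Math.max(mic.length, speaker.length);
  const out = new Int16Array(length);
  for (let i = 0; i < length; i++) {
    // A shorter frame is padded with silence (0).
    const a = i < mic.length ? mic[i] : 0;
    const b = i < speaker.length ? speaker[i] : 0;
    // Clamp to the signed 16-bit range; writing an out-of-range value to an
    // Int16Array would wrap around and produce loud distortion.
    out[i] = Math.max(-32768, Math.min(32767, a + b));
  }
  return out;
}
```

Note that mixing discards the speaker separation the two sources already provide, which is why this option then depends on diarization to tell the parties apart.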

Option 2: This seems to be limited to the Conversation Transcription API, which requires 7 mics, among other things. That is not viable for my use case.

Option 3: Create two separate sessions, one per stream. This would double the cost, and I'd lose synchronization between the two streams.
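With two sessions, each result's offset is relative to the start of its own session, so one way to limit the synchronization loss is to record when each session started and shift its results onto a shared clock. A minimal sketch, assuming the start time is captured on the client (for example with `Date.now()` when continuous recognition is started) and that offsets and durations are in 100-nanosecond ticks, as the SDK's result `offset` and `duration` appear to be; `TimedResult`, `AlignedResult`, and `alignToWallClock` are hypothetical names:

```typescript
// Assumed unit of result offsets/durations: 100-nanosecond ticks.
const TICKS_PER_MS = 10_000;

// Hypothetical shape of one recognized phrase from a single session.
interface TimedResult {
  text: string;
  offsetTicks: number; // start of the phrase, relative to the session start
  durationTicks: number; // length of the phrase
}

interface AlignedResult {
  text: string;
  startMs: number; // start on the shared wall clock (ms)
  endMs: number; // end on the shared wall clock (ms)
}

// Shifts one session's results onto a shared timeline, given the time
// (in ms, e.g. from Date.now()) at which that session was started.
function alignToWallClock(results: TimedResult[], sessionStartMs: number): AlignedResult[] {
  return results.map((r) => {
    const startMs = sessionStartMs + r.offsetTicks / TICKS_PER_MS;
    return { text: r.text, startMs, endMs: startMs + r.durationTicks / TICKS_PER_MS };
  });
}
```

The client-side start time only approximates when audio capture actually began, so the aligned timestamps are good for ordering turns rather than for sample-accurate sync.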

Any thoughts?

Answer 1

GiftA-MSFT 11,176

Hi, thanks for reaching out. Please see my response below:

Can you elaborate on what you're trying here?
For now, there isn't a mono-conversation transcriber available, although we may be able to put you in touch with our product team to find out whether there's a private workaround.
If the streams start at the same time, the offsets of the recognition results could be used to reconcile the two transcripts.
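Reconciling by offset amounts to a two-way merge of the per-stream transcripts, each already in time order. A minimal sketch, assuming both streams share the same zero point (they start together) and that each segment carries the `offset` of its recognized result; `Segment`, `LabeledSegment`, and `mergeTranscripts` are hypothetical names:

```typescript
type Source = "mic" | "speaker";

// Hypothetical shape of one recognized phrase.
interface Segment {
  text: string;
  offsetTicks: number; // start of the phrase, relative to the shared start time
}

interface LabeledSegment extends Segment {
  source: Source;
}

// Merges two transcripts (each sorted by offset) into one conversation
// ordered by start time, labeling each phrase with the stream it came from.
function mergeTranscripts(mic: Segment[], speaker: Segment[]): LabeledSegment[] {
  const merged: LabeledSegment[] = [];
  let i = 0;
  let j = 0;
  while (i < mic.length || j < speaker.length) {
    // Take from the mic stream when the speaker stream is exhausted, or when
    // the next mic phrase starts no later than the next speaker phrase.
    const takeMic =
      j >= speaker.length ||
      (i < mic.length && mic[i].offsetTicks <= speaker[j].offsetTicks);
    if (takeMic) {
      merged.push({ ...mic[i++], source: "mic" });
    } else {
      merged.push({ ...speaker[j++], source: "speaker" });
    }
  }
  return merged;
}
```

Because each stream has its own recognizer, the source label replaces diarization: the mic side and the speaker side are known without any speaker clustering.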

assuage 1 Reputation point

2021-06-22T18:21:03.123+00:00

Thank you for the response!

My overall goal is to transcribe both sides of a conversation (using mic & speaker).

Option 1 was to combine both streams into a single stream and then run diarization on the combined stream. Now that I think about it, it's not a great idea: there might be cross-talk, which would skew the diarization. Plus, from other posts, I understand diarization isn't very accurate at this time.

Option 2: Would a "mono-conversation transcriber" support my use case? If so, I'd love to find out about the workaround.

Option 3: Would there be a performance implication? I'm using the JavaScript SDK (essentially a Node app wrapped in Electron).

GiftA-MSFT 11,176 Reputation points

2021-06-29T16:31:40.547+00:00

Hi, at this time the best advice is to use two separate streams or to wait for the Conversation Transcriber service to support something other than the 8-channel array. Sorry for any inconvenience. Hope this helps.

Arun Srinivasan 80 Reputation points

2024-02-21T04:07:25.5333333+00:00

Can someone shed light on how this can be achieved? We are trying something very similar.