Hello, thank you for reaching out to us here.
I think you are describing two scenarios:
- Multi-device conversation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/multi-device-conversation
- Real time conversation transcript: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-conversation-transcription?pivots=programming-language-javascript
Please correct me if I've misunderstood, since the first link you shared points to the documentation for the general Speech to Text service.
For the Multi-device conversation feature, every participant joins with a shared conversation ID; this feature uses the Speech service's default model.
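For illustration, here is a rough sketch of the participant side: joining an existing conversation by its ID with the JavaScript Speech SDK's `ConversationTranslator`. The conversation ID, display name, and language below are placeholders, and the exact API surface has changed across SDK versions, so please verify against the linked doc:

```javascript
// Rough sketch only: a participant joining a multi-device conversation by ID.
// Assumes the microsoft-cognitiveservices-speech-sdk package; quoted values are placeholders.
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const translator = new sdk.ConversationTranslator();

// Print text messages received from other participants.
translator.textMessageReceived = (s, e) => {
  console.log(`${e.result.participantId}: ${e.result.text}`);
};

// Join an existing conversation using the ID the host shared.
translator.joinConversationAsync(
  "YourConversationId", // placeholder: the ID created by the conversation host
  "Alice",              // placeholder: display name shown to other participants
  "en-US",              // this participant's speech/chat language
  () => console.log("Joined the conversation."),
  (err) => console.error(err)
);
```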
For Real-time conversation transcription, you can create voice signatures for the conversation participants so that they can be identified as unique speakers, but this step is optional if you don't want to pre-enroll users.
Both features use the Speech service's default models, but Real-time conversation transcription additionally supports voice-signature enrollment, which may improve speaker identification.
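To show what the no-enrollment path looks like, here is a minimal sketch of real-time conversation transcription with the JavaScript Speech SDK. It assumes a newer SDK version where `ConversationTranscriber` is constructed from a speech config and an audio config (older versions used a conversation/join flow as in the linked doc), and the key, region, and file name are placeholders:

```javascript
// Minimal sketch: real-time conversation transcription without voice signatures.
const fs = require("fs");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourRegion");
speechConfig.speechRecognitionLanguage = "en-US";

// A 16 kHz, 16-bit mono WAV file containing multi-speaker audio.
const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("YourAudioFile.wav"));
const transcriber = new sdk.ConversationTranscriber(speechConfig, audioConfig);

// Without pre-enrolled voice signatures, speakers get generic labels such as "Guest-1".
transcriber.transcribed = (s, e) => {
  console.log(`${e.result.speakerId}: ${e.result.text}`);
};

transcriber.startTranscribingAsync(
  () => console.log("Transcribing..."),
  (err) => console.error(err)
);
```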
If you want to train a model on your own dataset, I think you are looking for Custom Speech. With Custom Speech, you can train and deploy your own model using your own data:
https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/custom-speech-overview
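Training and deployment are done in Speech Studio (or through the Speech REST API). Once your custom model is deployed, you point the SDK at it by setting the deployment's endpoint ID. A minimal sketch, with the key, region, endpoint ID, and file name as placeholders:

```javascript
// Minimal sketch: recognizing with a deployed Custom Speech model.
const fs = require("fs");
const sdk = require("microsoft-cognitiveservices-speech-sdk");

const speechConfig = sdk.SpeechConfig.fromSubscription("YourSubscriptionKey", "YourRegion");
// Route recognition to your custom model instead of the default base model.
speechConfig.endpointId = "YourCustomEndpointId";

const audioConfig = sdk.AudioConfig.fromWavFileInput(fs.readFileSync("YourAudioFile.wav"));
const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

recognizer.recognizeOnceAsync(
  (result) => console.log(result.text),
  (err) => console.error(err)
);
```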
Hope this helps! Please let us know if you have more questions.
Please kindly accept the answer if you find it helpful, thank you.
Regards,
Yutong