Multi-device Conversation vs Real Time Speech-To-Text

sbenedek 1 Reputation point
2022-02-01T14:53:40.377+00:00

The "Real-Time Speech-To-Text" service does use my Speech resource and the custom model I built with my uploaded data (accessed with my subscription key and region).
My question is:
Does the "Multi-device Conversation" feature use my custom model (made in Speech Studio), or only the base Speech service model?
I ask because I supplied the same subscription key, region, and model endpoint to both features, and "Real-Time Speech-To-Text" gives BETTER results than "Multi-device Conversation". (My language is Hungarian, and the speech-to-text is for deaf users.)
Real-Time Speech-To-Text: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text
Multi-device conversation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/multi-device-conversation

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

2 answers

Sort by: Most helpful
  1. YutongTie-MSFT 46,566 Reputation points
    2022-02-01T22:41:43.3+00:00

    @sbenedek

    Hello, thank you for reaching out to us here.

    I think you are mentioning two scenarios:

    1. Multi-device conversation: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/multi-device-conversation
    2. Real time conversation transcript: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-conversation-transcription?pivots=programming-language-javascript

    Please correct me if I misunderstood, since the first link you shared is the documentation for the general Speech to Text service.

    For the Multi-device conversation feature, every participant joins with the conversation ID; this feature uses the Speech service's default model.
    (screenshot: multi-device-conversation.png)

    For Real-time conversation transcription, the service creates voice signatures for the conversation participants so that they can be identified as unique speakers; this step is optional if you don't want to pre-enroll users.
    (screenshot: conversation-transcription-service.png)

    Both of them use the Speech SDK's default models, but Real-time conversation transcription has an improvement feature which may make its results better.

    If you want to train a model on your own data set, I think you are referring to Custom Speech. With Custom Speech, you can train and deploy your own model using your data set.

    https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/custom-speech-overview
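    To make this concrete, here is a minimal Python sketch of pointing the Speech SDK at a deployed Custom Speech model via the `endpoint_id` property. It assumes the `azure-cognitiveservices-speech` package is installed; the key, region, and endpoint ID shown are placeholders, not real values.

    ```python
    import azure.cognitiveservices.speech as speechsdk

    # Placeholder credentials -- substitute your own key and region.
    speech_config = speechsdk.SpeechConfig(
        subscription="YOUR_SUBSCRIPTION_KEY",
        region="YOUR_REGION",
    )
    speech_config.speech_recognition_language = "hu-HU"  # Hungarian, as in the question

    # Route recognition to the deployed Custom Speech model instead of the base model.
    speech_config.endpoint_id = "YOUR_CUSTOM_SPEECH_ENDPOINT_ID"

    # With a real key you would then recognize as usual, e.g.:
    # recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    # result = recognizer.recognize_once()
    # print(result.text)
    print(speech_config.endpoint_id)
    ```

    If a feature honors the custom model, it will use this endpoint ID; if it only supports the base model, the setting has no effect there.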

    Hope this helps! Please let us know if you have more questions.

    Please kindly accept the answer if you found it helpful, thank you.

    Regards,
    Yutong


  2. sbenedek 1 Reputation point
    2022-02-03T11:42:37.91+00:00

    Hi!

    Thank you for the answer.

    Does the "Multi-device conversation" have any improvement features?

    Regards,
    Benedek.