Azure Communication Services - Access the audio stream and pass it to the Speech-to-Text service in real time

Nikita Kuzmin

I've started investigating the Azure Communication Services SDK for .NET. I'm trying to figure out whether it is possible to create a video call between two people and get the remote participant's audio stream, so that I can pass it to the Azure Speech service, generate text in real time, and then run some analysis on it.
So the main question is: how do I get the remote participant's audio stream during a video call? Is it possible? I don't see any entities, fields, or properties related to audio in Call, CallAgent, LocalVideoStream, or the other video call entities. If you have examples of something similar, they would be very helpful. Thank you!

Azure Communication Services

Accepted answer
  1. brtrach-MSFT (Microsoft Employee)

    @Nikita Kuzmin Thank you for your question about getting the raw audio from a video call.

    We confirmed with the product group that this capability is not currently available. They did confirm, however, that a feature allowing access to the raw audio and video streams is being investigated, although no ETA is available at this time.

    Closed captioning is another feature in development, and it also has no ETA yet.

    ACS was released less than a year ago, and the team is actively adding features; the roadmap is very bright. Features can be announced at any time, but many products tie their announcements to the //Build or Ignite conferences, so watch for news especially around those events. Keep an eye here for any updates.

    P.S. Thank you for the verified answer and 5-star survey on the other thread. My manager recognized this feedback, and it helps us as engineers. We appreciate it. The other thread you asked for help with is owned by my coworker, who is discussing it with the product group right now, so you should receive an update on that thread shortly.

    Let us know if you have any further questions or concerns about this topic. Otherwise, I hope you have a great weekend.


1 additional answer

  1. jeya pandian


    I'd like to add another request: real-time translation as well, delivered as audio rather than text.

    If you provide access to the audio stream, we would also need the ability to mute a particular person.

    Say we run a global meeting where the key speaker presents in English and translators provide real-time translation into multiple languages. A participant who wants to hear the meeting in their native language would mute the key speaker's audio and listen to the real-time translator's audio instead.

    Is this possible?
