Audio streaming overview - audio subscription

Important

Functionality described in this article is currently in public preview. This preview version is provided without a service-level agreement, and we don't recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Azure Communication Services gives developers real-time access to audio streams so they can capture, analyze, and process audio content during active calls. Live audio and video are now commonplace in online meetings, online conferences, and customer support. With audio streaming, developers can build server applications that capture and analyze the audio of each participant on a call in real time. Developers can also combine audio streaming with other call automation actions or use their own AI models to analyze audio streams. Use cases include natural language processing (NLP) for conversation analysis and providing real-time insights and suggestions to agents while they are interacting with end users.

This public preview lets developers access real-time audio streams over a WebSocket to analyze a call's audio in mixed and unmixed formats.

Common use cases

Audio streams can be used in many ways. Some examples of how developers may wish to use the audio streams in their applications include:

Real-time call assistance

Improved AI-powered suggestions - Use real-time audio streams of active interactions between agents and customers to gauge the intent of the call, then analyze the audio with your own AI model to give agents active suggestions for providing a better customer experience.

Authentication

Biometric authentication - Use the audio streams to carry out voice authentication by running the call audio through your voice recognition or voice matching engine.

Sample architecture for subscribing to audio streams from an ongoing call - live agent scenario

Screenshot of architecture diagram for audio streaming.

Supported formats

Mixed format

Contains mixed audio of all participants on the call. All audio is flattened into one stream.

Unmixed format

Contains audio per participant per channel, with support for up to four channels for the four most dominant speakers at any point in a call. You also receive a participantRawID that you can use to identify the speaker.
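As a rough illustration of how unmixed audio might be separated by speaker, the following Python sketch groups decoded audio by participantRawID. The message shape is an assumption made for this example: each message is treated as a dictionary with a "participantRawID" key and a base64 "data" key. The actual payload schema is not described in this article and may differ.

```python
import base64
from collections import defaultdict


def split_by_participant(messages):
    """Group decoded PCM bytes by speaker.

    Assumption (hypothetical schema): each message is a dict with a
    "participantRawID" string and a base64-encoded "data" string.
    """
    buffers = defaultdict(bytearray)
    for msg in messages:
        # Append this packet's raw PCM to the buffer of the speaker it belongs to.
        buffers[msg["participantRawID"]] += base64.b64decode(msg["data"])
    return {pid: bytes(buf) for pid, buf in buffers.items()}
```

Keeping one buffer per participant ID lets each speaker's audio be passed to an analysis model independently, which is the main reason to choose the unmixed format over the mixed one.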

Additional information

The following list describes information that helps developers convert the audio packets into audible content that their applications can use.

  • Framerate: 50 frames per second
  • Packet stream rate: one packet every 20 ms
  • Data packet: 640 bytes (20 ms of 16-bit mono PCM at 16,000 Hz)
  • Audio format: 16-bit PCM mono at 16,000 Hz
  • The data field is a base64-encoded string. Decode it into a byte array to produce raw PCM audio.
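The conversion described above can be sketched in Python using only the standard library. This is a minimal sketch, not the service's SDK: the function names decode_packet and pcm_to_wav are hypothetical, and the constants are taken from the list above. For 16-bit mono PCM at 16,000 Hz, one 20 ms frame works out to 320 samples, or 640 bytes of raw audio.

```python
import base64
import io
import wave

SAMPLE_RATE_HZ = 16000   # from the list above
SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
CHANNELS = 1             # mono
FRAME_MS = 20            # one packet every 20 ms

# Raw PCM bytes in one 20 ms frame: 16000 * 20 / 1000 * 2 * 1 = 640
BYTES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000 * SAMPLE_WIDTH_BYTES * CHANNELS


def decode_packet(data_b64):
    """Decode one base64 audio payload into raw PCM bytes (hypothetical helper)."""
    return base64.b64decode(data_b64)


def pcm_to_wav(pcm, target):
    """Wrap raw PCM bytes in a WAV container.

    target may be a file path or a writable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
```

In practice, an application would likely concatenate the decoded bytes of many consecutive packets before calling pcm_to_wav, because a single 20 ms frame is too short to be useful on its own.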

Billing

See the Azure Communication Services pricing page for information on how audio streaming is billed. Prices can be found in the calling category under audio streaming.

Next Steps

Check out the audio streaming quickstart to learn more.