Playing audio in call

The play action provided through the Azure Communication Services Call Automation SDK lets you play audio prompts to participants in a call. This action is available through the server-side implementation of your application. You can play audio to call participants using one of two methods:

  • Prerecorded audio files in WAV format, stored where Azure Communication Services can access them, with support for authentication.
  • Regular text that is converted into speech output through the integration with Azure AI services.

You can use the newly announced integration between Azure Communication Services and Azure AI services to play personalized responses using Azure Text-To-Speech. You can use humanlike prebuilt neural voices out of the box or create custom neural voices that are unique to your product or brand. For more information on supported voices, languages, and locales, see Language and voice support for the Speech service.

Note

Azure Communication Services currently supports two file formats: MP3 files with ID3V2TAG, and WAV files formatted as 16-bit PCM mono channel audio recorded at 16 kHz. You can create your own audio files using Speech synthesis with the Audio Content Creation tool.
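
The WAV requirements in the note above (16-bit PCM, mono, 16 kHz) can be checked locally before a file is uploaded. The following is a minimal sketch that uses only Python's standard-library wave module; the function name check_wav_format is hypothetical and is not part of any Azure SDK.

```python
import wave


def check_wav_format(path):
    """Return a list of format problems; an empty list means the file
    matches the WAV requirements stated in the note (mono, 16-bit, 16 kHz).

    Hypothetical helper for illustration only, not part of any Azure SDK.
    """
    problems = []
    # wave.open raises wave.Error for files it cannot parse,
    # which includes most non-PCM (compressed) WAV encodings.
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        frame_rate = wav.getframerate()    # samples per second

    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if frame_rate != 16000:
        problems.append(f"expected 16000 Hz, found {frame_rate} Hz")
    return problems
```

Calling check_wav_format("prompt.wav") on a conforming file returns an empty list. This check covers only the WAV case and does not inspect MP3 files.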

Prebuilt Neural Text to Speech voices

Microsoft uses deep neural networks to overcome the limits of traditional speech synthesis with regard to stress and intonation in spoken language. Prosody prediction and voice synthesis occur simultaneously, resulting in more fluid and natural-sounding output. You can use these neural voices to make interactions with your chatbots and voice assistants more natural and engaging. There are over 100 prebuilt voices to choose from. Learn more about Azure Text-to-Speech voices.

Common use cases

The play action can be used in many ways. The following examples show how developers might use it in their applications.

Announcements

Your application might play an announcement when a participant joins or leaves the call, to notify other users.

Self-serve customers

In scenarios with IVRs and virtual assistants, you can use your application or bots to play audio prompts to callers. A prompt can take the form of a menu that guides the caller through their interaction.

Hold music

The play action can also be used to play hold music for callers. This action can be set up in a loop so that the music keeps playing until an agent is available to assist the caller.

Playing compliance messages

As part of compliance requirements in various industries, vendors are expected to play legal or compliance messages to callers, for example, “This call is recorded for quality purposes.”

Sample architecture for playing audio in call using Text-To-Speech

Diagram showing sample architecture for Play with AI.

Sample architecture for playing audio in a call

Screenshot of flow for play action.

Known limitations

  • Text-to-Speech text prompts support a maximum of 400 characters. If your prompt is longer than this, we suggest using SSML for Text-to-Speech based play actions.
  • For scenarios where you exceed your Speech service quota limit, you can request to increase this limit by following the steps outlined here.
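
The 400-character limit in the first item above can be handled in application code by switching to SSML when a prompt is too long for a plain text prompt. The following is a minimal sketch using only the Python standard library. The function name build_prompt, the "text" and "ssml" labels, and the default voice en-US-JennyNeural are assumptions made for illustration; they are not Call Automation SDK types, and the resulting string would still need to be passed to whichever play source your SDK version provides.

```python
from xml.sax.saxutils import escape, quoteattr

# Limit for plain text prompts, as stated in the known limitations.
MAX_PLAIN_TEXT_CHARS = 400


def build_prompt(text, voice_name="en-US-JennyNeural", locale="en-US"):
    """Return ("text", text) for short prompts, or ("ssml", document)
    for prompts longer than the plain text limit.

    Hypothetical helper: the tuple labels are illustrative only.
    """
    if len(text) <= MAX_PLAIN_TEXT_CHARS:
        return ("text", text)

    # Escape the prompt so characters such as & and < stay valid XML.
    ssml = (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )
    return ("ssml", ssml)
```

A short greeting is returned unchanged with the "text" label, while a long disclaimer comes back wrapped in a single speak and voice element. Keeping the decision in one helper means callers do not need to know the limit.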

Next Steps