Customize voice prompts to users with Play action

This guide will help you get started with playing audio files to participants by using the play action provided through Azure Communication Services Call Automation SDK.

Prerequisites

For AI features

Create a new C# application

In the console window of your operating system, use the dotnet command to create a new web application.

dotnet new web -n MyApplication

Install the NuGet package

If you haven't already done so, obtain the NuGet package from here.
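For example, you can add the package to your project from the command line:

dotnet add package Azure.Communication.CallAutomation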

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports two file types: MP3 files with an ID3V2TAG, and WAV files (mono, 16-bit PCM at a 16 kHz sample rate).

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

To use Text-To-Speech capabilities, you must connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

var callAutomationClient = new CallAutomationClient("<Azure Communication Services connection string>");

var answerCallOptions = new AnswerCallOptions("<Incoming call context once call is connected>", new Uri("<https://sample-callback-uri>"))
{
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri("<Azure Cognitive Services Endpoint>") }
};

var answerCallResult = await callAutomationClient.AnswerCallAsync(answerCallOptions);

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, make sure the audio file is a WAV file, mono, at 16 kHz. To play audio files, provide Azure Communication Services with a URI to a file hosted in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

var playSource = new FileSource(new Uri(audioUri));

// Multiple FileSource prompts: if you want to play multiple audio files in one request, you can provide them in a list.
//var playSources = new List<PlaySource>() { new FileSource(new Uri("https://www2.cs.uic.edu/~i101/SoundFiles/StarWars3.wav")), new FileSource(new Uri("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")) };

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well as either the SourceLocale and VoiceKind, or the VoiceName you wish to use. We support all voice names supported by Azure AI services; see the full list here.

String textToPlay = "Welcome to Contoso";

// Provide SourceLocale and VoiceKind to select an appropriate voice. 
var playSource = new TextSource(textToPlay, "en-US", VoiceKind.Female);

// Multiple TextSource prompts: if you want to play multiple text prompts in one request, you can provide them in a list.
//var playSources = new List<PlaySource>() { new TextSource("recognize prompt one") { VoiceName = SpeechToTextVoice }, new TextSource("recognize prompt two") { VoiceName = SpeechToTextVoice }, new TextSource(content) { VoiceName = SpeechToTextVoice } };

String textToPlay = "Welcome to Contoso";
 
// Provide VoiceName to select a specific voice. 
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");

// Multiple TextSource prompts: if you want to play multiple text prompts in one request, you can provide them in a list.
//var playSources = new List<PlaySource>() { new TextSource("recognize prompt one") { VoiceName = SpeechToTextVoice }, new TextSource("recognize prompt two") { VoiceName = SpeechToTextVoice }, new TextSource(content) { VoiceName = SpeechToTextVoice } };

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even further with Azure AI services, you can use Speech Synthesis Markup Language (SSML) when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pauses, pronunciation, speaking rate, and volume, and attribute text to multiple voices.

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 

var playSource = new SsmlSource(ssmlToPlay);

Custom voice models

If you wish to enhance your prompts further and include custom voice models, the play action for Text-To-Speech now supports custom voices. Custom voices are a great option if you're trying to give customers a more local, personalized experience, or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models, you can read this guide.

Custom voice names regular text example

String textToPlay = "Welcome to Contoso"; 
 
// Provide VoiceName and CustomVoiceEndpointId to select custom voice. 
var playSource = new TextSource(textToPlay)
    {
        VoiceName = "YourCustomVoiceName",
        CustomVoiceEndpointId = "YourCustomEndpointId"
    };

Custom voice names SSML example


String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"YourCustomVoiceName\">Hello World!</voice></speak>";

var playSource = new SsmlSource(ssmlToPlay, "YourCustomEndpointId");

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio to all participants

In this scenario, audio is played to all participants on the call.

var playResponse = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayToAllAsync(playSource); 

Support for barge-in

In scenarios where you're playing audio on loop to all participants, for example in a waiting lobby, you may be playing audio to the participants in the lobby and keeping them updated on their position in the queue. When you use barge-in support, the ongoing audio is canceled and your new message is played. If you then want to continue playing your original audio, make another play request.

var GoodbyePlaySource = new TextSource("Good bye")
{
    VoiceName = "en-US-NancyNeural"
};

PlayToAllOptions playOptions = new PlayToAllOptions(GoodbyePlaySource)
{
    InterruptCallMediaOperation = false,
    OperationCallbackUri = new Uri(callbackUriHost),
    Loop = true
};

await callConnectionMedia.PlayToAllAsync(playOptions);

// Interrupt media with text source

// Option1:
var interrupt = new TextSource("Interrupt prompt message")
{
    VoiceName = "en-US-NancyNeural"
};

PlayToAllOptions playInterrupt = new PlayToAllOptions(interrupt)
{
    InterruptCallMediaOperation = true,
    OperationCallbackUri = new Uri(callbackUriHost),
    Loop = false
};

await callConnectionMedia.PlayToAllAsync(playInterrupt);

/*
Option2: Interrupt media with file source
var interruptFile = new FileSource(new Uri(<AUDIO URL>));
PlayToAllOptions playFileInterrupt = new PlayToAllOptions(interruptFile)
{
    InterruptCallMediaOperation = true,
    OperationCallbackUri = new Uri(callbackUriHost),
    Loop = false
};
await callConnectionMedia.PlayToAllAsync(playFileInterrupt);
*/

Play audio to a specific participant

In this scenario, audio is played to a specific participant.

var playTo = new List<CommunicationIdentifier> { targetParticipant }; 
var playResponse = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayAsync(playSource, playTo); 

Play multiple audio prompts

The play actions all support sending multiple play sources with just one request. This means you can send a list of prompts to play in one go instead of making these requests individually.
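As a minimal sketch, building on the commented playSources lists shown earlier, and assuming an SDK version whose PlayToAllAsync accepts a list of play sources:

var playSources = new List<PlaySource>()
{
    new TextSource("Thanks for calling Contoso.") { VoiceName = "en-US-NancyNeural" },
    new FileSource(new Uri(audioUri))
};

// The prompts play in order, in a single request.
var playResult = await callAutomationClient.GetCallConnection(callConnectionId)
    .GetCallMedia()
    .PlayToAllAsync(playSources);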

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

var playOptions = new PlayToAllOptions(playSource) 
{ 
    Loop = true 
}; 
var playResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayToAllAsync(playOptions); 

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

var playTo = new List<CommunicationIdentifier> { targetParticipant }; 
var playSource = new FileSource(new Uri(audioUri)) 
{ 
    PlaySourceCacheId = "<playSourceId>" 
}; 
var playResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayAsync(playSource, playTo); 

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to the Call Automation service at the time of answering the call. The following are examples of play event updates.
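As a minimal sketch of how the acsEvent in the examples below might be obtained, assuming an ASP.NET Core minimal API app and Azure.Messaging's CloudEvent (the /api/callbacks route is illustrative):

app.MapPost("/api/callbacks", (CloudEvent[] cloudEvents, ILogger<Program> logger) =>
{
    foreach (var cloudEvent in cloudEvents)
    {
        // Parse the incoming CloudEvent into a strongly typed Call Automation event.
        CallAutomationEventBase acsEvent = CallAutomationEventParser.Parse(cloudEvent);
        // Handle PlayCompleted, PlayStarted, PlayFailed, and other events here.
    }
    return Results.Ok();
});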

Example of how you can deserialize the PlayCompleted event:

if (acsEvent is PlayCompleted playCompleted) 
{ 
    logger.LogInformation("Play completed successfully, context={context}", playCompleted.OperationContext); 
} 

Example of how you can deserialize the PlayStarted event:

if (acsEvent is PlayStarted playStarted) 
{ 
    logger.LogInformation("Play started successfully, context={context}", playStarted.OperationContext); 
} 

Example of how you can deserialize the PlayFailed event:

if (acsEvent is PlayFailed playFailed) 
{ 
    if (MediaEventReasonCode.PlayDownloadFailed.Equals(playFailed.ReasonCode)) 
    { 
        logger.LogInformation("Play failed: download failed, context={context}", playFailed.OperationContext); 
    } 
    else if (MediaEventReasonCode.PlayInvalidFileFormat.Equals(playFailed.ReasonCode)) 
    { 
        logger.LogInformation("Play failed: invalid file format, context={context}", playFailed.OperationContext); 
    } 
    else 
    { 
        logger.LogInformation("Play failed, result={result}, context={context}", playFailed.ResultInformation?.Message, playFailed.OperationContext); 
    } 
} 

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

You can cancel all media operations; all pending media operations are canceled. This action also cancels other queued play actions.

var cancelResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .CancelAllMediaOperationsAsync(); 

Example of how you can deserialize the PlayCanceled event:

if (acsEvent is PlayCanceled playCanceled) 
{ 
    logger.LogInformation("Play canceled, context={context}", playCanceled.OperationContext); 
} 

Prerequisites

For AI features

Create a new Java application

In your terminal or command window, navigate to the directory where you would like to create your Java application. Run the command shown here to generate the Java project from the maven-archetype-quickstart template.

mvn archetype:generate -DgroupId=com.communication.quickstart -DartifactId=communication-quickstart -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false

The previous command creates a directory with the same name as the artifactId argument. Under this directory, the src/main/java directory contains the project source code, the src/test/java directory contains the test source, and the pom.xml file is the project's Project Object Model, or POM.

Update your application's POM file to use Java 8 or higher.

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

Add package references

In your POM file, add the following reference for the project.

azure-communication-callautomation

The Azure Communication Services Call Automation SDK package is retrieved from the Azure SDK Dev Feed.

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-communication-callautomation</artifactId>
  <version>1.0.0</version>
</dependency>

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports two file types: MP3 files with an ID3V2TAG, and WAV files (mono, 16-bit PCM at a 16 kHz sample rate).

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

To use Text-To-Speech capabilities, you must connect your Azure Cognitive Service to your Azure Communication Service.

Update App.java with code

In your editor of choice, open the App.java file and update it with the code provided in this section.

Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint("https://sample-cognitive-service-resource.cognitiveservices.azure.com/"); 
AnswerCallOptions answerCallOptions = new AnswerCallOptions("<Incoming call context>", "<https://sample-callback-uri>").setCallIntelligenceOptions(callIntelligenceOptions); 
Response<AnswerCallResult> answerCallResult = callAutomationClient 
    .answerCallWithResponse(answerCallOptions) 
    .block(); 

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, make sure the audio file is a WAV file, mono, at 16 kHz. To play audio files, provide Azure Communication Services with a URI to a file hosted in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

var playSource = new FileSource().setUrl(audioUri);

/* Multiple FileSource Prompts
var p1 = new FileSource().setUrl("https://www2.cs.uic.edu/~i101/SoundFiles/StarWars3.wav");
var p2 = new FileSource().setUrl("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav");

var playSources = new ArrayList();
playSources.add(p1);
playSources.add(p2);
*/

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well as either the SourceLocale and VoiceKind, or the VoiceName you wish to use. We support all voice names supported by Azure AI services; see the full list here.

String textToPlay = "Welcome to Contoso";

// Provide SourceLocale and VoiceKind to select an appropriate voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setSourceLocale("en-US") 
    .setVoiceKind(VoiceKind.FEMALE);

/* Multiple Prompt list setup: Multiple TextSource prompt

var p1 = new TextSource().setText("recognize prompt one").setSourceLocale("en-US").setVoiceKind(VoiceKind.FEMALE);
var p2 = new TextSource().setText("recognize prompt two").setSourceLocale("en-US").setVoiceKind(VoiceKind.FEMALE);
var p3 = new TextSource().setText(content).setSourceLocale("en-US").setVoiceKind(VoiceKind.FEMALE);

var playSources = new ArrayList();
playSources.add(p1);
playSources.add(p2);
playSources.add(p3);
*/
// Provide VoiceName to select a specific voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural");

/* Multiple Prompt list setup: Multiple TextSource prompt

var p1 = new TextSource().setText("recognize prompt one").setVoiceName("en-US-NancyNeural");
var p2 = new TextSource().setText("recognize prompt two").setVoiceName("en-US-NancyNeural");
var p3 = new TextSource().setText(content).setVoiceName("en-US-NancyNeural");

var playSources = new ArrayList();
playSources.add(p1);
playSources.add(p2);
playSources.add(p3);
*/

Play source - Text-To-Speech with SSML

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 
var playSource = new SsmlSource() 
    .setSsmlText(ssmlToPlay);

Custom voice models

If you wish to enhance your prompts further and include custom voice models, the play action for Text-To-Speech now supports custom voices. Custom voices are a great option if you're trying to give customers a more local, personalized experience, or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models, you can read this guide.

Custom voice names regular text example

String textToPlay = "Welcome to Contoso";

// Provide VoiceName and CustomVoiceEndpointId to select your custom voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("YourCustomVoiceName")
    .setCustomVoiceEndpointId("YourCustomEndpointId");

Custom voice names SSML example

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"YourCustomVoiceName\">Hello World!</voice></speak>"; 
var playSource = new SsmlSource() 
    .setSsmlText(ssmlToPlay)
    .setCustomVoiceEndpointId("YourCustomEndpointId");

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio to all participants

In this scenario, audio is played to all participants on the call.

var playOptions = new PlayToAllOptions(playSource); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playToAllWithResponse(playOptions) 
    .block(); 
log.info("Play result: " + playResponse.getStatusCode()); 

Support for barge-in

In scenarios where you're playing audio on loop to all participants, for example in a waiting lobby, you may be playing audio to the participants in the lobby and keeping them updated on their position in the queue. When you use barge-in support, the ongoing audio is canceled and your new message is played. If you then want to continue playing your original audio, make another play request.

// Option1: Interrupt media with text source
var textPlay = new TextSource()
    .setText("First Interrupt prompt message")
    .setVoiceName("en-US-NancyNeural");

var playToAllOptions = new PlayToAllOptions(textPlay)
    .setLoop(false)
    .setOperationCallbackUrl(appConfig.getBasecallbackuri())
    .setInterruptCallMediaOperation(false);

client.getCallConnection(callConnectionId)
    .getCallMedia()
    .playToAllWithResponse(playToAllOptions, Context.NONE);

/*
Option2: Interrupt media with text source
client.getCallConnection(callConnectionId)
    .getCallMedia()
    .playToAll(textPlay);
*/

/*
Option1: Barge-in with file source
var interruptFile = new FileSource()
    .setUrl("https://www2.cs.uic.edu/~i101/SoundFiles/StarWars3.wav");

var playFileOptions = new PlayToAllOptions(interruptFile)
    .setLoop(false)
    .setOperationCallbackUrl(appConfig.getBasecallbackuri())
    .setInterruptCallMediaOperation(true);

client.getCallConnection(callConnectionId)
    .getCallMedia()
    .playToAllWithResponse(playFileOptions, Context.NONE);

Option2: Barge-in with file source
client.getCallConnection(callConnectionId)
    .getCallMedia()
    .playToAll(interruptFile);
*/

Play audio to a specific participant

In this scenario, audio is played to a specific participant.

var playTo = Arrays.asList(targetParticipant); 
var playOptions = new PlayOptions(playSource, playTo); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playWithResponse(playOptions) 
    .block(); 

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

var playOptions = new PlayToAllOptions(playSource) 
    .setLoop(true); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playToAllWithResponse(playOptions) 
    .block(); 

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

var playTo = Arrays.asList(targetParticipant); 
var playSource = new FileSource() 
    .setUrl(audioUri) 
    .setPlaySourceCacheId("<playSourceId>"); 
var playOptions = new PlayOptions(playSource, playTo); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playWithResponse(playOptions) 
    .block(); 

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to the Call Automation service at the time of answering the call. The following are examples of play event updates.
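As a minimal sketch of how the events in the examples below might be obtained, assuming the callback request body arrives as a JSON string (requestBody here is a placeholder for however your web framework exposes it):

List<CallAutomationEventBase> acsEvents = CallAutomationEventParser.parseEvents(requestBody);
for (CallAutomationEventBase acsEvent : acsEvents) {
    // Handle PlayCompleted, PlayStarted, PlayFailed, and other events here.
}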

Example of how you can deserialize the PlayCompleted event:

if (acsEvent instanceof PlayCompleted) { 
    PlayCompleted event = (PlayCompleted) acsEvent; 
    log.info("Play completed, context=" + event.getOperationContext()); 
} 

Example of how you can deserialize the PlayStarted event:

if (acsEvent instanceof PlayStarted) { 
    PlayStarted event = (PlayStarted) acsEvent; 
    log.info("Play started, context=" + event.getOperationContext()); 
} 

Example of how you can deserialize the PlayFailed event:

if (acsEvent instanceof PlayFailed) { 
    PlayFailed event = (PlayFailed) acsEvent; 
    if (ReasonCode.Play.DOWNLOAD_FAILED.equals(event.getReasonCode())) { 
        log.info("Play failed: download failed, context=" + event.getOperationContext()); 
    } else if (ReasonCode.Play.INVALID_FILE_FORMAT.equals(event.getReasonCode())) { 
        log.info("Play failed: invalid file format, context=" + event.getOperationContext()); 
    } else { 
        log.info("Play failed, result=" + event.getResultInformation().getMessage() + ", context=" + event.getOperationContext()); 
    } 
} 

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

You can cancel all media operations; all pending media operations are canceled. This action also cancels other queued play actions.

var cancelResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .cancelAllMediaOperationsWithResponse() 
    .block(); 
log.info("Cancel result: " + cancelResponse.getStatusCode()); 

Example of how you can deserialize the PlayCanceled event:

if (acsEvent instanceof PlayCanceled) { 
    PlayCanceled event = (PlayCanceled) acsEvent; 
    log.info("Play canceled, context=" + event.getOperationContext()); 
} 

Prerequisites

For AI features

Create a new JavaScript application

Create a new JavaScript application in your project directory. Initialize a new Node.js project with the following command. This creates a package.json file for your project, which is used to manage your project's dependencies.

npm init -y

Install the Azure Communication Services Call Automation package

npm install @azure/communication-call-automation

Create a new JavaScript file in your project directory, for example, name it app.js. You write your JavaScript code in this file. Run your application using Node.js with the following command, which executes the JavaScript code you have written.

node app.js

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports two file types: MP3 files with an ID3V2TAG, and WAV files (mono, 16-bit PCM at a 16 kHz sample rate).

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

To use Text-To-Speech capabilities, you must connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

const callIntelligenceOptions: CallIntelligenceOptions = { cognitiveServicesEndpoint: "<Azure Cognitive Services Endpoint>" }; 
const answerCallOptions: AnswerCallOptions = { callIntelligenceOptions: callIntelligenceOptions };

await acsClient.answerCall("<Incoming call context>", "<https://sample-callback-uri>", answerCallOptions); 

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, make sure the audio file is a WAV file, mono, at 16 kHz. To play audio files, provide Azure Communication Services with a URI to a file hosted in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

const playSource: FileSource = { url: audioUri, kind: "fileSource" };

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well as either the SourceLocale and VoiceKind, or the VoiceName you wish to use. We support all voice names supported by Azure AI services; see the full list here.

const textToPlay = "Welcome to Contoso"; 
// Provide SourceLocale and VoiceKind to select an appropriate voice. 
const playSource: TextSource = { text: textToPlay, sourceLocale: "en-US", voiceKind: VoiceKind.Female, kind: "textSource" }; 

const textToPlay = "Welcome to Contoso"; 
// Provide VoiceName to select a specific voice. 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" }; 

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even further with Azure AI services, you can use Speech Synthesis Markup Language (SSML) when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pauses, pronunciation, speaking rate, and volume, and attribute text to multiple voices.

const ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 
const playSource: SsmlSource = { ssmlText: ssmlToPlay, kind: "ssmlSource" }; 

Custom voice models

If you wish to enhance your prompts further and include custom voice models, the play action for Text-To-Speech now supports custom voices. Custom voices are a great option if you're trying to give customers a more local, personalized experience, or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models, you can read this guide.

Custom voice names regular text example

const textToPlay = "Welcome to Contoso";
// Provide VoiceName and CustomVoiceEndpointID to play your custom voice
const playSource: TextSource = { text: textToPlay, voiceName: "YourCustomVoiceName", customVoiceEndpointId: "YourCustomEndpointId", kind: "textSource" };

Custom voice names SSML example

const ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"YourCustomVoiceName\">Hello World!</voice></speak>"; 
const playSource: SsmlSource = { ssmlText: ssmlToPlay, kind: "ssmlSource", customVoiceEndpointId: "YourCustomEndpointId"}; 

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio - All participants

In this scenario, audio is played to all participants on the call.

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .playToAll([ playSource ]);

Support for barge-in

In scenarios where you're playing audio on loop to all participants, for example in a waiting lobby, you may be playing audio to the participants in the lobby and keeping them updated on their position in the queue. When you use barge-in support, the ongoing audio is canceled and your new message is played. If you then want to continue playing your original audio, make another play request.

// Interrupt media with text source 
//Option1:

const playSource: TextSource = { text: "Interrupt prompt", voiceName: "en-US-NancyNeural", kind: "textSource" };

const interruptOption: PlayToAllOptions = { 
    loop: false, 
    interruptCallMediaOperation: true, 
    operationContext: "interruptOperationContext", 
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks" 
}; 

await callConnectionMedia.playToAll([playSource], interruptOption); 

/*
// Interrupt media with file source 

Option2: 

const playSource: FileSource = { 
    url: MEDIA_URI + "MainMenu.wav", 
    kind: "fileSource" 
}; 

const interruptOption: PlayToAllOptions = { 
    loop: false, 
    interruptCallMediaOperation: true, 
    operationContext: "interruptOperationContext", 
    operationCallbackUrl: process.env.CALLBACK_URI + "/api/callbacks" 
}; 

await callConnectionMedia.playToAll([playSource], interruptOption); 
*/

Play audio - Specific participant

In this scenario, audio is played to a specific participant.

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .play([ playSource ], [ targetParticipant ]); 

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

const playOptions: PlayToAllOptions = { loop: true }; 
await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .playToAll([ playSource ], playOptions); 

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

const playSource: FileSource = { url: audioUri, playSourceCacheId: "<playSourceId>", kind: "fileSource" }; 
await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .play([ playSource ], [ targetParticipant ]);

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call.
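As a minimal sketch of where the event and eventData in the examples below come from, assuming an Express app whose callback body is an array of events (the /api/callbacks route is illustrative):

const express = require("express");
const app = express();
app.use(express.json());

app.post("/api/callbacks", (req, res) => {
    const event = req.body[0];
    const eventData = event.data;
    // Handle PlayCompleted, PlayStarted, PlayFailed, and other events here.
    res.sendStatus(200);
});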

Example of how you can deserialize the PlayCompleted event:

if (event.type === "Microsoft.Communication.PlayCompleted") { 
    console.log("Play completed, context=%s", eventData.operationContext); 
} 

Example of how you can deserialize the PlayFailed event:

if (event.type === "Microsoft.Communication.PlayFailed") { 
    console.log("Play failed: data=%s", JSON.stringify(eventData)); 
} 

Example of how you can deserialize the PlayStarted event:

if (event.type === "Microsoft.Communication.PlayStarted") { 
    console.log("Play started: data=%s", JSON.stringify(eventData)); 
} 

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

You can cancel all media operations; all pending media operations are canceled. This action also cancels other queued play actions.

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .cancelAllOperations();

Example of how you can deserialize the PlayCanceled event:

if (event.type === "Microsoft.Communication.PlayCanceled") {
    console.log("Play canceled, context=%s", eventData.operationContext);
}

Prerequisites

For AI features

Create a new Python application

Set up a Python virtual environment for your project

python -m venv play-audio-app

Activate your virtual environment

On Windows, use the following command:

.\play-audio-app\Scripts\activate

On Unix, use the following command:

source play-audio-app/bin/activate

Install the Azure Communication Services Call Automation package

pip install azure-communication-callautomation

Create your application file in your project directory, for example, name it app.py. You write your Python code in this file.

Run your application using Python with the following command to execute code.

python app.py

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports two file types: MP3 files with an ID3V2TAG, and WAV files (mono, 16-bit PCM at a 16 kHz sample rate).

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

To use Text-To-Speech capabilities, you must connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls. If you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

call_automation_client.answer_call(
    incoming_call_context="<Incoming call context>",
    callback_url="<https://sample-callback-uri>",
    cognitive_services_endpoint=COGNITIVE_SERVICE_ENDPOINT,
)

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, make sure the audio file is a WAV file, mono, at 16 kHz. To play audio files, provide Azure Communication Services with a URI to a file hosted in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

play_source = FileSource(url=audioUri)

#Play multiple audio files
#file_source1 = FileSource(MAIN_MENU_PROMPT_URI) 
#file_source2 = FileSource(MAIN_MENU_PROMPT_URI) 
#
# play_sources = [file_source1, file_source2]
# 
# call_connection_client.play_media_to_all(
#     play_source=play_sources,
#     interrupt_call_media_operation=False,
#     operation_context="multiplePlayContext",
#     operation_callback_url=CALLBACK_EVENTS_URI,
#     loop=False
# )

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well as either the SourceLocale and VoiceKind, or the VoiceName you wish to use. We support all voice names supported by Azure AI services; see the full list here.

text_to_play = "Welcome to Contoso"

# Provide SourceLocale and VoiceKind to select an appropriate voice. 
play_source = TextSource(
    text=text_to_play, source_locale="en-US", voice_kind=VoiceKind.FEMALE
)
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

#Multiple text prompts
#play_source1 = TextSource(text="Hi, This is multiple play source one call media test.", source_locale="en-US", voice_kind=VoiceKind.FEMALE) 
#play_source2 = TextSource(text="Hi, This is multiple play source two call media test.", source_locale="en-US", voice_kind=VoiceKind.FEMALE)
#
#play_sources = [play_source1, play_source2]
#
#call_connection_client.play_media_to_all(
#    play_source=play_sources,
#    interrupt_call_media_operation=False,
#    operation_context="multiplePlayContext",
#    operation_callback_url=CALLBACK_EVENTS_URI,
#    loop=False
#)

text_to_play = "Welcome to Contoso"

# Provide VoiceName to select a specific voice. 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

#Play multiple text prompts
#play_source1 = TextSource(text="Hi, This is multiple play source one call media test.", voice_name=SPEECH_TO_TEXT_VOICE) 
#play_source2 = TextSource(text="Hi, This is multiple play source two call media test.", voice_name=SPEECH_TO_TEXT_VOICE)
#
#play_sources = [play_source1, play_source2]
#
#call_connection_client.play_media_to_all(
#    play_source=play_sources,
#    interrupt_call_media_operation=False,
#    operation_context="multiplePlayContext",
#    operation_callback_url=CALLBACK_EVENTS_URI,
#    loop=False
#)

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even further with Azure AI services, you can use Speech Synthesis Markup Language (SSML) when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pauses, pronunciation, speaking rate, and volume, and attribute text to multiple voices.

ssmlToPlay = '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><voice name="en-US-JennyNeural">Hello World!</voice></speak>'

play_source = SsmlSource(ssml_text=ssmlToPlay)

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Custom voice models

If you wish to enhance your prompts further and include custom voice models, the play action for Text-To-Speech now supports custom voices. Custom voices are a great option if you're trying to give customers a more local, personalized experience, or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models, you can read this guide.

Custom voice names regular text example

text_to_play = "Welcome to Contoso"

# Provide VoiceName and CustomVoiceEndpointId to select your custom voice. 
play_source = TextSource(text=text_to_play, voice_name="YourCustomVoiceName", custom_voice_endpoint_id="YourCustomEndpointId")
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Custom voice names SSML example

ssmlToPlay = '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><voice name="YourCustomVoiceName">Hello World!</voice></speak>'

play_source = SsmlSource(ssml_text=ssmlToPlay, custom_voice_endpoint_id="YourCustomEndpointId")

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio - All participants

In this scenario, audio is played to all participants in the call.

text_to_play = "Welcome to Contoso"

play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source
)

Support for barge-in

In scenarios where you're playing audio on loop to all participants, for example in a waiting lobby, you may be playing audio to the participants in the lobby and keeping them updated on their position in the queue. When you use barge-in support, the ongoing audio is canceled and your new message is played. If you then want to continue playing your original audio, make another play request.

# Interrupt media with text source
# Option 1
play_source = TextSource(text="This is interrupt call media test.", voice_name=SPEECH_TO_TEXT_VOICE)
call_connection_client.play_media_to_all(
    play_source, 
    interrupt_call_media_operation=True, 
    operation_context="interruptContext", 
    operation_callback_url=CALLBACK_EVENTS_URI, 
    loop=False
)

# Interrupt media with file source
# Option 2
#play_source = FileSource(MAIN_MENU_PROMPT_URI)
#call_connection_client.play_media_to_all(
#    play_source, 
#    interrupt_call_media_operation=True, 
#    operation_context="interruptContext", 
#    operation_callback_url=CALLBACK_EVENTS_URI, 
#    loop=False
#)

Play audio - Specific participant

In this scenario, audio is played to a specific participant in the call.

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

text_to_play = "Welcome to Contoso"

play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, loop=True
)

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

play_source = FileSource(url=audioUri, play_source_cache_id="<playSourceId>")

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call.
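As a minimal sketch of where the event in the examples below comes from, assuming a Flask app whose callback body is a list of events (the /api/callbacks route is illustrative):

from azure.core.messaging import CloudEvent
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/api/callbacks", methods=["POST"])
def handle_callback():
    for event_dict in request.json:
        # Parse the incoming payload into a CloudEvent.
        event = CloudEvent.from_dict(event_dict)
        # Handle PlayCompleted, PlayStarted, PlayFailed, and other events here.
    return Response(status=200)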

Example of how you can deserialize the PlayCompleted event:

if event.type == "Microsoft.Communication.PlayCompleted":

    app.logger.info("Play completed, context=%s", event.data.get("operationContext"))

Example of how you can deserialize the PlayStarted event:

if event.type == "Microsoft.Communication.PlayStarted":

    app.logger.info("Play started, context=%s", event.data.get("operationContext"))

Example of how you can deserialize the PlayFailed event:

if event.type == "Microsoft.Communication.PlayFailed":

    app.logger.info("Play failed: data=%s", event.data)

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

You can cancel all media operations; all pending media operations are canceled. This action also cancels other queued play actions.

call_automation_client.get_call_connection(
    call_connection_id
).cancel_all_media_operations()

Example of how you can deserialize the PlayCanceled event:

if event.type == "Microsoft.Communication.PlayCanceled":

    app.logger.info("Play canceled, context=%s", event.data.get("operationContext"))

Event codes

Status        | Code | Subcode | Message
PlayCompleted | 200  | 0       | Action completed successfully.
PlayCanceled  | 400  | 8508    | Action failed, the operation was canceled.
PlayFailed    | 400  | 8535    | Action failed, file format is invalid.
PlayFailed    | 400  | 8536    | Action failed, file could not be downloaded.
PlayFailed    | 400  | 8565    | Action failed, bad request to Azure AI services. Check input parameters.
PlayFailed    | 401  | 8565    | Action failed, Azure AI services authentication error.
PlayFailed    | 403  | 8565    | Action failed, forbidden request to Azure AI services; the free subscription used by the request ran out of quota.
PlayFailed    | 429  | 8565    | Action failed, requests exceeded the number of allowed concurrent requests for the Azure AI services subscription.
PlayFailed    | 408  | 8565    | Action failed, request to Azure AI services timed out.
PlayFailed    | 500  | 9999    | Unknown internal server error.
PlayFailed    | 500  | 8572    | Action failed due to play service shutdown.

Known limitations

  • Text-to-Speech text prompts support a maximum of 400 characters. If your prompt is longer than this, we suggest using SSML for Text-to-Speech based play actions.
  • For scenarios where you exceed your Speech service quota limit, you can request to increase this limit by following the steps outlined here.

Clean up resources

If you want to clean up and remove a Communication Services subscription, you can delete the resource or resource group. Deleting the resource group also deletes any other resources associated with it. Learn more about cleaning up resources.

Next steps