Customize voice prompts to users with Play action - An Azure Communication Services how-to document

Prerequisites

Azure account with an active subscription, for details see Create an account for free.
Azure Communication Services resource. See Create an Azure Communication Services resource. Save the connection string for this resource.
Create a new web service application using the Call Automation SDK.
The latest .NET library for your operating system.
Obtain the latest NuGet package.

For AI features

Create and connect Azure AI services to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new C# application

In the console window of your operating system, use the dotnet command to create a new web application.

dotnet new web -n MyApplication

Install the NuGet package

The NuGet package can be obtained from here, if you haven't already done so.

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports both file types of MP3 files with ID3V2TAG and WAV files, mono 16-bit PCM at 16 KHz sample rate. .

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

If you would like to use Text-To-Speech capabilities, then it's required for you to connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls, if you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

var callAutomationClient = new CallAutomationClient("<Azure Communication Services connection string>");   

var answerCallOptions = new AnswerCallOptions("<Incoming call context once call is connected>", new Uri("<https://sample-callback-uri>"))  

{  
    CallIntelligenceOptions = new CallIntelligenceOptions() { CognitiveServicesEndpoint = new Uri("<Azure Cognitive Services Endpoint>") } 
};  

var answerCallResult = await callAutomationClient.AnswerCallAsync(answerCallOptions);

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, you need to make sure the audio file is a WAV file, mono and 16 KHz. To play audio files, you need to make sure you provide Azure Communication Services with a uri to a file you host in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

var playSource = new FileSource(new Uri(audioUri));

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well either the SourceLocale, and VoiceKind or the VoiceName you wish to use. We support all voice names supported by Azure AI services, full list here.

String textToPlay = "Welcome to Contoso";

// Provide SourceLocale and VoiceKind to select an appropriate voice. 
var playSource = new TextSource(textToPlay, "en-US", VoiceKind.Female);

String textToPlay = "Welcome to Contoso"; 
 
// Provide VoiceName to select a specific voice. 
var playSource = new TextSource(textToPlay, "en-US-ElizabethNeural");

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even more with Azure AI services you can use Speech Synthesis Markup Language SSML when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pause, improve pronunciation, change speaking rate, adjust volume and attribute multiple voices.

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 

var playSource = new SsmlSource(ssmlToPlay);

Custom voice models

If you wish to enhance your prompts more and include custom voice models, the play action Text-To-Speech now supports these custom voices. These are a great option if you are trying to give customers a more local, personalized experience or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models you can read this guide.

Custom voice names regular text exmaple

String textToPlay = "Welcome to Contoso"; 
 
// Provide VoiceName and CustomVoiceEndpointId to select custom voice. 
var playSource = new TextSource(textToPlay)
    {
        VoiceName = "YourCustomVoiceName",
        CustomVoiceEndpointId = "YourCustomEndpointId"
    };

Custom voice names SSML example


var playSource = new SsmlSource(ssmlToPlay,"YourCustomEndpointId");

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio to all participants

In this scenario, audio is played to all participants on the call.

var playResponse = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayToAllAsync(playSource);

Play audio to a specific participant

In this scenario, audio is played to a specific participant.

var playTo = new List<CommunicationIdentifier> { targetParticipant }; 
var playResponse = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayAsync(playSource, playTo);

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

var playOptions = new PlayToAllOptions(playSource) 
{ 
    Loop = true 
}; 
var playResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayToAllAsync(playOptions);

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

var playTo = new List<CommunicationIdentifier> { targetParticipant }; 
var playSource = new FileSource(new Uri(audioUri)) 
{ 
    PlaySourceCacheId = "<playSourceId>" 
}; 
var playResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .PlayAsync(playSource, playTo);

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call. An example of a successful play event update.

Example of how you can deserialize the PlayCompleted event:

if (acsEvent is PlayCompleted playCompleted) 
{ 
    logger.LogInformation("Play completed successfully, context={context}", playCompleted.OperationContext); 
}

Example of how you can deserialize the PlayFailed event:

if (acsEvent is PlayFailed playFailed) 
{ 
    if (MediaEventReasonCode.PlayDownloadFailed.Equals(playFailed.ReasonCode)) 
    { 
        logger.LogInformation("Play failed: download failed, context={context}", playFailed.OperationContext); 
    } 
    else if (MediaEventReasonCode.PlayInvalidFileFormat.Equals(playFailed.ReasonCode)) 
    { 
        logger.LogInformation("Play failed: invalid file format, context={context}", playFailed.OperationContext); 
    } 
    else 
    { 
        logger.LogInformation("Play failed, result={result}, context={context}", playFailed.ResultInformation?.Message, playFailed.OperationContext); 
    } 
}

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

Cancel all media operations, all pending media operations are canceled. This action also cancels other queued play actions.

var cancelResult = await callAutomationClient.GetCallConnection(callConnectionId) 
    .GetCallMedia() 
    .CancelAllMediaOperationsAsync();

Example of how you can deserialize the PlayCanceled event:

if (acsEvent is PlayCanceled playCanceled) 
{ 
    logger.LogInformation("Play canceled, context={context}", playCanceled.OperationContext); 
}

Prerequisites

Azure account with an active subscription, for details see Create an account for free.
Azure Communication Services resource. See Create an Azure Communication Services resource
Create a new web service application using the Call Automation SDK.
Java Development Kit version 8 or above.
Apache Maven.

For AI features

Create and connect Azure AI services to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new Java application

In your terminal or command window, navigate to the directory where you would like to create your Java application. Run the command shown here to generate the Java project from the maven-archetype-quickstart template.

mvn archetype:generate -DgroupId=com.communication.quickstart -DartifactId=communication-quickstart -DarchetypeArtifactId=maven-archetype-quickstart -DarchetypeVersion=1.4 -DinteractiveMode=false

The previous command creates a directory with the same name as artifactId argument. Under this directory, src/main/java directory contains the project source code, src/test/java directory contains the test source.

You notice that the 'generate' step created a directory with the same name as the artifactId. Under this directory, src/main/java directory contains source code, src/test/java directory contains tests, and pom.xml file is the project's Project Object Model, or POM.

Update your applications POM file to use Java 8 or higher.

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>

Add package references

In your POM file, add the following reference for the project.

azure-communication-callautomation

Azure Communication Services Call Automation SDK package is retrieved from the Azure SDK Dev Feed.

<dependency>
  <groupId>com.azure</groupId>
  <artifactId>azure-communication-callautomation</artifactId>
  <version>1.0.0</version>
</dependency>

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports both file types of MP3 files with ID3V2TAG and WAV files, mono 16-bit PCM at 16 KHz sample rate. .

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

If you would like to use Text-To-Speech capabilities, then it's required for you to connect your Azure Cognitive Service to your Azure Communication Service.

Update App.java with code

In your editor of choice, open App.java file and update it with the code provided in Update app.java with code section.

Establish a call

By this point you should be familiar with starting calls, if you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

CallIntelligenceOptions callIntelligenceOptions = new CallIntelligenceOptions().setCognitiveServicesEndpoint("https://sample-cognitive-service-resource.cognitiveservices.azure.com/"); 
answerCallOptions = new AnswerCallOptions("<Incoming call context>", "<https://sample-callback-uri>").setCallIntelligenceOptions(callIntelligenceOptions); 
Response<AnswerCallResult> answerCallResult = callAutomationClient 
    .answerCallWithResponse(answerCallOptions) 
    .block();

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, you need to make sure the audio file is a WAV file, mono and 16 KHz. To play audio files, you need to make sure you provide Azure Communication Services with a uri to a file you host in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

var playSource = new FileSource(new Uri(audioUri));

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well either the SourceLocale, and VoiceKind or the VoiceName you wish to use. We support all voice names supported by Azure AI services, full list here.

// Provide SourceLocale and VoiceKind to select an appropriate voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setSourceLocale("en-US") 
    .setVoiceKind(VoiceKind.FEMALE);

// Provide VoiceName to select a specific voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setVoiceName("en-US-ElizabethNeural");

Play source - Text-to-Speech SSML

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 
var playSource = new SsmlSource() 
    .setSsmlText(ssmlToPlay);

Custom voice models

If you wish to enhance your prompts more and include custom voice models, the play action Text-To-Speech now supports these custom voices. These are a great option if you are trying to give customers a more local, personalized experience or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models you can read this guide.

Custom voice names regular text exmaple

// Provide VoiceName and  to select a specific voice.
var playSource = new TextSource() 
    .setText(textToPlay) 
    .setCustomVoiceName("YourCustomVoiceName")
    .setCustomVoiceEndpointId("YourCustomEndpointId");

Custom voice names SSML example

String ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"YourCustomVoiceName\">Hello World!</voice></speak>"; 
var playSource = new SsmlSource() 
    .setSsmlText(ssmlToPlay)
    .setCustomVoiceEndpointId("YourCustomEndpointId");

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio to all participants

In this scenario, audio is played to all participants on the call.

var playOptions = new PlayToAllOptions(playSource); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playToAllWithResponse(playOptions) 
    .block(); 
log.info("Play result: " + playResponse.getStatusCode());

Play audio to a specific participant

In this scenario, audio is played to a specific participant.

var playTo = Arrays.asList(targetParticipant); 
var playOptions = new PlayOptions(playSource, playTo); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playWithResponse(playOptions) 
    .block();

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

var playOptions = new PlayToAllOptions(playSource) 
    .setLoop(true); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playToAllWithResponse(playOptions) 
    .block();

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

var playTo = Arrays.asList(targetParticipant); 
var playSource = new FileSource() 
    .setUrl(audioUri) \
    .setPlaySourceCacheId("<playSourceId>"); 
var playOptions = new PlayOptions(playSource, playTo); 
var playResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .playWithResponse(playOptions) 
    .block();

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call. An example of a successful play event update.

Example of how you can deserialize the PlayCompleted event:

if (acsEvent instanceof PlayCompleted) { 
    PlayCompleted event = (PlayCompleted) acsEvent; 
    log.info("Play completed, context=" + event.getOperationContext()); 
}

Example of how you can deserialize the PlayFailed event:

if (acsEvent instanceof PlayFailed) { 
    PlayFailed event = (PlayFailed) acsEvent; 
    if (ReasonCode.Play.DOWNLOAD_FAILED.equals(event.getReasonCode())) { 
        log.info("Play failed: download failed, context=" + event.getOperationContext()); 
    } else if (ReasonCode.Play.INVALID_FILE_FORMAT.equals(event.getReasonCode())) { 
        log.info("Play failed: invalid file format, context=" + event.getOperationContext()); 
    } else { 
        log.info("Play failed, result=" + event.getResultInformation().getMessage() + ", context=" + event.getOperationContext()); 
    } 
}

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

Cancel all media operations, all pending media operations are canceled. This action also cancels other queued play actions.

var cancelResponse = callAutomationClient.getCallConnectionAsync(callConnectionId) 
    .getCallMediaAsync() 
    .cancelAllMediaOperationsWithResponse() 
    .block(); 
log.info("Cancel result: " + cancelResponse.getStatusCode());

Example of how you can deserialize the PlayCanceled event:

if (acsEvent instanceof PlayCanceled) { 
    PlayCanceled event = (PlayCanceled) acsEvent; 
    log.info("Play canceled, context=" + event.getOperationContext()); 
}

Prerequisites

Azure account with an active subscription, for details see Create an account for free.
Azure Communication Services resource. See Create an Azure Communication Services resource. Save the connection string for this resource.
Create a new web service application using the Call Automation SDK.
Have Node.js installed, you can install it from their official website.

For AI features

Create and connect Azure AI services to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new JavaScript application

Create a new JavaScript application in your project directory. Initialize a new Node.js project with the following command. This creates a package.json file for your project, which is used to manage your project's dependencies.

npm init -y

Install the Azure Communication Services Call Automation package

npm install @azure/communication-call-automation

Create a new JavaScript file in your project directory, for example, name it app.js. You write your JavaScript code in this file. Run your application using Node.js with the following command. This code executes the JavaScript code you have written.

node app.js

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports both file types of MP3 files with ID3V2TAG and WAV files, mono 16-bit PCM at 16 KHz sample rate.

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

If you would like to use Text-To-Speech capabilities, then it's required for you to connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls, if you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

const callIntelligenceOptions: CallIntelligenceOptions = { "<https://sample-callback-uri>" }; 
        const answerCallOptions: AnswerCallOptions = { callIntelligenceOptions: callIntelligenceOptions };
  
await acsClient.answerCall("<Incoming call context>", "<https://sample-callback-uri>", answerCallOptions);

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, you need to make sure the audio file is a WAV file, mono and 16 KHz. To play audio files, you need to make sure you provide Azure Communication Services with a uri to a file you host in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

const playSource: FileSource = { url: audioUri, kind: "fileSource" };

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well either the SourceLocale, and VoiceKind or the VoiceName you wish to use. We support all voice names supported by Azure AI services, full list here.

const textToPlay = "Welcome to Contoso"; 
// Provide SourceLocale and VoiceKind to select an appropriate voice. 
const playSource: TextSource = { text: textToPlay, sourceLocale: "en-US", voiceKind: VoiceKind.Female, kind: "textSource" };

const textToPlay = "Welcome to Contoso"; 
// Provide VoiceName to select a specific voice. 
const playSource: TextSource = { text: textToPlay, voiceName: "en-US-ElizabethNeural", kind: "textSource" };

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even more with Azure AI services you can use Speech Synthesis Markup Language SSML when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pause, improve pronunciation, change speaking rate, adjust volume and attribute multiple voices.

const ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"en-US-JennyNeural\">Hello World!</voice></speak>"; 
const playSource: SsmlSource = { ssmlText: ssmlToPlay, kind: "ssmlSource" };

Custom voice models

If you wish to enhance your prompts more and include custom voice models, the play action Text-To-Speech now supports these custom voices. These are a great option if you are trying to give customers a more local, personalized experience or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models you can read this guide.

Custom voice names regular text exmaple

const textToPlay = "Welcome to Contoso";
// Provide VoiceName and CustomVoiceEndpointID to play your custom voice
const playSource: TextSource = { text: textToPlay, voiceName: "YourCustomVoiceName", customVoiceEndpointId: "YourCustomEndpointId"}

Custom voice names SSML example

const ssmlToPlay = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"><voice name=\"YourCustomVoiceName\">Hello World!</voice></speak>"; 
const playSource: SsmlSource = { ssmlText: ssmlToPlay, kind: "ssmlSource", customVoiceEndpointId: "YourCustomEndpointId"};

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio - All participants

In this scenario, audio is played to all participants on the call.

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .playToAll([ playSource ]);

Play audio - Specific participant

In this scenario, audio is played to a specific participant.

await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .play([ playSource ], [ targetParticipant ]);

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

const playOptions: PlayOptions = { loop: true }; 
await callAutomationClient.getCallConnection(callConnectionId) 
    .getCallMedia() 
    .playToAll([ playSource ], playOptions);

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

const playSource: FileSource = { url: audioUri, playsourcacheid: "<playSourceId>", kind: "fileSource" }; 
await callAutomationClient.getCallConnection(callConnectionId) 
.getCallMedia() 
.play([ playSource ], [ targetParticipant ]);

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call.

Example of how you can deserialize the PlayCompleted event:

if (event.type === "Microsoft.Communication.PlayCompleted") { 
    console.log("Play completed, context=%s", eventData.operationContext); 
}

Example of how you can deserialize the PlayFailed event:

if (event.type === "Microsoft.Communication.PlayFailed") { 
    console.log("Play failed: data=%s", JSON.stringify(eventData)); 
}

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

Cancel all media operations, all pending media operations are canceled. This action also cancels other queued play actions.

await callAutomationClient.getCallConnection(callConnectionId) 
.getCallMedia() 
.cancelAllOperations();

Example of how you can deserialize the PlayCanceled event:

if (event.type === "Microsoft.Communication.PlayCanceled") {
    console.log("Play canceled, context=%s", eventData.operationContext);
}

Prerequisites

Azure account with an active subscription, for details see Create an account for free.
Azure Communication Services resource. See Create an Azure Communication Services resource. Save the connection string for this resource.
Create a new web service application using the Call Automation SDK.
Have Python installed, you can install from the official site.

For AI features

Create and connect Azure AI services to your Azure Communication Services resource.
Create a custom subdomain for your Azure AI services resource.

Create a new Python application

Set up a Python virtual environment for your project

python -m venv play-audio-app

Activate your virtual environment

On windows, use the following command:

.\ play-audio-quickstart \Scripts\activate

On Unix, use the following command:

source play-audio-quickstart /bin/activate

Install the Azure Communication Services Call Automation package

pip install azure-communication-callautomation

Create your application file in your project directory, for example, name it app.py. You write your Python code in this file.

Run your application using Python with the following command to execute code.

python app.py

(Optional) Prepare your audio file if you wish to use audio files for playing prompts

Create an audio file, if you don't already have one, to use for playing prompts and messages to participants. The audio file must be hosted in a location that is accessible to Azure Communication Services with support for authentication. Keep a copy of the URL available for you to use when requesting to play the audio file. Azure Communication Services supports both file types of MP3 and WAV files, mono 16-bit PCM at 16 KHz sample rate. .

You can test creating your own audio file using our Speech synthesis with Audio Content Creation tool.

(Optional) Connect your Azure Cognitive Service to your Azure Communication Service

If you would like to use Text-To-Speech capabilities, then it's required for you to connect your Azure Cognitive Service to your Azure Communication Service.

Establish a call

By this point you should be familiar with starting calls, if you need to learn more about making a call, follow our quickstart. You can also use the code snippet provided here to understand how to answer a call.

call_automation_client.answer_call(
    incoming_call_context="<Incoming call context>",
    callback_url="<https://sample-callback-uri>",
    cognitive_services_endpoint=COGNITIVE_SERVICE_ENDPOINT,
)

Play audio

Once the call has been established, there are multiple options for how you may wish to play the audio. You can play audio to the participant that has joined the call or play audio to all the participants in the call.

Play source - Audio file

To play audio to participants using audio files, you need to make sure the audio file is a WAV file, mono and 16 KHz. To play audio files, you need to make sure you provide Azure Communication Services with a uri to a file you host in a location where Azure Communication Services can access it. The FileSource type in our SDK can be used to specify audio files for the play action.

play_source = FileSource(url=audioUri)

Play source - Text-To-Speech

To play audio using Text-To-Speech through Azure AI services, you need to provide the text you wish to play, as well either the SourceLocale, and VoiceKind or the VoiceName you wish to use. We support all voice names supported by Azure AI services, full list here.

text_to_play = "Welcome to Contoso"

# Provide SourceLocale and VoiceKind to select an appropriate voice. 
play_source = TextSource(
    text=text_to_play, source_locale="en-US", voice_kind=VoiceKind.FEMALE
)
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

text_to_play = "Welcome to Contoso"

# Provide VoiceName to select a specific voice. 
play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Play source - Text-To-Speech with SSML

If you want to customize your Text-To-Speech output even more with Azure AI services you can use Speech Synthesis Markup Language SSML when invoking your play action through Call Automation. With SSML you can fine-tune the pitch, pause, improve pronunciation, change speaking rate, adjust volume and attribute multiple voices.

ssmlToPlay = '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><voice name="en-US-JennyNeural">Hello World!</voice></speak>'

play_source = SsmlSource(ssml_text=ssmlToPlay)

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Custom voice models

If you wish to enhance your prompts more and include custom voice models, the play action Text-To-Speech now supports these custom voices. These are a great option if you are trying to give customers a more local, personalized experience or have situations where the default models may not cover the words and accents you're trying to pronounce. To learn more about creating and deploying custom models you can read this guide.

Custom voice names regular text exmaple

text_to_play = "Welcome to Contoso"

# Provide VoiceName to select a specific voice. 
play_source = TextSource(text=text_to_play, voice_name="YourCustomVoiceName", custom_voice_endpoint_id = "YourCustomEndpointId")
play_to = [target_participant]
call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Custom voice names SSML example

ssmlToPlay = '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><voice name="YourCustomVoiceName">Hello World!</voice></speak>'

play_source = SsmlSource(ssml_text=ssmlToPlay, custom_voice_endpoint_id="YourCustomEndpointId")

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Once you've decided on which playSource you wish to use for playing audio, you can then choose whether you want to play it to a specific participant or to all participants.

Play audio - All participants

Play a prerecorded audio file to all participants in the call.

text_to_play = "Welcome to Contoso"

play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source
)

Play audio - Specific participant

Play a prerecorded audio file to a specific participant in the call.

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Play audio on loop

You can use the loop option to play hold music that loops until your application is ready to accept the caller. Or progress the caller to the next logical step based on your applications business logic.

text_to_play = "Welcome to Contoso"

play_source = TextSource(text=text_to_play, voice_name="en-US-ElizabethNeural")

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, loop=True
)

Enhance play with audio file caching

If you're playing the same audio file multiple times, your application can provide Azure Communication Services with the sourceID for the audio file. Azure Communication Services caches this audio file for 1 hour.

Note

Caching audio files isn't suitable for dynamic prompts. If you change the URL provided to Azure Communication Services, it does not update the cached URL straight away. The update will occur after the existing cache expires.

play_source = FileSource(url=audioUri, play_source_cache_id="<playSourceId>")

play_to = [target_participant]

call_automation_client.get_call_connection(call_connection_id).play_media(
    play_source=play_source, play_to=play_to
)

Handle play action event updates

Your application receives action lifecycle event updates on the callback URL that was provided to Call Automation service at the time of answering the call.

Example of how you can deserialize the PlayCompleted event:

if event.type == "Microsoft.Communication.PlayCompleted":

    app.logger.info("Play completed, context=%s", event.data.get("operationContext"))

Example of how you can deserialize the PlayFailed event:

if event.type == "Microsoft.Communication.PlayFailed":

    app.logger.info("Play failed: data=%s", event.data)

To learn more about other supported events, visit the Call Automation overview document.

Cancel play action

Cancel all media operations, all pending media operations are canceled. This action also cancels other queued play actions.

call_automation_client.get_call_connection(
    call_connection_id
).cancel_all_media_operations()

Example of how you can deserialize the PlayCanceled event:

if event.type == "Microsoft.Communication.PlayCanceled":

    app.logger.info("Play canceled, context=%s", event.data.get("operationContext"))

Status	Code	Subcode	Message
PlayCompleted	200	0	Action completed successfully.
PlayCanceled	400	8508	Action failed, the operation was canceled.
PlayFailed	400	8535	Action failed, file format is invalid.
PlayFailed	400	8536	Action failed, file could not be downloaded.
PlayFailed	400	8565	Action failed, bad request to Azure AI services. Check input parameters.
PlayFailed	401	8565	Action failed, Azure AI services authentication error.
PlayFailed	403	8565	Action failed, forbidden request to Azure AI services, free subscription used by the request ran out of quota.
PlayFailed	429	8565	Action failed, requests exceeded the number of allowed concurrent requests for the Azure AI services subscription.
PlayFailed	408	8565	Action failed, request to Azure AI services timed out.
PlayFailed	500	9999	Unknown internal server error
PlayFailed	500	8572	Action failed due to play service shutdown.