Quickstart: Post-call transcription and analytics
Language service documentation | Language Studio | Speech service documentation | Speech Studio
In this C# quickstart, you perform sentiment analysis and conversation summarization of call center transcriptions. The sample automatically identifies, categorizes, and redacts sensitive information. The quickstart implements a cross-service scenario that uses features of the Azure AI Speech and Azure AI Language services.
Tip
Try the Language Studio or Speech Studio for a demonstration on how to use the Language and Speech services to analyze call center conversations.
To deploy a call center transcription solution to Azure with a no-code approach, try the Ingestion Client.
The following Azure AI services for Speech features are used in the quickstart:
- Batch transcription: Submit a batch of audio files for transcription.
- Speaker separation: Separate multiple speakers through diarization of mono 16 kHz 16-bit PCM WAV files (see the sketch after this list).
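For context, here's a minimal sketch of how a batch transcription job with diarization can be submitted to the Speech batch transcription REST API. This isn't the sample's code; the `v3.1` path, the request properties, and the placeholder key, region, and audio URL are assumptions for illustration only.

```csharp
// Minimal sketch (not the sample's code): submit a batch transcription job with
// diarization enabled via the Speech batch transcription REST API.
// The v3.1 path, property names, and placeholder values are illustrative assumptions.
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class BatchTranscriptionSketch
{
    static async Task Main()
    {
        string speechKey = Environment.GetEnvironmentVariable("SPEECH_KEY"); // assumed environment variable
        string speechRegion = "eastus";                                      // example region

        using var http = new HttpClient();
        http.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", speechKey);

        string body = @"{
          ""displayName"": ""call-center-quickstart"",
          ""locale"": ""en-US"",
          ""contentUrls"": [ ""https://contoso.example/audio/call1.wav"" ],
          ""properties"": { ""diarizationEnabled"": true }
        }";

        // POST creates the transcription job; the response includes a URL to poll for results.
        var response = await http.PostAsync(
            $"https://{speechRegion}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions",
            new StringContent(body, Encoding.UTF8, "application/json"));

        Console.WriteLine($"Create transcription returned: {(int)response.StatusCode}");
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```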
The Language service offers the following features that are used:
- Personally Identifiable Information (PII) extraction and redaction: Identify, categorize, and redact sensitive information in conversation transcription.
- Conversation summarization: Summarize in abstract text what each conversation participant said about the issues and resolutions. For example, a call center can group product issues that have a high volume.
- Sentiment analysis and opinion mining: Analyze transcriptions and associate positive, neutral, or negative sentiment at the utterance and conversation level.
Prerequisites
- Azure subscription - Create one for free
- Create a multi-service resource in the Azure portal. This quickstart only requires one Azure AI services multi-service resource. The sample code allows you to specify separate Language and Speech resource keys.
- Get the resource key and region. After your Azure AI services resource is deployed, select Go to resource to view and manage keys.
Important
This quickstart requires access to conversation summarization. To get access, you must submit an online request and have it approved.
The `--languageKey` and `--languageEndpoint` values in this quickstart must correspond to a resource that's in one of the regions supported by the conversation summarization API: `eastus`, `northeurope`, and `uksouth`.
Run post-call transcription analysis with C#
Follow these steps to build and run the post-call transcription analysis quickstart code example.
Copy the `scenarios/csharp/dotnetcore/call-center/` sample files from GitHub. If you have Git installed, open a command prompt and run the `git clone` command to download the Speech SDK samples repository:
git clone https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
Open a command prompt and change to the project directory.
cd <your-local-path>/scenarios/csharp/dotnetcore/call-center/call-center/
Build the project with the .NET CLI.
dotnet build
Run the application with your preferred command line arguments. See usage and arguments for the available options.
Here's an example that transcribes from an example audio file at GitHub:
dotnet run --languageKey YourResourceKey --languageEndpoint YourResourceEndpoint --speechKey YourResourceKey --speechRegion YourResourceRegion --input "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/scenarios/call-center/sampledata/Call1_separated_16k_health_insurance.wav" --stereo --output summary.json
If you already have a transcription for input, here's an example that only requires a Language resource:
dotnet run --languageKey YourResourceKey --languageEndpoint YourResourceEndpoint --jsonInput "YourTranscriptionFile.json" --stereo --output summary.json
Replace `YourResourceKey` with your Azure AI services resource key, replace `YourResourceRegion` with your Azure AI services resource region (such as `eastus`), and replace `YourResourceEndpoint` with your Azure AI services endpoint. Make sure that the paths specified by `--input` and `--output` are valid. Otherwise you must change the paths.
Important
Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like Azure Key Vault. See the Azure AI services security article for more information.
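One possible approach is to keep the keys in Azure Key Vault and read them at startup. Here's a minimal sketch using the Azure.Identity and Azure.Security.KeyVault.Secrets packages; the vault URL and secret names are hypothetical placeholders.

```csharp
// Minimal sketch: read resource keys from Azure Key Vault instead of hard-coding them.
// Requires the Azure.Identity and Azure.Security.KeyVault.Secrets NuGet packages.
// The vault URL and secret names are hypothetical placeholders.
using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

class KeyVaultSketch
{
    static void Main()
    {
        var client = new SecretClient(
            new Uri("https://your-key-vault-name.vault.azure.net/"),
            new DefaultAzureCredential()); // uses Azure CLI login, managed identity, and so on

        // Fetch the Speech and Language keys stored as Key Vault secrets.
        string speechKey = client.GetSecret("SpeechResourceKey").Value.Value;
        string languageKey = client.GetSecret("LanguageResourceKey").Value.Value;

        Console.WriteLine($"Retrieved {speechKey.Length}-character Speech key and {languageKey.Length}-character Language key.");
    }
}
```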
Check results
The console output shows the full conversation and summary. Here's an example of the overall summary, with redactions for brevity:
Conversation summary:
issue: Customer wants to sign up for insurance.
resolution: Customer was advised that customer would be contacted by the insurance company.
If you specify the optional `--output FILE` argument, a JSON version of the results is written to the file. The file output is a combination of the JSON responses from the batch transcription (Speech), sentiment (Language), and conversation summarization (Language) APIs.
The `transcription` property contains a JSON object with the results of sentiment analysis merged with batch transcription. Here's an example, with redactions for brevity:
{
"source": "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/scenarios/call-center/sampledata/Call1_separated_16k_health_insurance.wav",
// Example results redacted for brevity
"nBest": [
{
"confidence": 0.77464247,
"lexical": "hello thank you for calling contoso who am i speaking with today",
"itn": "hello thank you for calling contoso who am i speaking with today",
"maskedITN": "hello thank you for calling contoso who am i speaking with today",
"display": "Hello, thank you for calling Contoso. Who am I speaking with today?",
"sentiment": {
"positive": 0.78,
"neutral": 0.21,
"negative": 0.01
}
},
]
// Example results redacted for brevity
}
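If you want to post-process the output file yourself, here's a minimal sketch that reads a file produced with `--output` and prints each phrase's display text with its sentiment scores. It searches for `nBest` arrays anywhere in the document rather than assuming an exact nesting, because the example above is redacted; the `summary.json` file name is a placeholder.

```csharp
// Minimal sketch: read a summary.json file produced with --output and print each
// recognized phrase's display text with its sentiment scores.
using System;
using System.IO;
using System.Text.Json;

class TranscriptionSentimentSketch
{
    static void Main()
    {
        using JsonDocument doc = JsonDocument.Parse(File.ReadAllText("summary.json"));
        PrintSentiment(doc.RootElement);
    }

    // Recursively walk the JSON and report every nBest candidate that has sentiment scores.
    static void PrintSentiment(JsonElement element)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty property in element.EnumerateObject())
            {
                if (property.Name == "nBest" && property.Value.ValueKind == JsonValueKind.Array)
                {
                    foreach (JsonElement candidate in property.Value.EnumerateArray())
                    {
                        if (candidate.TryGetProperty("display", out JsonElement display) &&
                            candidate.TryGetProperty("sentiment", out JsonElement sentiment))
                        {
                            Console.WriteLine(display.GetString());
                            Console.WriteLine($"  positive={sentiment.GetProperty("positive").GetDouble()}, " +
                                              $"neutral={sentiment.GetProperty("neutral").GetDouble()}, " +
                                              $"negative={sentiment.GetProperty("negative").GetDouble()}");
                        }
                    }
                }
                else
                {
                    PrintSentiment(property.Value);
                }
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in element.EnumerateArray())
            {
                PrintSentiment(item);
            }
        }
    }
}
```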
The `conversationAnalyticsResults` property contains a JSON object with the results of the conversation PII and conversation summarization analysis. Here's an example, with redactions for brevity:
{
"conversationAnalyticsResults": {
"conversationSummaryResults": {
"conversations": [
{
"id": "conversation1",
"summaries": [
{
"aspect": "issue",
"text": "Customer wants to sign up for insurance"
},
{
"aspect": "resolution",
"text": "Customer was advised that customer would be contacted by the insurance company"
}
],
"warnings": []
}
],
"errors": [],
"modelVersion": "2022-05-15-preview"
},
"conversationPiiResults": {
"combinedRedactedContent": [
{
"channel": "0",
"display": "Hello, thank you for calling Contoso. Who am I speaking with today? Hi, ****. Uh, are you calling because you need health insurance?", // Example results redacted for brevity
"itn": "hello thank you for calling contoso who am i speaking with today hi **** uh are you calling because you need health insurance", // Example results redacted for brevity
"lexical": "hello thank you for calling contoso who am i speaking with today hi **** uh are you calling because you need health insurance" // Example results redacted for brevity
},
{
"channel": "1",
"display": "Hi, my name is **********. I'm trying to enroll myself with Contoso. Yes. Yeah, I'm calling to sign up for insurance.", // Example results redacted for brevity
"itn": "hi my name is ********** i'm trying to enroll myself with contoso yes yeah i'm calling to sign up for insurance", // Example results redacted for brevity
"lexical": "hi my name is ********** i'm trying to enroll myself with contoso yes yeah i'm calling to sign up for insurance" // Example results redacted for brevity
}
],
"conversations": [
{
"id": "conversation1",
"conversationItems": [
{
"id": "0",
"redactedContent": {
"itn": "hello thank you for calling contoso who am i speaking with today",
"lexical": "hello thank you for calling contoso who am i speaking with today",
"text": "Hello, thank you for calling Contoso. Who am I speaking with today?"
},
"entities": [],
"channel": "0",
"offset": "PT0.77S"
},
{
"id": "1",
"redactedContent": {
"itn": "hi my name is ********** i'm trying to enroll myself with contoso",
"lexical": "hi my name is ********** i'm trying to enroll myself with contoso",
"text": "Hi, my name is **********. I'm trying to enroll myself with Contoso."
},
"entities": [
{
"text": "Mary Rondo",
"category": "Name",
"offset": 15,
"length": 10,
"confidenceScore": 0.97
}
],
"channel": "1",
"offset": "PT4.55S"
},
{
"id": "2",
"redactedContent": {
"itn": "hi **** uh are you calling because you need health insurance",
"lexical": "hi **** uh are you calling because you need health insurance",
"text": "Hi, ****. Uh, are you calling because you need health insurance?"
},
"entities": [
{
"text": "Mary",
"category": "Name",
"offset": 4,
"length": 4,
"confidenceScore": 0.93
}
],
"channel": "0",
"offset": "PT9.55S"
},
{
"id": "3",
"redactedContent": {
"itn": "yes yeah i'm calling to sign up for insurance",
"lexical": "yes yeah i'm calling to sign up for insurance",
"text": "Yes. Yeah, I'm calling to sign up for insurance."
},
"entities": [],
"channel": "1",
"offset": "PT13.09S"
},
// Example results redacted for brevity
],
"warnings": []
}
]
}
}
}
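Similarly, here's a minimal sketch that extracts the summary aspects and the redacted PII entities from the `conversationAnalyticsResults` object. The property names follow the example above, and the `summary.json` file name is a placeholder; adjust both if your output differs.

```csharp
// Minimal sketch: extract conversation summaries and redacted PII entities from the
// conversationAnalyticsResults object in a summary.json file produced with --output.
using System;
using System.IO;
using System.Text.Json;

class ConversationAnalyticsSketch
{
    static void Main()
    {
        using JsonDocument doc = JsonDocument.Parse(File.ReadAllText("summary.json"));
        JsonElement analytics = doc.RootElement.GetProperty("conversationAnalyticsResults");

        // Conversation summaries: one aspect/text pair per summary (issue, resolution, ...).
        foreach (JsonElement conversation in analytics
                     .GetProperty("conversationSummaryResults")
                     .GetProperty("conversations").EnumerateArray())
        {
            foreach (JsonElement summary in conversation.GetProperty("summaries").EnumerateArray())
            {
                Console.WriteLine($"{summary.GetProperty("aspect").GetString()}: " +
                                  $"{summary.GetProperty("text").GetString()}");
            }
        }

        // PII entities detected in each conversation item.
        foreach (JsonElement conversation in analytics
                     .GetProperty("conversationPiiResults")
                     .GetProperty("conversations").EnumerateArray())
        {
            foreach (JsonElement item in conversation.GetProperty("conversationItems").EnumerateArray())
            {
                foreach (JsonElement entity in item.GetProperty("entities").EnumerateArray())
                {
                    Console.WriteLine($"Channel {item.GetProperty("channel").GetString()}: " +
                                      $"{entity.GetProperty("category").GetString()} " +
                                      $"\"{entity.GetProperty("text").GetString()}\" " +
                                      $"(confidence {entity.GetProperty("confidenceScore").GetDouble()})");
                }
            }
        }
    }
}
```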
Usage and arguments
Usage: call-center -- [...]
Important
You can use a multi-service resource or separate Language and Speech resources. In either case, the `--languageKey` and `--languageEndpoint` values must correspond to a resource that's in one of the regions supported by the conversation summarization API: `eastus`, `northeurope`, and `uksouth`.
Connection options include:
- `--speechKey KEY`: Your Azure AI services or Speech resource key. Required for audio transcriptions with the `--input` from URL option.
- `--speechRegion REGION`: Your Azure AI services or Speech resource region. Required for audio transcriptions with the `--input` from URL option. Examples: `eastus`, `northeurope`
- `--languageKey KEY`: Your Azure AI services or Language resource key. Required.
- `--languageEndpoint ENDPOINT`: Your Azure AI services or Language resource endpoint. Required. Example: `https://YourResourceName.cognitiveservices.azure.com`
Input options include:
- `--input URL`: Input audio from URL. You must set either the `--input` or `--jsonInput` option.
- `--jsonInput FILE`: Input an existing batch transcription JSON result from FILE. With this option, you only need a Language resource to process a transcription that you already have; you don't need an audio file or a Speech resource. Overrides `--input`. You must set either the `--input` or `--jsonInput` option.
- `--stereo`: Indicates that the audio via the `--input` URL should be in stereo format. If stereo isn't specified, then mono 16 kHz 16-bit PCM WAV files are assumed. Diarization of mono files is used to separate multiple speakers. Diarization of stereo files isn't supported, since 2-channel stereo files should already have one speaker per channel.
- `--certificate`: The PEM certificate file. Required for C++.
Language options include:
- `--language LANGUAGE`: The language to use for sentiment analysis and conversation analysis. This value should be a two-letter ISO 639-1 code. The default value is `en`.
- `--locale LOCALE`: The locale to use for batch transcription of audio. The default value is `en-US`.
Output options include:
- `--help`: Show the usage help and stop.
- `--output FILE`: Output the transcription, sentiment, conversation PII, and conversation summaries in JSON format to a text file. For more information, see output examples.
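For example, here's a run that combines connection, input, language, and output options; the key, endpoint, and region values are placeholders:
dotnet run --languageKey YourResourceKey --languageEndpoint YourResourceEndpoint --speechKey YourResourceKey --speechRegion eastus --input "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/scenarios/call-center/sampledata/Call1_separated_16k_health_insurance.wav" --stereo --language en --locale en-US --output summary.json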
Clean up resources
You can use the Azure portal or Azure Command Line Interface (CLI) to remove the Azure AI services resource you created.
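For example, the following Azure CLI command deletes the resource; the resource and resource group names are placeholders:
az cognitiveservices account delete --name YourResourceName --resource-group YourResourceGroup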