Ingestion Client with Azure Cognitive Services
The Ingestion Client is a tool released by Microsoft on GitHub that helps you quickly deploy a call center transcription solution to Azure with a no-code approach.
You can use the tool and resulting solution in production to process a high volume of audio.
Ingestion Client uses the Azure Cognitive Service for Language, Azure Cognitive Service for Speech, Azure storage, and Azure Functions.
Get started with the Ingestion Client
You need an Azure account and an Azure Cognitive Services resource to run the Ingestion Client.
- Azure subscription - Create one for free
- Create a Cognitive Services resource in the Azure portal.
- Get the resource key and region. After your Cognitive Services resource is deployed, select Go to resource to view and manage keys. For more information about Cognitive Services resources, see Get the keys for your resource.
See the Getting Started Guide for the Ingestion Client on GitHub to learn how to set up and use the tool.
Ingestion Client Features
The Ingestion Client works by connecting a dedicated Azure storage account to custom Azure Functions in a serverless fashion to pass transcription requests to the service. The transcribed audio files land in the dedicated Azure Storage container.
Pricing varies depending on the mode of operation (batch versus real-time) as well as the Azure Functions SKU selected. By default, the tool creates a Premium Azure Functions SKU to handle large volumes. Visit the Pricing page for more information.
Internally, the tool uses Speech and Language services, and follows best practices to handle scale-up, retries and failover. The following schematic describes the resources and connections.
The following Speech service feature is used by the Ingestion Client:
- Batch speech-to-text: Transcribe large amounts of audio files asynchronously, including speaker diarization. This feature is typically used in post-call analytics scenarios. Diarization is the process of recognizing and separating speakers in mono channel audio data.
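As a rough illustration of a batch transcription request with diarization turned on, the following Python sketch builds (but does not send) an HTTP POST for the Speech to text REST API. The `v3.1` API version, the `diarizationEnabled` property, the display name, and the helper `build_transcription_request` are assumptions made for this example. The Ingestion Client's own code may target a different version or payload.

```python
import json
import urllib.request


def build_transcription_request(region, key, content_urls, locale="en-US"):
    """Build, without sending, a POST that would create a batch transcription.

    Hypothetical helper: the endpoint version (v3.1) and the property names
    are assumptions for illustration, not taken from the Ingestion Client.
    """
    url = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.1/transcriptions"
    )
    body = {
        "displayName": "call-center-batch",  # assumed, any label works
        "locale": locale,
        "contentUrls": list(content_urls),   # one SAS URI per recording
        "properties": {"diarizationEnabled": True},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the job. The sketch stops short of that so it can be inspected without a real key or region.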
Here are some Language service features that are used by the Ingestion Client:
- Personally Identifiable Information (PII) extraction and redaction: Identify, categorize, and redact sensitive information in conversation transcription.
- Sentiment analysis and opinion mining: Analyze transcriptions and associate positive, neutral, or negative sentiment at the utterance and conversation level.
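To make the two Language features above more concrete, the following Python sketch builds request bodies in the shape the Language service's text analysis REST API accepts. The task kinds `PiiEntityRecognition` and `SentimentAnalysis`, the `opinionMining` parameter, and the helper `build_language_task` are assumptions for illustration. The Ingestion Client may call the service differently, for example through an SDK.

```python
def build_language_task(kind, utterances, language="en", **parameters):
    """Build one text analysis task body (hypothetical helper).

    Each utterance becomes its own document so that results can be
    reported per utterance.
    """
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(utterances, start=1)
    ]
    return {
        "kind": kind,
        "parameters": parameters,
        "analysisInput": {"documents": documents},
    }


# Made-up sample utterances, not real call data.
utterances = [
    "Hi, my name is Jane and my phone number is 555-0100.",
    "The agent was helpful, but the wait was far too long.",
]

pii_task = build_language_task("PiiEntityRecognition", utterances)
sentiment_task = build_language_task(
    "SentimentAnalysis", utterances, opinionMining=True
)
```

Each dictionary would be serialized to JSON and posted to the Language resource's analyze-text endpoint along with the resource key.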
Besides Cognitive Services, these Azure products are used to complete the solution:
- Azure storage: For storing telephony data and the transcripts that are returned by the Batch Transcription API. This storage account should use notifications, specifically for when new files are added. These notifications are used to trigger the transcription process.
- Azure Functions: For creating the shared access signature (SAS) URI for each recording, and triggering the HTTP POST request to start a transcription. Additionally, you use Azure Functions to create requests to retrieve and delete transcriptions by using the Batch Transcription API.
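The notification-driven trigger described above can be sketched in Python as a small filter over storage events. This example assumes the notifications arrive in the Azure Event Grid event shape (an `eventType` of `Microsoft.Storage.BlobCreated` and a `data.url` field). The extension list and the helper `new_audio_urls` are hypothetical, and the tool's real functions may differ.

```python
# Assumed set of audio extensions, adjust to the formats you upload.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".ogg")


def new_audio_urls(events):
    """Return blob URLs of newly created audio files (hypothetical helper).

    Events that are not blob-created notifications, or that point at
    non-audio blobs, are skipped.
    """
    urls = []
    for event in events:
        if event.get("eventType") != "Microsoft.Storage.BlobCreated":
            continue
        url = event.get("data", {}).get("url", "")
        if url.lower().endswith(AUDIO_EXTENSIONS):
            urls.append(url)
    return urls
```

Each returned URL would then be turned into a SAS URI and included in the transcription request.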
The tool is built to show customers results quickly. You can customize the tool to your preferred SKUs and setup. The SKUs can be edited from the Azure portal and the code itself is available on GitHub.
We suggest creating the resources in the same dedicated resource group to understand and track costs more easily.