Use Speech service containers with Kubernetes and Helm
One option to manage your Speech containers on-premises is to use Kubernetes and Helm. Using Kubernetes and Helm to define the speech to text and text to speech container images, we create a Kubernetes package. This package is deployed to a Kubernetes cluster on-premises. Finally, we explore how to test the deployed services and various configuration options. For more information about running Docker containers without Kubernetes orchestration, see install and run Speech service containers.
Prerequisites
The following prerequisites are required before you use Speech containers on-premises:
Required | Purpose |
---|---|
Azure Account | If you don't have an Azure subscription, create a free account before you begin. |
Container Registry access | For Kubernetes to pull the Docker images into the cluster, it needs access to the container registry. |
Kubernetes CLI | The Kubernetes CLI is required for managing the shared credentials from the container registry. It's also required before you install Helm, the Kubernetes package manager. |
Helm CLI | Install the Helm CLI, which is used to install a Helm chart (container package definition). |
Speech resource | To use these containers, you must have a Speech resource in Azure to get the associated billing key and billing endpoint URI. Both values are available on the Azure portal's Speech Overview and Keys pages and are required to start the container. {API_KEY}: the resource key. {ENDPOINT_URI}: the endpoint URI, for example https://eastus.api.cognitive.microsoft.com/sts/v1.0 |
The recommended host computer configuration
Use the Speech service container host computer details as a reference. This Helm chart automatically calculates CPU and memory requirements based on how many decoders (concurrent requests) you specify. It also adjusts them based on whether optimizations for audio or text input are enabled. By default, the Helm chart uses two concurrent requests with optimization disabled.
Service | CPU / Container | Memory / Container |
---|---|---|
speech to text | One decoder requires a minimum of 1,150 millicores. If optimizeForAudioFile is enabled, 1,950 millicores are required. (default: two decoders) | Required: 2 GB, Limit: 4 GB |
text to speech | One concurrent request requires a minimum of 500 millicores. If optimizeForTurboMode is enabled, 1,000 millicores are required. (default: two concurrent requests) | Required: 1 GB, Limit: 2 GB |
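The per-request minimums in the table can be multiplied out to get a rough idea of how much CPU the chart requests. The following Python sketch assumes that the chart scales linearly with numberOfConcurrentRequest. The chart's real templates may round values or add overhead, so treat the result as a planning estimate only. The function and dictionary names are hypothetical.

```python
# Per-request CPU minimums (millicores) taken from the table above.
# Assumption: the chart multiplies these linearly by numberOfConcurrentRequest.
PER_REQUEST_MILLICORES = {
    # service: (default, optimized)
    "speechToText": (1150, 1950),  # optimized = optimizeForAudioFile
    "textToSpeech": (500, 1000),   # optimized = optimizeForTurboMode
}


def estimate_cpu_millicores(service: str, concurrent_requests: int = 2,
                            optimized: bool = False) -> int:
    """Return a rough per-container CPU estimate, in millicores."""
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    default, optimized_value = PER_REQUEST_MILLICORES[service]
    per_request = optimized_value if optimized else default
    return per_request * concurrent_requests


if __name__ == "__main__":
    # Chart defaults: two concurrent requests, optimization disabled.
    print(estimate_cpu_millicores("speechToText"))
    print(estimate_cpu_millicores("textToSpeech", 3, optimized=True))
```

With the chart defaults, this gives 2,300 millicores for speech to text and 1,000 millicores for text to speech. With three optimized text to speech requests, as in the configuration example later in this article, it gives 3,000 millicores.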
Connect to the Kubernetes cluster
The host computer is expected to have an available Kubernetes cluster. See this tutorial on deploying a Kubernetes cluster for a conceptual understanding of how to deploy a Kubernetes cluster to a host computer.
Configure Helm chart values for deployment
Visit the Microsoft Helm Hub for all the publicly available Helm charts offered by Microsoft. There you'll find the Azure AI Speech On-Premises Chart, which is the chart we install. First, though, we must create a config-values.yaml file with explicit configurations. Let's start by adding the Microsoft repository to our Helm instance.
helm repo add microsoft https://microsoft.github.io/charts/repo
Next, we configure our Helm chart values. Copy and paste the following YAML into a file named config-values.yaml. For more information on customizing the Azure AI Speech On-Premises Helm chart, see Customize Helm charts. Replace the # {ENDPOINT_URI} and # {API_KEY} comments with your own values.
# These settings are deployment specific and users can provide customizations
# speech to text configurations
speechToText:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForAudioFile: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/speech-to-text
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}

# text to speech configurations
textToSpeech:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForTurboMode: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/neural-text-to-speech
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}
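Instead of pasting the endpoint and key into the file by hand, you can generate config-values.yaml from environment variables. This Python sketch reproduces the YAML above. The environment variable names SPEECH_ENDPOINT_URI and SPEECH_API_KEY are hypothetical. It quotes the billing and apikey values with json.dumps, because a JSON string literal is also a valid double-quoted YAML scalar.

```python
import json
import os

# Hypothetical environment variable names; adjust to your own conventions.
ENDPOINT_ENV = "SPEECH_ENDPOINT_URI"
KEY_ENV = "SPEECH_API_KEY"


def render_service(name: str, repository: str, optimize_key: str,
                   billing: str, apikey: str, concurrent: int = 3) -> str:
    """Render one service section of config-values.yaml as YAML text."""
    q = json.dumps  # JSON string literal == valid double-quoted YAML scalar
    return (
        f"{name}:\n"
        "  enabled: true\n"
        f"  numberOfConcurrentRequest: {concurrent}\n"
        f"  {optimize_key}: true\n"
        "  image:\n"
        "    registry: mcr.microsoft.com\n"
        f"    repository: {repository}\n"
        "    tag: latest\n"
        "    pullSecrets:\n"
        "      - mcr # Or an existing secret\n"
        "    args:\n"
        "      eula: accept\n"
        f"      billing: {q(billing)}\n"
        f"      apikey: {q(apikey)}\n"
    )


def render_config_values(billing: str, apikey: str) -> str:
    """Return config-values.yaml content for both Speech services."""
    return render_service(
        "speechToText",
        "azure-cognitive-services/speechservices/speech-to-text",
        "optimizeForAudioFile", billing, apikey,
    ) + render_service(
        "textToSpeech",
        "azure-cognitive-services/speechservices/neural-text-to-speech",
        "optimizeForTurboMode", billing, apikey,
    )


if __name__ == "__main__":
    billing = os.environ.get(ENDPOINT_ENV)
    apikey = os.environ.get(KEY_ENV)
    if billing and apikey:
        with open("config-values.yaml", "w", encoding="utf-8") as f:
            f.write(render_config_values(billing, apikey))
        print("Wrote config-values.yaml")
    else:
        print(f"Set {ENDPOINT_ENV} and {KEY_ENV} to generate config-values.yaml.")
```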
Important
If the billing and apikey values aren't provided, the services expire after 15 minutes. Verification also fails, because the services are no longer available.
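A missing billing or apikey value only becomes visible when the services expire, so it can help to check the values before you install. The following sketch assumes you've already parsed config-values.yaml into a Python dict, for example with PyYAML's yaml.safe_load. A key left as billing: # {ENDPOINT_URI} parses as null (None). The function name is hypothetical.

```python
from typing import Any


def missing_billing_args(values: dict[str, Any]) -> list[str]:
    """Return the required args that are empty for each enabled service."""
    missing: list[str] = []
    for service in ("speechToText", "textToSpeech"):
        section = values.get(service) or {}
        # The umbrella chart enables both services by default.
        if not section.get("enabled", True):
            continue
        args = (section.get("image") or {}).get("args") or {}
        for key in ("billing", "apikey"):
            # "billing: # {ENDPOINT_URI}" parses as None, which is falsy.
            if not args.get(key):
                missing.append(f"{service}.image.args.{key}")
    return missing
```

An empty result means both required arguments are set for every enabled service.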
The Kubernetes package (Helm chart)
The Helm chart contains the configuration of which Docker image(s) to pull from the mcr.microsoft.com container registry.
A Helm chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on.
The provided Helm charts pull the Docker images for both Speech services, speech to text and text to speech, from the mcr.microsoft.com container registry.
Install the Helm chart on the Kubernetes cluster
Run the helm install command to install the Helm chart, replacing <config-values.yaml> with the appropriate path and file name. The microsoft/cognitive-services-speech-onpremise Helm chart is available on the Microsoft Helm Hub.
helm install onprem-speech microsoft/cognitive-services-speech-onpremise \
--version 0.1.1 \
--values <config-values.yaml>
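If you script the installation, passing the arguments as a list avoids shell-quoting problems with the values path. This Python sketch wraps the same helm install command. Its defaults mirror the release name, chart, and version used above, and the function names are hypothetical.

```python
import shutil
import subprocess


def helm_install_args(values_path: str, release: str = "onprem-speech",
                      chart: str = "microsoft/cognitive-services-speech-onpremise",
                      version: str = "0.1.1") -> list[str]:
    """Build the argv for the helm install command shown above."""
    return ["helm", "install", release, chart,
            "--version", version, "--values", values_path]


def run_helm_install(values_path: str) -> None:
    """Run helm install; raises CalledProcessError if helm exits non-zero."""
    if shutil.which("helm") is None:
        raise FileNotFoundError("helm CLI not found on PATH")
    subprocess.run(helm_install_args(values_path), check=True)
```

For example, run_helm_install("config-values.yaml") runs the same command as the shell example above.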
Here's an example output you might expect to see from a successful install execution:
NAME: onprem-speech
LAST DEPLOYED: Tue Jul 2 12:51:42 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                                     READY  STATUS             RESTARTS  AGE
speech-to-text-7664f5f465-87w2d          0/1    Pending            0         0s
speech-to-text-7664f5f465-klbr8          0/1    ContainerCreating  0         0s
neural-text-to-speech-56f8fb685b-4jtzh   0/1    ContainerCreating  0         0s
neural-text-to-speech-56f8fb685b-frwxf   0/1    Pending            0         0s

==> v1/Service
NAME                   TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)       AGE
speech-to-text         LoadBalancer  10.0.252.106  <pending>    80:31811/TCP  1s
neural-text-to-speech  LoadBalancer  10.0.125.187  <pending>    80:31247/TCP  0s

==> v1beta1/PodDisruptionBudget
NAME                                        MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
speech-to-text-poddisruptionbudget          N/A            20%              0                    1s
neural-text-to-speech-poddisruptionbudget   N/A            20%              0                    1s

==> v1beta2/Deployment
NAME                   READY  UP-TO-DATE  AVAILABLE  AGE
speech-to-text         0/2    2           0          0s
neural-text-to-speech  0/2    2           0          0s

==> v2beta2/HorizontalPodAutoscaler
NAME                               REFERENCE                         TARGETS        MINPODS  MAXPODS  REPLICAS  AGE
speech-to-text-autoscaler          Deployment/speech-to-text         <unknown>/50%  2        10       0         0s
neural-text-to-speech-autoscaler   Deployment/neural-text-to-speech  <unknown>/50%  2        10       0         0s

NOTES:
cognitive-services-speech-onpremise has been installed!
Release is named onprem-speech
The Kubernetes deployment can take several minutes to complete. To confirm that both pods and services are properly deployed and available, run the following command:
kubectl get all
You should expect to see something similar to the following output:
NAME                                         READY  STATUS   RESTARTS  AGE
pod/speech-to-text-7664f5f465-87w2d          1/1    Running  0         34m
pod/speech-to-text-7664f5f465-klbr8          1/1    Running  0         34m
pod/neural-text-to-speech-56f8fb685b-4jtzh   1/1    Running  0         34m
pod/neural-text-to-speech-56f8fb685b-frwxf   1/1    Running  0         34m

NAME                            TYPE          CLUSTER-IP    EXTERNAL-IP     PORT(S)       AGE
service/kubernetes              ClusterIP     10.0.0.1      <none>          443/TCP       3h
service/speech-to-text          LoadBalancer  10.0.252.106  52.162.123.151  80:31811/TCP  34m
service/neural-text-to-speech   LoadBalancer  10.0.125.187  65.52.233.162   80:31247/TCP  34m

NAME                                    DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
deployment.apps/speech-to-text          2        2        2           2          34m
deployment.apps/neural-text-to-speech   2        2        2           2          34m

NAME                                               DESIRED  CURRENT  READY  AGE
replicaset.apps/speech-to-text-7664f5f465          2        2        2      34m
replicaset.apps/neural-text-to-speech-56f8fb685b   2        2        2      34m

NAME                                                                   REFERENCE                         TARGETS  MINPODS  MAXPODS  REPLICAS  AGE
horizontalpodautoscaler.autoscaling/speech-to-text-autoscaler          Deployment/speech-to-text         1%/50%   2        10       2         34m
horizontalpodautoscaler.autoscaling/neural-text-to-speech-autoscaler   Deployment/neural-text-to-speech  0%/50%   2        10       2         34m
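Instead of rerunning kubectl get all by hand, you can poll pod readiness from a script. This Python sketch reads the output of kubectl get pods -o json and reports whether every pod in the namespace is Running with all containers ready. It checks every pod in the namespace, not only the Speech pods, and skips pods that have completed (such as finished helm test pods). The function names are hypothetical.

```python
import json
import subprocess
from typing import Any


def pods_ready(pod_list: dict[str, Any]) -> bool:
    """Return True if every non-completed pod is Running with all containers ready."""
    running = 0
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # Completed pods, such as finished helm test pods.
        if phase != "Running":
            return False
        containers = status.get("containerStatuses", [])
        if not containers or not all(c.get("ready") for c in containers):
            return False
        running += 1
    return running > 0


def get_pods(namespace: str = "default") -> dict[str, Any]:
    """Return the output of `kubectl get pods -o json` as a dict."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--namespace", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

For example, call pods_ready(get_pods()) in a loop, with time.sleep between attempts, until it returns True.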
Verify Helm deployment with Helm tests
The installed Helm charts define Helm tests, which serve as a convenience for verification. These tests validate service readiness. To verify both speech to text and text to speech features, we execute the Helm test command.
helm test onprem-speech
Important
These tests fail if the pod status isn't Running or if the deployment isn't listed under the AVAILABLE column. Be patient, as this can take more than ten minutes.
These tests output various status results:
RUNNING: speech-to-text-readiness-test
PASSED: speech-to-text-readiness-test
RUNNING: text-to-speech-readiness-test
PASSED: text-to-speech-readiness-test
As an alternative to running the Helm tests, you can collect the external IP addresses and corresponding ports from the kubectl get all command. Using the IP and port, open a web browser and navigate to http://<external-ip>:<port>/swagger/index.html to view the API Swagger page(s).
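You can also read the external IP and port from kubectl get services -o json instead of the table output. This Python sketch builds the Swagger URL for each LoadBalancer service that has an external address, and skips services whose external IP is still <pending>. It uses the service port (80 in the example output), not the node port. The function name is hypothetical.

```python
from typing import Any


def swagger_urls(service_list: dict[str, Any]) -> dict[str, str]:
    """Map each LoadBalancer service with an external address to its Swagger URL."""
    urls: dict[str, str] = {}
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        if spec.get("type") != "LoadBalancer":
            continue
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        ports = spec.get("ports") or []
        if not ingress or not ports:
            continue  # External IP is still <pending>, or no port is exposed.
        # Some providers publish a hostname instead of an IP address.
        host = ingress[0].get("ip") or ingress[0].get("hostname")
        port = ports[0]["port"]  # The service port (e.g. 80), not the node port.
        urls[svc["metadata"]["name"]] = f"http://{host}:{port}/swagger/index.html"
    return urls
```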
Customize Helm charts
Helm charts are hierarchical. This hierarchy allows for chart inheritance and supports the concept of specificity, where more specific settings override inherited ones.
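The following Python sketch illustrates that precedence with a simplified recursive merge. Values supplied under a sub-chart's key in the umbrella chart (or in config-values.yaml) win over the sub-chart's defaults, key by key. It approximates Helm's value coalescing for illustration only, and ignores details such as global values and deleting keys by setting them to null. The port 8080 override is a hypothetical value.

```python
from typing import Any


def merge_values(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return defaults with overrides applied recursively; overrides win."""
    merged = dict(defaults)  # Shallow copy; nested dicts are replaced, not mutated.
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = value
    return merged


# Sub-chart defaults (charts/speechToText) ...
sub_chart_defaults = {"enabled": False, "numberOfConcurrentRequest": 2,
                      "optimizeForAudioFile": False,
                      "service": {"type": "LoadBalancer", "port": 80}}
# ... overridden by the speechToText section of the umbrella values.
umbrella_section = {"enabled": True, "numberOfConcurrentRequest": 3,
                    "service": {"port": 8080}}

effective = merge_values(sub_chart_defaults, umbrella_section)
```

In this sketch, effective keeps optimizeForAudioFile: false and service.type: LoadBalancer from the sub-chart defaults. It takes enabled, numberOfConcurrentRequest, and service.port from the more specific umbrella section.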
Speech (umbrella chart)
Values in the top-level "umbrella" chart override the corresponding sub-chart values. Therefore, all on-premises customized values should be added here.
Parameter | Description | Default |
---|---|---|
speechToText.enabled | Whether the speech to text service is enabled. | true |
speechToText.verification.enabled | Whether the helm test capability for the speech to text service is enabled. | true |
speechToText.verification.image.registry | The docker image registry that helm test uses to test the speech to text service. Helm creates a separate pod inside the cluster for testing and pulls the test image from this registry. | docker.io |
speechToText.verification.image.repository | The docker image repository that helm test uses to test the speech to text service. The helm test pod uses this repository to pull the test image. | antsu/on-prem-client |
speechToText.verification.image.tag | The docker image tag used with helm test for the speech to text service. The helm test pod uses this tag to pull the test image. | latest |
speechToText.verification.image.pullByHash | Whether the test docker image is pulled by hash. If true, speechToText.verification.image.hash must be added with a valid image hash value. | false |
speechToText.verification.image.arguments | The arguments used to execute the test docker image. The helm test pod passes these arguments to the container when running helm test. | "./speech-to-text-client" "./audio/whatstheweatherlike.wav" "--expect=What's the weather like" "--host=$(SPEECH_TO_TEXT_HOST)" "--port=$(SPEECH_TO_TEXT_PORT)" |
textToSpeech.enabled | Whether the text to speech service is enabled. | true |
textToSpeech.verification.enabled | Whether the helm test capability for the text to speech service is enabled. | true |
textToSpeech.verification.image.registry | The docker image registry that helm test uses to test the text to speech service. Helm creates a separate pod inside the cluster for testing and pulls the test image from this registry. | docker.io |
textToSpeech.verification.image.repository | The docker image repository that helm test uses to test the text to speech service. The helm test pod uses this repository to pull the test image. | antsu/on-prem-client |
textToSpeech.verification.image.tag | The docker image tag used with helm test for the text to speech service. The helm test pod uses this tag to pull the test image. | latest |
textToSpeech.verification.image.pullByHash | Whether the test docker image is pulled by hash. If true, textToSpeech.verification.image.hash must be added with a valid image hash value. | false |
textToSpeech.verification.image.arguments | The arguments used to execute the test docker image. The helm test pod passes these arguments to the container when running helm test. | "./text-to-speech-client" "--input='What's the weather like'" "--host=$(TEXT_TO_SPEECH_HOST)" "--port=$(TEXT_TO_SPEECH_PORT)" |
Speech to text (sub-chart: charts/speechToText)
To override the "umbrella" chart, add the prefix speechToText. to any parameter to make it more specific. For example, speechToText.numberOfConcurrentRequest overrides the corresponding numberOfConcurrentRequest parameter.
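helm install also accepts individual overrides through --set key=value, where the key is the dotted parameter path, including the sub-chart prefix. This Python sketch flattens a nested dict into such flags. The function name is hypothetical. It only handles scalar values; lists, and values containing commas, need Helm's own --set escaping rules.

```python
from typing import Any


def to_set_flags(values: dict[str, Any], prefix: str = "") -> list[str]:
    """Flatten nested values into Helm --set flags with dotted, prefixed keys."""
    flags: list[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_set_flags(value, path))
        else:
            # Render booleans the way YAML spells them.
            text = ("true" if value else "false") if isinstance(value, bool) else str(value)
            flags.extend(["--set", f"{path}={text}"])
    return flags
```

For example, to_set_flags({"numberOfConcurrentRequest": 4}, "speechToText") returns ["--set", "speechToText.numberOfConcurrentRequest=4"].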
Parameter | Description | Default |
---|---|---|
enabled | Whether the speech to text service is enabled. | false |
numberOfConcurrentRequest | The number of concurrent requests for the speech to text service. This chart automatically calculates CPU and memory resources based on this value. | 2 |
optimizeForAudioFile | Whether the service needs to optimize for audio input via audio files. If true, this chart allocates more CPU resources to the service. | false |
image.registry | The speech to text docker image registry. | containerpreview.azurecr.io |
image.repository | The speech to text docker image repository. | microsoft/cognitive-services-speech-to-text |
image.tag | The speech to text docker image tag. | latest |
image.pullSecrets | The image secrets for pulling the speech to text docker image. | |
image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is required. | false |
image.hash | The speech to text docker image hash. Only used when image.pullByHash: true. | |
image.args.eula (required) | Indicates you've accepted the license. The only valid value is accept. | |
image.args.billing (required) | The billing endpoint URI. The value is available on the Azure portal's Speech Overview page. | |
image.args.apikey (required) | Used to track billing information. | |
service.type | The Kubernetes service type of the speech to text service. See the Kubernetes service types instructions for more details and verify cloud provider support. | LoadBalancer |
service.port | The port of the speech to text service. | 80 |
service.annotations | The speech to text annotations for the service metadata. Annotations are key-value pairs, for example some/annotation1: value1 and some/annotation2: value2. | |
service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If true, the speech-to-text-autoscaler is deployed in the Kubernetes cluster. | true |
service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If true, the speech-to-text-poddisruptionbudget is deployed in the Kubernetes cluster. | true |
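The image.* parameters combine into a single container image reference. This Python sketch shows the usual Docker reference forms: registry/repository:tag, or registry/repository@digest when pulling by hash. It assumes that image.hash holds a full digest such as sha256:<hex>. It's an illustration of the reference format, not the chart's actual template logic, and the function name is hypothetical.

```python
def image_reference(registry: str, repository: str, tag: str = "latest",
                    pull_by_hash: bool = False, image_hash: str = "") -> str:
    """Compose a container image reference from image.* style values."""
    base = f"{registry}/{repository}"
    if pull_by_hash:
        if not image_hash:
            raise ValueError("image.hash is required when image.pullByHash is true")
        # Assumption: image.hash is a full digest, e.g. "sha256:<hex>".
        return f"{base}@{image_hash}"
    return f"{base}:{tag}"
```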
Sentiment analysis (sub-chart: charts/speechToText)
Starting with v2.2.0 of the speech to text container and v0.2.0 of the Helm chart, the following parameters are used for sentiment analysis using the Language service API.
Parameter | Description | Values | Default |
---|---|---|---|
textanalytics.enabled | Whether the text-analytics service is enabled. | true/false | false |
textanalytics.image.registry | The text-analytics docker image registry. | valid docker image registry | |
textanalytics.image.repository | The text-analytics docker image repository. | valid docker image repository | |
textanalytics.image.tag | The text-analytics docker image tag. | valid docker image tag | |
textanalytics.image.pullSecrets | The image secrets for pulling the text-analytics docker image. | valid secrets name | |
textanalytics.image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is also required. | true/false | false |
textanalytics.image.hash | The text-analytics docker image hash. Only used with image.pullByHash: true. | valid docker image hash | |
textanalytics.image.args.eula | One of the required arguments of the text-analytics container. Indicates you've accepted the license; the value must be accept. | accept, if you want to use the container | |
textanalytics.image.args.billing | One of the required arguments of the text-analytics container. Specifies the billing endpoint URI, which is available on the Azure portal's Speech Overview page. | valid billing endpoint URI | |
textanalytics.image.args.apikey | One of the required arguments of the text-analytics container. Used to track billing information. | valid apikey | |
textanalytics.cpuRequest | The requested CPU for the text-analytics container. | int | 3000m |
textanalytics.cpuLimit | The CPU limit for the text-analytics container. | 8000m | |
textanalytics.memoryRequest | The requested memory for the text-analytics container. | 3Gi | |
textanalytics.memoryLimit | The memory limit for the text-analytics container. | 8Gi | |
textanalytics.service.sentimentURISuffix | The sentiment analysis URI suffix. The whole URI has the format "http://<service>:<port>/<sentimentURISuffix>". | text/analytics/v3.0-preview/sentiment | |
textanalytics.service.type | The type of the text-analytics service in Kubernetes. See Kubernetes service types. | valid Kubernetes service type | LoadBalancer |
textanalytics.service.port | The port of the text-analytics service. | int | 50085 |
textanalytics.service.annotations | Annotations users can add to the text-analytics service metadata, for example some/annotation1: value1 and some/annotation2: value2. | annotations, one per line | |
textanalytics.service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If enabled, text-analytics-autoscaler is deployed in the Kubernetes cluster. | true/false | true |
textanalytics.service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If enabled, text-analytics-poddisruptionbudget is deployed in the Kubernetes cluster. | true/false | true |
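The sentiment endpoint is composed from the service address, textanalytics.service.port, and textanalytics.service.sentimentURISuffix. This Python sketch builds that URI using the port and suffix listed in the table. The function name is hypothetical, and so is the host name text-analytics used in the test. The actual host depends on how the service is exposed in your cluster.

```python
# Port and suffix as listed in the sentiment analysis table above.
DEFAULT_SENTIMENT_PORT = 50085
DEFAULT_SENTIMENT_SUFFIX = "text/analytics/v3.0-preview/sentiment"


def sentiment_uri(service_host: str, port: int = DEFAULT_SENTIMENT_PORT,
                  suffix: str = DEFAULT_SENTIMENT_SUFFIX) -> str:
    """Build http://<service>:<port>/<sentimentURISuffix>."""
    return f"http://{service_host}:{port}/{suffix.lstrip('/')}"
```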
Text to speech (sub-chart: charts/textToSpeech)
To override the "umbrella" chart, add the prefix textToSpeech. to any parameter to make it more specific. For example, textToSpeech.numberOfConcurrentRequest overrides the corresponding numberOfConcurrentRequest parameter.
Parameter | Description | Default |
---|---|---|
enabled | Whether the text to speech service is enabled. | false |
numberOfConcurrentRequest | The number of concurrent requests for the text to speech service. This chart automatically calculates CPU and memory resources based on this value. | 2 |
optimizeForTurboMode | Whether the service needs to optimize for text input via text files. If true, this chart allocates more CPU resources to the service. | false |
image.registry | The text to speech docker image registry. | containerpreview.azurecr.io |
image.repository | The text to speech docker image repository. | microsoft/cognitive-services-text-to-speech |
image.tag | The text to speech docker image tag. | latest |
image.pullSecrets | The image secrets for pulling the text to speech docker image. | |
image.pullByHash | Whether the docker image is pulled by hash. If true, image.hash is required. | false |
image.hash | The text to speech docker image hash. Only used when image.pullByHash: true. | |
image.args.eula (required) | Indicates you've accepted the license. The only valid value is accept. | |
image.args.billing (required) | The billing endpoint URI. The value is available on the Azure portal's Speech Overview page. | |
image.args.apikey (required) | Used to track billing information. | |
service.type | The Kubernetes service type of the text to speech service. See the Kubernetes service types instructions for more details and verify cloud provider support. | LoadBalancer |
service.port | The port of the text to speech service. | 80 |
service.annotations | The text to speech annotations for the service metadata. Annotations are key-value pairs, for example some/annotation1: value1 and some/annotation2: value2. | |
service.autoScaler.enabled | Whether the Horizontal Pod Autoscaler is enabled. If true, the text-to-speech-autoscaler is deployed in the Kubernetes cluster. | true |
service.podDisruption.enabled | Whether the Pod Disruption Budget is enabled. If true, the text-to-speech-poddisruptionbudget is deployed in the Kubernetes cluster. | true |
Next steps
For more details on installing applications with Helm in Azure Kubernetes Service (AKS), see the AKS documentation on installing existing applications with Helm.