Use Speech service containers with Kubernetes and Helm

One option to manage your Speech containers on-premises is to use Kubernetes and Helm. Using Kubernetes and Helm to define the speech to text and text to speech container images, we create a Kubernetes package. This package is deployed to a Kubernetes cluster on-premises. Finally, we explore how to test the deployed services and various configuration options. For more information about running Docker containers without Kubernetes orchestration, see install and run Speech service containers.

Prerequisites

The following prerequisites are required before using Speech containers on-premises:

Required Purpose
Azure Account If you don't have an Azure subscription, create a free account before you begin.
Container Registry access In order for Kubernetes to pull the docker images into the cluster, it needs access to the container registry.
Kubernetes CLI The Kubernetes CLI is required for managing the shared credentials from the container registry. Kubernetes must also be installed before Helm, the Kubernetes package manager.
Helm CLI Install the Helm CLI, which is used to install a Helm chart (container package definition).
Speech resource In order to use these containers, you must have:

A Speech Azure resource to get the associated billing key and billing endpoint URI. Both values are available on the Azure portal's Speech Overview and Keys pages and are required to start the container.

{API_KEY}: resource key

{ENDPOINT_URI}: endpoint URI. For example: https://eastus.api.cognitive.microsoft.com/sts/v1.0

Use the Speech service container host computer details as a reference. This Helm chart automatically calculates CPU and memory requirements based on how many decoders (concurrent requests) the user specifies. It also adjusts these requirements based on whether the audio/text input optimizations are enabled. By default, the Helm chart uses two concurrent requests with optimization disabled.

Service CPU / Container Memory / Container
speech to text One decoder requires a minimum of 1,150 millicores. If optimizeForAudioFile is enabled, 1,950 millicores are required. (Default: two decoders) Required: 2 GB; Limited: 4 GB
text to speech One concurrent request requires a minimum of 500 millicores. If optimizeForTurboMode is enabled, 1,000 millicores are required. (Default: two concurrent requests) Required: 1 GB; Limited: 2 GB
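To make the sizing rules concrete, the following Python sketch estimates the CPU request for a given number of concurrent requests. It assumes that the per-unit minimums in the table scale linearly with the number of decoders or concurrent requests. The chart's own calculation may differ, so treat the result as a rough planning figure. The dictionary keys and the function name are illustrative.

```python
# Minimum CPU per decoder / concurrent request, in millicores, from the table
# above. The inner key is whether the optimization flag is enabled
# (optimizeForAudioFile for speech to text, optimizeForTurboMode for text to speech).
CPU_MILLICORES_PER_UNIT = {
    "speechToText": {False: 1150, True: 1950},
    "textToSpeech": {False: 500, True: 1000},
}


def estimate_cpu_millicores(service: str, concurrent_requests: int = 2,
                            optimized: bool = False) -> int:
    """Estimate the minimum CPU, ASSUMING it scales linearly with the request count."""
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    return CPU_MILLICORES_PER_UNIT[service][optimized] * concurrent_requests


if __name__ == "__main__":
    for service in ("speechToText", "textToSpeech"):
        default = estimate_cpu_millicores(service)
        tuned = estimate_cpu_millicores(service, concurrent_requests=3, optimized=True)
        print(f"{service}: default {default}m, 3 optimized requests {tuned}m")
```

With the chart defaults (two requests, no optimization), the estimate is 2,300 millicores for speech to text and 1,000 millicores for text to speech. With three optimized requests, as in the config-values.yaml example, it is 5,850 and 3,000 millicores.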

Connect to the Kubernetes cluster

The host computer is expected to have an available Kubernetes cluster. For a conceptual understanding of how to deploy a Kubernetes cluster to a host computer, see this tutorial on deploying a Kubernetes cluster.
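Before installing anything, it can help to confirm that kubectl can reach the cluster. The sketch below wraps `kubectl cluster-info`, which prints the control plane address for the current context and exits with a non-zero status when the cluster can't be reached. The function name and the 15-second default timeout are assumptions for illustration.

```python
import shutil
import subprocess


def cluster_reachable(timeout_seconds: float = 15.0) -> bool:
    """Return True if `kubectl cluster-info` succeeds for the current context."""
    if shutil.which("kubectl") is None:
        return False  # kubectl isn't installed or isn't on PATH
    try:
        result = subprocess.run(
            ["kubectl", "cluster-info"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return False  # the API server didn't answer in time
    return result.returncode == 0


if __name__ == "__main__":
    print("Cluster reachable:", cluster_reachable())
```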

Configure Helm chart values for deployment

Visit the Microsoft Helm Hub for all the publicly available Helm charts offered by Microsoft. On the Microsoft Helm Hub, you can find the Azure AI Speech On-Premises Chart. This is the chart we install, but we must first create a config-values.yaml file with explicit configurations. Let's start by adding the Microsoft repository to our Helm instance.

helm repo add microsoft https://microsoft.github.io/charts/repo

Next, we configure our Helm chart values. Copy and paste the following YAML into a file named config-values.yaml. For more information on customizing the Azure AI Speech On-Premises Helm Chart, see customize helm charts. Replace the # {ENDPOINT_URI} and # {API_KEY} comments with your own values.

# These settings are deployment specific and users can provide customizations
# speech to text configurations
speechToText:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForAudioFile: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/speech-to-text
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}

# text to speech configurations
textToSpeech:
  enabled: true
  numberOfConcurrentRequest: 3
  optimizeForTurboMode: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/neural-text-to-speech
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: # {ENDPOINT_URI}
      apikey: # {API_KEY}

Important

If the billing and apikey values aren't provided, the services expire after 15 minutes. Verification also fails, because the services are no longer available.
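Because missing billing values cause the services to expire, you might prefer to generate config-values.yaml instead of editing the placeholders by hand. The following Python sketch produces the same YAML as the example above from two environment variables, and it refuses to render the file when either value is empty. The variable names SPEECH_ENDPOINT_URI and SPEECH_API_KEY and the render_config helper are hypothetical.

```python
import json
import os
from pathlib import Path

# Mirrors config-values.yaml above. {requests}, {endpoint}, and {api_key} are
# str.format placeholders; the template contains no other braces.
TEMPLATE = """\
# speech to text configurations
speechToText:
  enabled: true
  numberOfConcurrentRequest: {requests}
  optimizeForAudioFile: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/speech-to-text
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: {endpoint}
      apikey: {api_key}

# text to speech configurations
textToSpeech:
  enabled: true
  numberOfConcurrentRequest: {requests}
  optimizeForTurboMode: true
  image:
    registry: mcr.microsoft.com
    repository: azure-cognitive-services/speechservices/neural-text-to-speech
    tag: latest
    pullSecrets:
      - mcr # Or an existing secret
    args:
      eula: accept
      billing: {endpoint}
      apikey: {api_key}
"""


def render_config(endpoint: str, api_key: str, requests: int = 3) -> str:
    """Fill in the template, refusing to produce a file without billing values."""
    if not endpoint or not api_key:
        # Without billing and apikey, the services expire after 15 minutes.
        raise ValueError("both the billing endpoint URI and the API key are required")
    # json.dumps yields a double-quoted string, which is also a valid YAML
    # scalar; quoting keeps, for example, an all-digit key from being parsed as a number.
    return TEMPLATE.format(
        requests=requests,
        endpoint=json.dumps(endpoint),
        api_key=json.dumps(api_key),
    )


if __name__ == "__main__":
    endpoint = os.environ.get("SPEECH_ENDPOINT_URI", "")  # hypothetical variable name
    api_key = os.environ.get("SPEECH_API_KEY", "")  # hypothetical variable name
    if endpoint and api_key:
        Path("config-values.yaml").write_text(render_config(endpoint, api_key))
        print("Wrote config-values.yaml")
    else:
        print("Set SPEECH_ENDPOINT_URI and SPEECH_API_KEY first.")
```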

The Kubernetes package (Helm chart)

The Helm chart contains the configuration of which docker image(s) to pull from the mcr.microsoft.com container registry.

A Helm chart is a collection of files that describe a related set of Kubernetes resources. A single chart might be used to deploy something simple, like a memcached pod, or something complex, like a full web app stack with HTTP servers, databases, caches, and so on.

The provided Helm charts pull the docker images of the Speech service, both text to speech and the speech to text services from the mcr.microsoft.com container registry.

Install the Helm chart on the Kubernetes cluster

Run the helm install command to install the Helm chart, replacing <config-values.yaml> with the path and file name of your values file. The microsoft/cognitive-services-speech-onpremise Helm chart is available on the Microsoft Helm Hub.

helm install onprem-speech microsoft/cognitive-services-speech-onpremise \
    --version 0.1.1 \
    --values <config-values.yaml> 
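If you script the installation, passing the arguments as a list avoids shell quoting problems with paths that contain spaces. This sketch only assembles the same helm install command shown above. The helper names are hypothetical, and the release name, chart, and version default to the values used in this article.

```python
import shutil
import subprocess
from typing import List


def helm_install_command(values_file: str,
                         release: str = "onprem-speech",
                         chart: str = "microsoft/cognitive-services-speech-onpremise",
                         version: str = "0.1.1") -> List[str]:
    """Return the argv for the helm install command shown above."""
    return ["helm", "install", release, chart,
            "--version", version, "--values", values_file]


def run_helm_install(values_file: str) -> int:
    """Run helm install and return its exit code (requires the helm CLI)."""
    if shutil.which("helm") is None:
        raise RuntimeError("the helm CLI isn't on PATH")
    return subprocess.run(helm_install_command(values_file)).returncode


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(helm_install_command("config-values.yaml")))
```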

Here's an example output you might expect to see from a successful install execution:

NAME:   onprem-speech
LAST DEPLOYED: Tue Jul  2 12:51:42 2019
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                                    READY  STATUS             RESTARTS  AGE
speech-to-text-7664f5f465-87w2d         0/1    Pending            0         0s
speech-to-text-7664f5f465-klbr8         0/1    ContainerCreating  0         0s
neural-text-to-speech-56f8fb685b-4jtzh  0/1    ContainerCreating  0         0s
neural-text-to-speech-56f8fb685b-frwxf  0/1    Pending            0         0s

==> v1/Service
NAME                   TYPE          CLUSTER-IP    EXTERNAL-IP  PORT(S)       AGE
speech-to-text         LoadBalancer  10.0.252.106  <pending>    80:31811/TCP  1s
neural-text-to-speech  LoadBalancer  10.0.125.187  <pending>    80:31247/TCP  0s

==> v1beta1/PodDisruptionBudget
NAME                                       MIN AVAILABLE  MAX UNAVAILABLE  ALLOWED DISRUPTIONS  AGE
speech-to-text-poddisruptionbudget         N/A            20%              0                    1s
neural-text-to-speech-poddisruptionbudget  N/A            20%              0                    1s

==> v1beta2/Deployment
NAME                   READY  UP-TO-DATE  AVAILABLE  AGE
speech-to-text         0/2    2           0          0s
neural-text-to-speech  0/2    2           0          0s

==> v2beta2/HorizontalPodAutoscaler
NAME                              REFERENCE                         TARGETS        MINPODS  MAXPODS  REPLICAS  AGE
speech-to-text-autoscaler         Deployment/speech-to-text         <unknown>/50%  2        10       0         0s
neural-text-to-speech-autoscaler  Deployment/neural-text-to-speech  <unknown>/50%  2        10       0         0s


NOTES:
cognitive-services-speech-onpremise has been installed!
Release is named onprem-speech

The Kubernetes deployment can take several minutes to complete. To confirm that both pods and services are properly deployed and available, run the following command:

kubectl get all

You should expect to see something similar to the following output:

NAME                                         READY     STATUS    RESTARTS   AGE
pod/speech-to-text-7664f5f465-87w2d          1/1       Running   0          34m
pod/speech-to-text-7664f5f465-klbr8          1/1       Running   0          34m
pod/neural-text-to-speech-56f8fb685b-4jtzh   1/1       Running   0          34m
pod/neural-text-to-speech-56f8fb685b-frwxf   1/1       Running   0          34m

NAME                            TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
service/kubernetes              ClusterIP      10.0.0.1       <none>           443/TCP        3h
service/speech-to-text          LoadBalancer   10.0.252.106   52.162.123.151   80:31811/TCP   34m
service/neural-text-to-speech   LoadBalancer   10.0.125.187   65.52.233.162    80:31247/TCP   34m

NAME                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/speech-to-text          2         2         2            2           34m
deployment.apps/neural-text-to-speech   2         2         2            2           34m

NAME                                               DESIRED   CURRENT   READY     AGE
replicaset.apps/speech-to-text-7664f5f465          2         2         2         34m
replicaset.apps/neural-text-to-speech-56f8fb685b   2         2         2         34m

NAME                                                                   REFERENCE                          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/speech-to-text-autoscaler          Deployment/speech-to-text          1%/50%    2         10        2          34m
horizontalpodautoscaler.autoscaling/neural-text-to-speech-autoscaler   Deployment/neural-text-to-speech   0%/50%    2         10        2          34m
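Instead of rerunning kubectl get all by hand, you can poll the pod status until everything is ready. The sketch below reads `kubectl get pods --output json` and treats a pod as ready when its phase is Running and every container reports ready. It checks every pod in the namespace, not only the Speech pods. The 15-minute default timeout, the polling interval, and the function names are arbitrary choices for illustration.

```python
import json
import shutil
import subprocess
import time
from typing import Any, Dict, List


def pods_not_ready(pod_list: Dict[str, Any]) -> List[str]:
    """Return the names of pods that aren't Running with every container ready.

    `pod_list` is the parsed JSON from `kubectl get pods --output json`.
    """
    not_ready = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = (
            status.get("phase") == "Running"
            and len(containers) > 0
            and all(c.get("ready", False) for c in containers)
        )
        if not ready:
            not_ready.append(pod["metadata"]["name"])
    return not_ready


def wait_for_pods(namespace: str = "default", timeout_s: float = 900,
                  poll_s: float = 15) -> bool:
    """Poll kubectl until all pods in the namespace are ready or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["kubectl", "get", "pods", "--namespace", namespace, "--output", "json"],
            capture_output=True, text=True, check=True,
        )
        pending = pods_not_ready(json.loads(result.stdout))
        if not pending:
            return True
        print("Waiting for:", ", ".join(pending))
        time.sleep(poll_s)
    return False


if __name__ == "__main__" and shutil.which("kubectl"):
    print("All pods ready:", wait_for_pods())
```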

Verify Helm deployment with Helm tests

The installed Helm charts define Helm tests, which serve as a convenience for verification. These tests validate service readiness. To verify both speech to text and text to speech features, we execute the Helm test command.

helm test onprem-speech

Important

These tests fail if the pod status isn't Running or if the deployment doesn't report its pods as available in the AVAILABLE column. Be patient, as this can take more than ten minutes.

These tests output various status results:

RUNNING: speech-to-text-readiness-test
PASSED: speech-to-text-readiness-test
RUNNING: text-to-speech-readiness-test
PASSED: text-to-speech-readiness-test

As an alternative to running the Helm tests, you can collect the external IP addresses and corresponding ports from the kubectl get all command output. Using the IP and port, open a web browser and navigate to http://<external-ip>:<port>/swagger/index.html to view the API Swagger page(s).
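The same information can be read programmatically. This sketch parses `kubectl get services --output json` and builds a Swagger URL for each LoadBalancer service that already has an external address. It uses the service port (the 80 in 80:31811/TCP) because the external IP forwards that port; 31811 is the node port. Some cloud providers report a hostname instead of an IP, so both are accepted. The function name is illustrative.

```python
import json
import shutil
import subprocess
from typing import Any, Dict, List


def swagger_urls(service_list: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each LoadBalancer service with an external address to Swagger URLs.

    `service_list` is the parsed JSON from `kubectl get services --output json`.
    Services whose external address is still <pending> are skipped.
    """
    urls: Dict[str, List[str]] = {}
    for svc in service_list.get("items", []):
        spec = svc.get("spec", {})
        if spec.get("type") != "LoadBalancer":
            continue
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])
        hosts = [entry.get("ip") or entry.get("hostname") for entry in ingress]
        ports = [p["port"] for p in spec.get("ports", [])]  # service ports, not node ports
        found = [
            f"http://{host}:{port}/swagger/index.html"
            for host in hosts if host
            for port in ports
        ]
        if found:
            urls[svc["metadata"]["name"]] = found
    return urls


if __name__ == "__main__" and shutil.which("kubectl"):
    raw = subprocess.run(
        ["kubectl", "get", "services", "--output", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name, service_urls in swagger_urls(json.loads(raw)).items():
        print(name, service_urls)
```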

Customize Helm charts

Helm charts are hierarchical. This hierarchy allows for chart inheritance and supports the concept of specificity, where more specific settings override inherited ones.

Speech (umbrella chart)

Values in the top-level "umbrella" chart override the corresponding sub-chart values. Therefore, all on-premises customized values should be added here.
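As a simplified illustration of "more specific wins", the following Python sketch merges a sub-chart's defaults with the values supplied through the umbrella chart. Nested maps are merged key by key, and any other value replaces the default. This is a model of the behavior, not Helm's implementation; for example, Helm also lets you remove a default key by setting it to null, which this sketch doesn't handle. The defaults come from the speech to text sub-chart table in this article, and the overrides come from the speechToText section of config-values.yaml.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_values(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults recursively overridden by overrides (more specific wins)."""
    merged = deepcopy(defaults)  # never mutate the caller's defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)  # merge nested maps
        else:
            merged[key] = deepcopy(value)  # scalars and lists replace the default
    return merged


# Sub-chart defaults (from the speech to text sub-chart table).
stt_defaults = {
    "enabled": False,
    "numberOfConcurrentRequest": 2,
    "optimizeForAudioFile": False,
    "image": {"registry": "containerpreview.azurecr.io",
              "repository": "microsoft/cognitive-services-speech-to-text",
              "tag": "latest"},
}

# Umbrella-level values (a subset of the speechToText section of config-values.yaml).
umbrella_stt = {
    "enabled": True,
    "numberOfConcurrentRequest": 3,
    "image": {"registry": "mcr.microsoft.com",
              "repository": "azure-cognitive-services/speechservices/speech-to-text"},
}

effective = merge_values(stt_defaults, umbrella_stt)
print(effective)
```

Keys that the umbrella chart doesn't mention, such as optimizeForAudioFile and image.tag here, keep their sub-chart defaults.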

Parameter Description Default
speechToText.enabled Whether the speech to text service is enabled. true
speechToText.verification.enabled Whether the helm test capability for speech to text service is enabled. true
speechToText.verification.image.registry The docker image registry that helm test uses to test the speech to text service. Helm creates a separate pod inside the cluster for testing and pulls the test-use image from this registry. docker.io
speechToText.verification.image.repository The docker image repository that helm test uses to test the speech to text service. The Helm test pod uses this repository to pull the test-use image. antsu/on-prem-client
speechToText.verification.image.tag The docker image tag used with helm test for the speech to text service. The Helm test pod uses this tag to pull the test-use image. latest
speechToText.verification.image.pullByHash Whether the test-use docker image is pulled by hash. If true, speechToText.verification.image.hash should be added, with a valid image hash value. false
speechToText.verification.image.arguments The arguments used to execute the test-use docker image. The Helm test pod passes these arguments to the container when running helm test. "./speech-to-text-client"
"./audio/whatstheweatherlike.wav"
"--expect=What's the weather like"
"--host=$(SPEECH_TO_TEXT_HOST)"
"--port=$(SPEECH_TO_TEXT_PORT)"
textToSpeech.enabled Whether the text to speech service is enabled. true
textToSpeech.verification.enabled Whether the helm test capability for the text to speech service is enabled. true
textToSpeech.verification.image.registry The docker image registry that helm test uses to test the text to speech service. Helm creates a separate pod inside the cluster for testing and pulls the test-use image from this registry. docker.io
textToSpeech.verification.image.repository The docker image repository that helm test uses to test the text to speech service. The Helm test pod uses this repository to pull the test-use image. antsu/on-prem-client
textToSpeech.verification.image.tag The docker image tag used with helm test for the text to speech service. The Helm test pod uses this tag to pull the test-use image. latest
textToSpeech.verification.image.pullByHash Whether the test-use docker image is pulled by hash. If true, textToSpeech.verification.image.hash should be added, with a valid image hash value. false
textToSpeech.verification.image.arguments The arguments to execute with the test-use docker image. The Helm test pod passes these arguments to the container when running helm test. "./text-to-speech-client"
"--input='What's the weather like'"
"--host=$(TEXT_TO_SPEECH_HOST)"
"--port=$(TEXT_TO_SPEECH_PORT)"

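The `$(SPEECH_TO_TEXT_HOST)`-style placeholders in the default arguments use the Kubernetes `$(VAR_NAME)` syntax, which Kubernetes expands from the container's environment variables before the test container starts. The sketch below mimics that rule in simplified form: a defined reference is replaced, an undefined reference is left unchanged, and `$$` produces a literal `$`. The environment values are hypothetical; the real values come from the test pod that the chart defines.

```python
import re
from typing import Dict

# Matches an escaped "$$" or a "$(NAME)" reference.
_REFERENCE = re.compile(r"\$\$|\$\(([^)]*)\)")


def expand_k8s_vars(arg: str, env: Dict[str, str]) -> str:
    """Expand $(NAME) references in a container argument (simplified)."""
    def replace(match):
        if match.group(0) == "$$":
            return "$"  # "$$" is the escape for a literal "$"
        # Unresolved references are left exactly as written.
        return env.get(match.group(1), match.group(0))

    return _REFERENCE.sub(replace, arg)


# Hypothetical environment of the speech to text test pod.
env = {"SPEECH_TO_TEXT_HOST": "speech-to-text", "SPEECH_TO_TEXT_PORT": "80"}
default_args = [
    "./speech-to-text-client",
    "./audio/whatstheweatherlike.wav",
    "--expect=What's the weather like",
    "--host=$(SPEECH_TO_TEXT_HOST)",
    "--port=$(SPEECH_TO_TEXT_PORT)",
]
expanded = [expand_k8s_vars(a, env) for a in default_args]
print(expanded)
```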
Speech to text (sub-chart: charts/speechToText)

To override the "umbrella" chart, add the prefix speechToText. to any parameter to make it more specific. For example, speechToText.numberOfConcurrentRequest overrides the corresponding numberOfConcurrentRequest parameter.

Parameter Description Default
enabled Whether the speech to text service is enabled. false
numberOfConcurrentRequest The number of concurrent requests for the speech to text service. This chart automatically calculates CPU and memory resources, based on this value. 2
optimizeForAudioFile Whether the service needs to optimize for audio input via audio files. If true, this chart allocates more CPU resources to the service. false
image.registry The speech to text docker image registry. containerpreview.azurecr.io
image.repository The speech to text docker image repository. microsoft/cognitive-services-speech-to-text
image.tag The speech to text docker image tag. latest
image.pullSecrets The image secrets for pulling the speech to text docker image.
image.pullByHash Whether the docker image is pulled by hash. If true, image.hash is required. false
image.hash The speech to text docker image hash. Only used when image.pullByHash: true.
image.args.eula (required) Indicates you've accepted the license. The only valid value is accept
image.args.billing (required) The billing endpoint URI value is available on the Azure portal's Speech Overview page.
image.args.apikey (required) Used to track billing information.
service.type The Kubernetes service type of the speech to text service. See the Kubernetes service types instructions for more details and verify cloud provider support. LoadBalancer
service.port The port of the speech to text service. 80
service.annotations The speech to text annotations for the service metadata. Annotations are key value pairs.
annotations:
  some/annotation1: value1
  some/annotation2: value2
service.autoScaler.enabled Whether the Horizontal Pod Autoscaler is enabled. If true, the speech-to-text-autoscaler will be deployed in the Kubernetes cluster. true
service.podDisruption.enabled Whether the Pod Disruption Budget is enabled. If true, the speech-to-text-poddisruptionbudget will be deployed in the Kubernetes cluster. true
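A quick pre-flight check of the required and conditional rows above can catch mistakes before helm install. This sketch validates a parsed sub-chart values dictionary, such as the speechToText section of config-values.yaml, and works the same way for textToSpeech. The function name is hypothetical, and the sample values, including pullByHash set to true, are made up to trigger each check.

```python
from typing import Any, Dict, List


def values_problems(sub_chart: Dict[str, Any]) -> List[str]:
    """Check a sub-chart's values against the required and conditional rows."""
    image = sub_chart.get("image") or {}
    args = image.get("args") or {}
    problems = []
    if args.get("eula") != "accept":
        problems.append("image.args.eula must be 'accept'")
    for name in ("billing", "apikey"):
        if not args.get(name):  # a bare `billing:` line parses to null
            problems.append(f"image.args.{name} is required")
    if image.get("pullByHash") and not image.get("hash"):
        problems.append("image.hash is required when image.pullByHash is true")
    return problems


# Values as they look before the placeholders are replaced:
# `billing: # {ENDPOINT_URI}` has an empty value, which YAML reads as null.
stt_values = {
    "enabled": True,
    "image": {
        "registry": "mcr.microsoft.com",
        "pullByHash": True,  # hypothetical, to show the hash rule
        "args": {"eula": "accept", "billing": None, "apikey": None},
    },
}

for problem in values_problems(stt_values):
    print(problem)
```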

Sentiment analysis (sub-chart: charts/speechToText)

Starting with v2.2.0 of the speech to text container and v0.2.0 of the Helm chart, the following parameters are used for sentiment analysis using the Language service API.

Parameter Description Values Default
textanalytics.enabled Whether the text-analytics service is enabled true/false false
textanalytics.image.registry The text-analytics docker image registry valid docker image registry
textanalytics.image.repository The text-analytics docker image repository valid docker image repository
textanalytics.image.tag The text-analytics docker image tag valid docker image tag
textanalytics.image.pullSecrets The image secrets for pulling text-analytics docker image valid secrets name
textanalytics.image.pullByHash Whether the docker image is pulled by hash. If true, textanalytics.image.hash is also required. true/false false
textanalytics.image.hash The text-analytics docker image hash. Only used when textanalytics.image.pullByHash is true. valid docker image hash
textanalytics.image.args.eula One of the required arguments by text-analytics container, which indicates you've accepted the license. The value of this option must be: accept. accept, if you want to use the container
textanalytics.image.args.billing One of the required arguments by text-analytics container, which specifies the billing endpoint URI. The billing endpoint URI value is available on the Azure portal's Speech Overview page. valid billing endpoint URI
textanalytics.image.args.apikey One of the required arguments by text-analytics container, which is used to track billing information. valid apikey
textanalytics.cpuRequest The requested CPU for text-analytics container int 3000m
textanalytics.cpuLimit The limited CPU for text-analytics container 8000m
textanalytics.memoryRequest The requested memory for text-analytics container 3Gi
textanalytics.memoryLimit The limited memory for text-analytics container 8Gi
textanalytics.service.sentimentURISuffix The sentiment analysis URI suffix, the whole URI is in format "http://<service>:<port>/<sentimentURISuffix>". text/analytics/v3.0-preview/sentiment
textanalytics.service.type The type of text-analytics service in Kubernetes. See Kubernetes service types valid Kubernetes service type LoadBalancer
textanalytics.service.port The port of the text-analytics service int 50085
textanalytics.service.annotations The annotations users can add to text-analytics service metadata. For instance:
annotations:
  some/annotation1: value1
  some/annotation2: value2
annotations, one per line
textanalytics.service.autoScaler.enabled Whether the Horizontal Pod Autoscaler is enabled. If enabled, text-analytics-autoscaler will be deployed in the Kubernetes cluster true/false true
textanalytics.service.podDisruption.enabled Whether Pod Disruption Budget is enabled. If enabled, text-analytics-poddisruptionbudget will be deployed in the Kubernetes cluster true/false true
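Two of the rows above lend themselves to a short sketch: building the sentiment URI from the service name, port, and sentimentURISuffix, and checking that each resource request doesn't exceed its limit. The service name text-analytics is a hypothetical in-cluster DNS name. The quantity parser handles only the m suffix for CPU and the Ki, Mi, and Gi suffixes for memory, which is a small subset of what Kubernetes accepts.

```python
from typing import Dict

# Binary suffixes only; Kubernetes also accepts decimal suffixes such as M and G.
_MEMORY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def sentiment_uri(service: str, port: int = 50085,
                  suffix: str = "text/analytics/v3.0-preview/sentiment") -> str:
    """Build http://<service>:<port>/<sentimentURISuffix> using the chart defaults."""
    return f"http://{service}:{port}/{suffix.lstrip('/')}"


def cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as '3000m' or '3' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Parse a memory quantity such as '3Gi' into bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def requests_within_limits(resources: Dict[str, str]) -> bool:
    """True if both the CPU and memory requests are at most their limits."""
    return (
        cpu_millicores(resources["cpuRequest"]) <= cpu_millicores(resources["cpuLimit"])
        and memory_bytes(resources["memoryRequest"]) <= memory_bytes(resources["memoryLimit"])
    )


# Defaults from the textanalytics rows above.
defaults = {"cpuRequest": "3000m", "cpuLimit": "8000m",
            "memoryRequest": "3Gi", "memoryLimit": "8Gi"}

print(sentiment_uri("text-analytics"))
print("requests within limits:", requests_within_limits(defaults))
```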

Text to speech (sub-chart: charts/textToSpeech)

To override the "umbrella" chart, add the prefix textToSpeech. to any parameter to make it more specific. For example, textToSpeech.numberOfConcurrentRequest overrides the corresponding numberOfConcurrentRequest parameter.

Parameter Description Default
enabled Whether the text to speech service is enabled. false
numberOfConcurrentRequest The number of concurrent requests for the text to speech service. This chart automatically calculates CPU and memory resources, based on this value. 2
optimizeForTurboMode Whether the service needs to optimize for text input via text files. If true, this chart allocates more CPU resources to the service. false
image.registry The text to speech docker image registry. containerpreview.azurecr.io
image.repository The text to speech docker image repository. microsoft/cognitive-services-text-to-speech
image.tag The text to speech docker image tag. latest
image.pullSecrets The image secrets for pulling the text to speech docker image.
image.pullByHash Whether the docker image is pulled by hash. If true, image.hash is required. false
image.hash The text to speech docker image hash. Only used when image.pullByHash: true.
image.args.eula (required) Indicates you've accepted the license. The only valid value is accept
image.args.billing (required) The billing endpoint URI value is available on the Azure portal's Speech Overview page.
image.args.apikey (required) Used to track billing information.
service.type The Kubernetes service type of the text to speech service. See the Kubernetes service types instructions for more details and verify cloud provider support. LoadBalancer
service.port The port of the text to speech service. 80
service.annotations The text to speech annotations for the service metadata. Annotations are key value pairs.
annotations:
  some/annotation1: value1
  some/annotation2: value2
service.autoScaler.enabled Whether the Horizontal Pod Autoscaler is enabled. If true, the text-to-speech-autoscaler will be deployed in the Kubernetes cluster. true
service.podDisruption.enabled Whether the Pod Disruption Budget is enabled. If true, the text-to-speech-poddisruptionbudget will be deployed in the Kubernetes cluster. true
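When you only need to change a few parameters, Helm's `--set` flag accepts the same dotted names as the tables above, for example textToSpeech.numberOfConcurrentRequest. This sketch flattens a nested dictionary of overrides into `--set` arguments. It doesn't escape commas or dots inside keys or values, which `--set` treats specially, and it doesn't handle lists; for those cases, keep the values in config-values.yaml. The helper name is hypothetical.

```python
from typing import Any, Dict, List


def to_set_args(values: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested values into `--set dotted.path=value` arguments for helm."""
    args: List[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            args.extend(to_set_args(value, path))  # descend into nested maps
        else:
            # Helm reads lowercase true/false as booleans.
            rendered = ("true" if value else "false") if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={rendered}"])
    return args


overrides = {"textToSpeech": {"numberOfConcurrentRequest": 3, "optimizeForTurboMode": True}}
print(to_set_args(overrides))
```

The resulting arguments could be appended to a command such as `helm upgrade onprem-speech microsoft/cognitive-services-speech-onpremise --values config-values.yaml`.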

Next steps

For more details, see Install applications with Helm in Azure Kubernetes Service (AKS).