Back up and recover speech customer resources

01/22/2024

The Speech service is available in various regions. Each Speech resource key is tied to a single region. When you acquire a key, you select the specific region where your data, models, and deployments reside.

Customer-created data assets, such as customized speech models, custom voice fonts, and speaker recognition voice profiles, are likewise available only in the region where the service is deployed. Such assets are:

Custom speech

Training audio/text data
Test audio/text data
Customized speech models
Log data

Custom voice

Training audio/text data
Test audio/text data
Custom voice fonts

Speaker Recognition

Speaker enrollment audio
Speaker voice signature

While some customers use our default endpoints to transcribe audio or standard voices for speech synthesis, other customers create assets for customization.

These assets are backed up regularly and automatically by the repositories themselves, so no data loss will occur if a region becomes unavailable. However, you must take steps to ensure service continuity if there's a region outage.

How to monitor service availability

If you use the default endpoints, you should configure your client code to monitor for errors. If errors persist, be prepared to redirect to another region where you have a Speech resource.

Follow these steps to configure your client to monitor for errors:

Find the list of regionally available endpoints in our documentation.
Select a primary and one or more secondary/backup regions from the list.
From Azure portal, create Speech service resources for each region.
- If you have set a specific quota, you might also consider setting the same quota in the backup regions. See details in Speech service Quotas and Limits.
Each region has its own STS token service. For the primary region and any backup regions your client configuration file needs to know the:
- Regional Speech service endpoints
- Regional key and the region code
Configure your code to monitor for connectivity errors (typically connection timeouts and service unavailability errors). Here's sample code in C#: GitHub: Adding Sample for showing a possible candidate for switching regions.
- Because networks experience transient errors, retry when a connectivity issue occurs only once.
- If errors persist, redirect traffic to the STS token service and Speech service endpoint in a backup region. For text to speech, see the sample code: GitHub: TTS public voice switching region.
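The steps above can be sketched as a small client-side helper. This is a minimal illustration, not the linked C# sample: `RegionConfig` and `call_with_failover` are hypothetical names, the token URL pattern is an assumption to verify against the regional endpoint list, and `operation` stands in for whatever Speech call your client makes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RegionConfig:
    """Per-region settings the client needs (hypothetical structure)."""
    region: str  # region code, for example "westus"
    key: str     # Speech resource key created for that region

    @property
    def token_url(self) -> str:
        # Assumed regional STS token endpoint pattern; check the documentation.
        return f"https://{self.region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"


def call_with_failover(
    regions: Sequence[RegionConfig],
    operation: Callable[[RegionConfig], T],
    retries_per_region: int = 2,
) -> T:
    """Try regions in order; retry transient errors before switching region."""
    last_error: Optional[Exception] = None
    for cfg in regions:
        # One initial attempt plus the configured number of retries.
        for _ in range(1 + retries_per_region):
            try:
                return operation(cfg)
            except (ConnectionError, TimeoutError) as err:
                last_error = err  # treated as a connectivity error
    raise RuntimeError("all configured regions failed") from last_error
```

A production client would typically add a delay between retries and map the SDK's own timeout and service-unavailable errors onto the exceptions caught here.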

Recovery from regional failures for this usage type can be instantaneous and low cost. All that's required is to develop this functionality on the client side. Assuming the audio stream isn't backed up, the data loss incurred is minimal.

Custom endpoint recovery

Data assets, models or deployments in one region can't be made visible or accessible in any other region.

You should create Speech service resources in both a main and a secondary region by following the same steps as used for default endpoints.

Custom speech

Custom speech service doesn't support automatic failover. We suggest the following steps to prepare for manual or automatic failover implemented in your client code. In these steps, you replicate custom models in a secondary region. With this preparation, your client code can switch to a secondary region when the primary region fails.

Create your custom model in one main region (Primary).
Run the Models_CopyTo operation to replicate the custom model to all prepared regions (Secondary).
Go to Speech Studio to load the copied model and create a new endpoint in the secondary region. See how to deploy a new model in Deploy a custom speech model.
- If you have set a specific quota, also consider setting the same quota in the backup regions. See details in Speech service Quotas and Limits.
Configure your client to fail over on persistent errors as with the default endpoints usage.
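As an illustration of the copy step, the sketch below builds (but doesn't send) an HTTP request for the Models_CopyTo operation. The URL path, API version, and `targetSubscriptionKey` body field reflect one understanding of the Speech to text REST API v3.1 and should be treated as assumptions to confirm against the REST API reference. `build_copy_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_copy_request(primary_region, primary_key, model_id, secondary_key,
                       api_version="v3.1"):
    """Build a Models_CopyTo request; the caller decides when to send it."""
    # Assumed URL shape: the request goes to the primary region, where the
    # model currently lives.
    url = (f"https://{primary_region}.api.cognitive.microsoft.com"
           f"/speechtotext/{api_version}/models/{model_id}:copyto")
    # Assumed body: the key of the Speech resource in the secondary region
    # identifies the copy target.
    body = json.dumps({"targetSubscriptionKey": secondary_key}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": primary_key,
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The copied model still needs an endpoint created in the secondary region, as described in the steps above.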

Your client code can monitor the availability of your deployed models in the primary region and redirect audio traffic to the secondary region when the primary fails. If you don't require real-time failover, you can still follow these steps to prepare for a manual failover.

Offline failover

If you don't require real-time failover, you can import your data and create and deploy your models in the secondary region at a later time, with the understanding that these tasks take time to complete.

Failover time requirements

This section provides general guidance about timing. The times were recorded to estimate offline failover using a representative test data set.

Data upload to new region: 15 mins
Acoustic/language model creation: 6 hours (depending on the data volume)
Model evaluation: 30 mins
Endpoint deployment: 10 mins
Model copy API call: 10 mins
Client code reconfiguration and deployment: Depending on the client system

It's nonetheless advisable to create keys for a primary and secondary region for production models with real-time requirements.
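To put the figures listed above in perspective, the following sketch adds them up for two paths. The grouping is an assumption made for illustration: a full rebuild in the new region is taken as upload, model creation, evaluation, and deployment, while the copy path is taken as the model copy call plus deployment. Client reconfiguration is excluded from both because it depends on the client system.

```python
# Durations in minutes, taken from the list above.
DATA_UPLOAD = 15
MODEL_CREATION = 6 * 60  # varies with data volume
MODEL_EVALUATION = 30
ENDPOINT_DEPLOYMENT = 10
MODEL_COPY = 10

# Assumed grouping of steps for each path (not stated in the list itself).
rebuild_minutes = (DATA_UPLOAD + MODEL_CREATION
                   + MODEL_EVALUATION + ENDPOINT_DEPLOYMENT)
copy_minutes = MODEL_COPY + ENDPOINT_DEPLOYMENT


def format_duration(minutes):
    """Render a minute count as hours and minutes."""
    hours, rest = divmod(minutes, 60)
    return f"{hours} h {rest} min"


print("Rebuild in new region:", format_duration(rebuild_minutes))  # 6 h 55 min
print("Copy existing model:", format_duration(copy_minutes))       # 0 h 20 min
```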

Custom voice

Custom voice doesn't support automatic failover. Handle real-time synthesis failures with one of these two options.

Option 1: Fail over to public voice in the same region.

When custom voice real-time synthesis fails, fail over to a public voice (client sample code: GitHub: custom voice failover to public voice).

Check the public voices available. You can also change the sample code above if you would like to fail over to a different voice or in a different region.
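Option 1 can be sketched as follows. This is a minimal illustration rather than the linked sample: `synthesize` is a hypothetical callable that wraps your actual Speech SDK or REST call and takes the text, a voice name, and an optional custom voice deployment ID. The voice names in the usage note are placeholders.

```python
from typing import Callable, Optional

# Hypothetical signature: (text, voice_name, deployment_id) -> audio bytes.
Synthesizer = Callable[[str, str, Optional[str]], bytes]


def synthesize_with_voice_fallback(
    text: str,
    synthesize: Synthesizer,
    custom_voice: str,
    deployment_id: str,
    public_voice: str,
) -> bytes:
    """Use the custom voice first; on failure, use a public voice instead."""
    try:
        return synthesize(text, custom_voice, deployment_id)
    except (ConnectionError, TimeoutError):
        # A public voice needs no deployment ID, so none is passed.
        return synthesize(text, public_voice, None)
```

A call might look like `synthesize_with_voice_fallback("Hello", my_synthesize, "MyCustomVoice", "my-deployment-id", "en-US-JennyNeural")`, where `my_synthesize` is your own wrapper. Listeners will hear a different voice during the fallback, so choose a public voice close to the custom one.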

Option 2: Fail over to custom voice in another region.

Create and deploy your custom voice in one main region (primary).
Copy your custom voice model to another region (the secondary region) in Speech Studio.
Go to Speech Studio and switch to the Speech resource in the secondary region. Load the copied model and create a new endpoint.
- Voice model deployment usually finishes in 3 minutes.
- Each endpoint is subject to extra charges. Check the pricing for model hosting.
Configure your client to fail over to the secondary region. See sample code in C#: GitHub: custom voice failover to secondary region.
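Because each region gets its own endpoint, the client needs a separate key and deployment ID per region. The sketch below shows one way to hold that configuration and pick the first healthy deployment. `VoiceDeployment`, `pick_deployment`, and the `is_healthy` check are hypothetical, and the URL pattern is an assumption about the custom voice endpoint format to verify in Speech Studio.

```python
from dataclasses import dataclass
from typing import Callable, Sequence
from urllib.parse import urlencode


@dataclass(frozen=True)
class VoiceDeployment:
    """One custom voice endpoint in one region (hypothetical structure)."""
    region: str
    key: str
    deployment_id: str  # differs per region, created when you deploy there

    @property
    def url(self) -> str:
        # Assumed endpoint pattern; confirm the real URL in Speech Studio.
        query = urlencode({"deploymentId": self.deployment_id})
        return (f"https://{self.region}.voice.speech.microsoft.com"
                f"/cognitiveservices/v1?{query}")


def pick_deployment(
    deployments: Sequence[VoiceDeployment],
    is_healthy: Callable[[VoiceDeployment], bool],
) -> VoiceDeployment:
    """Return the first deployment, primary first, that passes the check."""
    for deployment in deployments:
        if is_healthy(deployment):
            return deployment
    raise RuntimeError("no healthy custom voice deployment")
```

List the primary deployment first so that traffic returns to it once its health check passes again.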

Speaker recognition

Speaker recognition uses Azure paired regions to automatically fail over operations. Speaker enrollments and voice signatures are backed up regularly to prevent data loss and to be used if there's an outage.

During an outage, the speaker recognition service automatically fails over to a paired region and uses the backed-up data to continue processing requests until the main region is back online.