Customize a speech model

Important

The deadline for migrating Azure Video Indexer content due to the Azure Media Services retirement has passed. See the retirement guide for more information.

Note

Speech model customization, including pronunciation training, is supported only in Azure AI Video Indexer trial accounts and Resource Manager accounts. It isn't supported in classic accounts. For guidance on how to update your account type at no cost, see Update your Azure AI Video Indexer account. For guidance on using the custom language experience, see Customize a Language model.

Azure AI Video Indexer lets you customize speech recognition by uploading datasets and using them to train custom speech models. This article walks through the steps on the Video Indexer website. You can also use the API, as described in Customize speech model using API.

For a detailed overview and best practices for custom speech models, see Customize a speech model with Azure AI Video Indexer.

Prerequisites

Create a dataset

Because every custom model must contain a dataset, this article starts with how to create and manage datasets.

  1. Select the Model customization button.
  2. Select the Speech (new) tab.
  3. Select Upload dataset.
  4. Select either Plain text or Pronunciation from the Dataset type dropdown menu. Every speech model must have a plain text dataset and can optionally have a pronunciation dataset.
  5. Select Browse and select the dataset file. You can choose only one.
  6. Select a Language for the model. Choose the language that is spoken in the media files you plan on indexing with this model. The Dataset name is prepopulated with the name of the file but you can modify the name.
  7. You can optionally add a description of the dataset. This could be helpful to distinguish each dataset if you expect to have multiple datasets.
  8. Select Upload. When the dataset creation is complete, you can use it for training and creation of new models.
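
Formatting problems in a pronunciation file, such as a missing tab between entries, are a common reason that lines fail to load. The following is a minimal local check you could run before uploading. It is a sketch under assumptions, not part of Video Indexer: the function name find_untabbed_lines is hypothetical, the sample entries are made up, and the default of two tab-separated fields per line should be adjusted to match the pronunciation format described in the data-for-training article.

```python
from typing import List, Tuple


def find_untabbed_lines(lines: List[str], expected_fields: int = 2) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines that lack the expected tab-separated fields.

    Assumption: each non-empty line holds `expected_fields` entries separated by tabs.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text.strip():
            continue  # blank lines are ignored by this check
        fields = text.split("\t")
        # Flag lines with the wrong field count or with an empty field.
        if len(fields) != expected_fields or any(not f.strip() for f in fields):
            problems.append((number, text))
    return problems


# Made-up sample lines; the second one uses spaces instead of a tab.
sample = [
    "3CPO\tthree c p o\n",
    "CNTK c n t k\n",
    "\n",
    "IEEE\ti triple e\n",
]
print(find_untabbed_lines(sample))
```

To check a real file, pass the result of open(path, encoding="utf-8").readlines() instead of the sample list.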

Review and update a dataset

You can view a dataset and its properties in either of two ways:

  • Select the dataset name.
  • Hover over the dataset, select the ellipsis, and then select View Dataset.

You can then view the name, description, language, and status of the dataset plus the following properties:

Number of lines: Indicates the number of lines successfully loaded out of the total number of lines in the file. If the entire file loads successfully, the numbers match (for example, 10 of 10 normalized). If the numbers don't match (for example, 7 of 10 normalized), only some of the lines loaded and the rest had errors. Common causes are formatting issues in a line, such as words in a pronunciation file that aren't separated by a tab. To troubleshoot, select View report, or select the Report tab, to see the error details (errorKind) for the lines that didn't load. The articles on plain text and pronunciation data for training can also help you find the issue.
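
As a small illustration of how to read the Number of lines value, the following sketch parses a summary string such as "7 of 10 normalized" and computes how many lines failed to load. The names LineCount and parse_line_count are hypothetical helpers, and the string format is assumed from the examples in this article rather than from any API contract.

```python
import re
from typing import NamedTuple


class LineCount(NamedTuple):
    loaded: int  # lines that loaded successfully
    total: int   # lines in the uploaded file

    @property
    def failed(self) -> int:
        """Lines that did not load and should appear in the report."""
        return self.total - self.loaded


def parse_line_count(summary: str) -> LineCount:
    """Parse a summary such as '7 of 10 normalized' (format assumed)."""
    match = re.search(r"(\d+)\s+of\s+(\d+)", summary)
    if match is None:
        raise ValueError(f"unrecognized summary: {summary!r}")
    return LineCount(int(match.group(1)), int(match.group(2)))


count = parse_line_count("7 of 10 normalized")
print(count.loaded, count.total, count.failed)
```

A nonzero failed value is the signal to open the report and review the errorKind entries.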

Dataset ID: Each dataset has a unique GUID, which is needed when using the API for operations that reference the dataset.

Plain text (normalized): This contains the normalized text of the loaded dataset file. Normalized text is the recognized text in plain form without formatting.

Edit Details: To edit a dataset's name or description, hover over the dataset, select the ellipsis, and then select Edit details. You can then edit the dataset name and description.

Note

The data in a dataset can't be edited or updated once the dataset has been uploaded. If you need to edit or update the data in a dataset, download the dataset, perform the edits, save the file, and upload the new dataset file.

Download: To download a dataset file, hover over the dataset, select the ellipsis, and then select Download. Alternatively, when viewing the dataset, select Download and then choose whether to download the dataset file or the upload report in JSON format.

Delete: To delete a dataset, hover over the dataset, select the ellipsis, and then select Delete.

Create a custom speech model

Datasets are used in the creation and training of models. Once you have created a plain text dataset, you can create and start using a custom speech model.

Keep in mind the following when creating and using custom speech models:

  • A new model must include at least one plain text dataset and can have multiple plain text datasets.
  • A pronunciation dataset is optional, and a model can include no more than one.
  • Once a model is created, you can't add datasets to it or modify its datasets. If you need to add or modify datasets, create a new model.
  • If you have indexed a video using a custom speech model and then delete the model, the transcript isn't impacted unless you perform a reindex.
  • If you delete a dataset that was used to train a custom model, the model isn't affected. It was already trained on the dataset and continues to use that training until the model is deleted.
  • If you delete a custom model, it has no impact on the transcription of videos that were already indexed using the model.
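
The dataset rules for a new model can be summarized as a simple pre-check. The following sketch assumes a hypothetical Dataset record with kind and language fields. These names are illustrative only and don't correspond to the Video Indexer API. It checks that a selection has at least one plain text dataset, at most one pronunciation dataset, and a single language across all datasets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dataset:
    name: str
    kind: str      # "plain_text" or "pronunciation" (hypothetical labels)
    language: str  # for example, "en-US"


def check_model_datasets(datasets: List[Dataset]) -> List[str]:
    """Return a list of rule violations; an empty list means the selection is valid."""
    errors = []
    plain = [d for d in datasets if d.kind == "plain_text"]
    pronunciation = [d for d in datasets if d.kind == "pronunciation"]
    if not plain:
        errors.append("at least one plain text dataset is required")
    if len(pronunciation) > 1:
        errors.append("no more than one pronunciation dataset is allowed")
    if len({d.language for d in datasets}) > 1:
        errors.append("all datasets must share the same language")
    return errors


selection = [
    Dataset("product-terms", "plain_text", "en-US"),
    Dataset("acronyms", "pronunciation", "en-US"),
]
print(check_model_datasets(selection))
```

Because datasets can't be changed after a model is created, running a check like this before training avoids having to create a replacement model.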

Train a model

Note

Once a model is created, datasets can't be added. A model can only contain datasets of the same language.

There are two ways to train a model: through the Datasets tab and through the Models tab.

Train a model through the Datasets tab

  1. View the list of datasets.
  2. Select a plain text dataset. The Train new model icon above can then be selected.
  3. Select Train new model.
  4. Enter a name for the model, a language, and optionally add a description.
  5. Select the Datasets tab.
  6. Select the datasets you want to be included in the model.
  7. Select Create and train.

Train a model through the Models tab

  1. Select the Models tab.
  2. Select the Train new model icon.
  3. Select the datasets that you want to be part of the model.
  4. Enter a name for the model, a language, and optionally add a description.
  5. Select the Datasets tab.
  6. Select the datasets you want to be included in the model.
  7. Select Create and train.

Review and update a model

View Model: You can view a model and its properties by either selecting the model’s name or hovering over the model, selecting the ellipsis, and then selecting View Model.

The Details tab then shows the name, description, language, and status of the model plus the following properties:

Model ID: Each model has a unique GUID, which is needed when using the API for operations that reference the model.

Created on: The date the model was created.

Edit Details: To edit a model’s name or description, hover over the model, select the ellipsis, and then select Edit details. You can then edit the model’s name and description.

Note

Only the model’s name and description can be edited. If you want to make any changes to its datasets or add datasets, a new model must be created.

Delete: To delete a model, hover over the model, select the ellipsis, and then select Delete.

Included datasets: Select the Included datasets tab to view the model’s datasets.

Use a custom language model when indexing a video

A custom language model isn't used by default for indexing jobs, so you must select it during the index upload process.

  1. During the upload process, select your custom language model source from the language drop-down menu.
  2. Select Upload.

The same steps apply when you want to reindex a video with a custom model.