Quickstart: Build, publish, and translate with custom models

Important

Custom Translator v2.0 is currently in public preview. Some features may not be supported or have constrained capabilities.

Translator is a cloud-based neural machine translation service that is part of the Azure Cognitive Services family of REST APIs. Translator can be used with any operating system and powers many Microsoft products and services used by thousands of businesses worldwide to perform language translation and other language-related operations. In this quickstart, you'll learn how to build, publish, and use a custom translation model for any supported language pair.

Prerequisites

To use the Custom Translator preview portal, you'll need the following resources:

  • A Microsoft account.

  • Azure subscription - Create one for free

  • Once you have an Azure subscription, create a Translator resource in the Azure portal to get your key and endpoint. After it deploys, select Go to resource.

    • You'll need the key and endpoint from the resource to connect your application to the Translator service. Later in this quickstart, you'll paste your key into Custom Translator when you create a workspace. You can find these values on the Azure portal Keys and Endpoint page:

      Screenshot: Azure portal keys and endpoint page.

    See how to create a Translator resource.
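Keeping the key out of source code is a common practice when an application connects to the Translator service. A minimal sketch, assuming you've stored the values from the Keys and Endpoint page in environment variables named TRANSLATOR_KEY, TRANSLATOR_ENDPOINT, and TRANSLATOR_REGION (hypothetical names chosen for this example; Azure doesn't set them for you):

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TranslatorConfig:
    key: str
    endpoint: str
    region: Optional[str]  # None when no region value was provided


def load_config(env: Optional[Mapping[str, str]] = None) -> TranslatorConfig:
    """Read Translator settings from environment variables.

    TRANSLATOR_KEY, TRANSLATOR_ENDPOINT, and TRANSLATOR_REGION are
    hypothetical names chosen for this sketch. env defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("TRANSLATOR_KEY", "TRANSLATOR_ENDPOINT")
               if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return TranslatorConfig(
        key=env["TRANSLATOR_KEY"],
        # Drop a trailing slash so paths such as "/translate" can be appended.
        endpoint=env["TRANSLATOR_ENDPOINT"].rstrip("/"),
        # Treat an empty value the same as a missing one.
        region=env.get("TRANSLATOR_REGION") or None,
    )
```

Calling load_config() reads the current process environment; passing a dictionary instead is convenient for testing.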

Once you have the above prerequisites, sign in to the Custom Translator preview portal to create workspaces, build projects, upload files, train models, and publish your custom solution.

You can read an overview of translation and custom translation, learn some tips, and watch a getting started video in the Azure AI technical blog.

Note

Custom Translator doesn't support creating a workspace for a Translator Text API resource created inside an enabled virtual network (VNet).

Process summary

  1. Create a workspace. A workspace is a work area for composing and building your custom translation system. A workspace can contain multiple projects, models, and documents. All the work you do in Custom Translator is done inside a specific workspace.

  2. Create a project. A project is a wrapper for models, documents, and tests. Each project includes all documents that are uploaded into that workspace with the correct language pair. For example, if you have both an English-to-Spanish project and a Spanish-to-English project, the same documents will be included in both projects.

  3. Upload parallel documents. Parallel documents are pairs of documents where one (target) is the translation of the other (source). One document in the pair contains sentences in the source language and the other document contains sentences translated into the target language. It doesn't matter which language is marked as "source" and which language is marked as "target"—a parallel document can be used to train a translation system in either direction.

  4. Train your model. A model is the system that provides translation for a specific language pair. The outcome of a successful training is a model. When you train a model, three mutually exclusive document types are required: training, tuning, and testing. If you provide only training data when you queue a training, Custom Translator automatically assembles the tuning and testing data: it uses a random subset of sentences from your training documents and excludes those sentences from the training data itself. A minimum of 10,000 parallel sentences is required to train a model.

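  5. Test (human evaluate) your model. The testing set is used to compute the BLEU score, which measures how closely the model's translations match the reference translations in the test set. This score indicates the quality of your translation system.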

  6. Publish (deploy) your trained model. Your custom model is made available for runtime translation requests.

  7. Translate text. Use the cloud-based, secure, high-performance, highly scalable Microsoft Translator Text API V3 to make translation requests.
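The automatic assembly of tuning and testing data described in step 4 amounts to a random hold-out. The sketch below illustrates that idea only: Custom Translator's actual sampling rules and set sizes aren't described here, so split_parallel and its tune_size and test_size parameters are hypothetical, and applying the 10,000-sentence minimum to the whole input is an assumption.

```python
import random

MIN_TRAINING_SENTENCES = 10_000  # minimum from step 4; applied to the whole input here (assumption)


def split_parallel(pairs, tune_size, test_size, seed=0):
    """Randomly hold out tuning and testing pairs from the training data.

    pairs is a list of (source, target) sentence tuples. The three returned
    lists are disjoint by position: held-out pairs are removed from training.
    """
    if len(pairs) < MIN_TRAINING_SENTENCES:
        raise ValueError(
            f"need at least {MIN_TRAINING_SENTENCES} parallel sentences, got {len(pairs)}")
    if tune_size + test_size >= len(pairs):
        raise ValueError("tuning and testing sets would leave no training data")
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    tune_idx = set(indices[:tune_size])
    test_idx = set(indices[tune_size:tune_size + test_size])
    tune = [pairs[i] for i in sorted(tune_idx)]
    test = [pairs[i] for i in sorted(test_idx)]
    train = [p for i, p in enumerate(pairs) if i not in tune_idx and i not in test_idx]
    return train, tune, test
```

For example, split_parallel(pairs, tune_size=500, test_size=500) on 10,000 pairs returns 9,000 training pairs and two disjoint 500-pair held-out sets. Splitting by position keeps the sets disjoint without deduplication, although identical sentences that occur more than once could still land on both sides.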

Create a workspace

  1. After you sign in to Custom Translator, you'll be asked for permission to read your profile from the Microsoft identity platform to request your user access token and refresh token. Both tokens are needed for authentication and to ensure that you aren't signed out during your live session or while training your models.
    Select Yes.

    Screenshot illustrating how to create a workspace.

  2. Select My workspaces.

  3. Select Create a new workspace.

  4. Type Contoso MT models for Workspace name and select Next.

  5. For Select resource region, select Global from the dropdown list.

  6. Copy and paste your Translator resource key.

  7. Select Next.

  8. Select Done.

    Note

    The region must match the region you selected when you created the resource. You can use either KEY 1 or KEY 2.

    Screenshot illustrating the resource key.

    Screenshot illustrating workspace creation.

Create a project

Once the workspace is created successfully, you'll be taken to the Projects page.

You'll create an English-to-German project to train a custom model using only the training document type.

  1. Select Create project.

  2. Type English-to-German for Project name.

  3. Select English (en) as Source language from the dropdown list.

  4. Select German (de) as Target language from the dropdown list.

  5. Select General for Domain from the dropdown list.

  6. Select Create project.

    Screenshot illustrating how to create a project.

Upload documents

In order to create a custom model, you need to upload all or a combination of training, tuning, testing, and dictionary document types.

In this quickstart, you'll upload training documents for customization.

Note

You can use our sample training, phrase dictionary, and sentence dictionary datasets (Customer sample English-to-German datasets) for this quickstart. However, for production, it's better to upload your own training dataset.

  1. Select the English-to-German project name.

  2. Select Manage documents from the left navigation menu.

  3. Select Add document set.

  4. Check the Training set box and select Next.

  5. Keep Parallel documents checked and type sample-English-German.

  6. Under Source (English - EN) file, select Browse files and select sample-English-German-Training-en.txt.

  7. Under Target (German - DE) file, select Browse files and select sample-English-German-Training-de.txt.

  8. Select Upload.

    Note

    You can also upload the sample phrase and sentence dictionary datasets. This step is left for you to complete.

    Screenshot illustrating how to upload documents.
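Before uploading, it can help to confirm that the source and target files line up. A minimal sketch, assuming each file is UTF-8 plain text with one sentence per line, so that line N of the English file pairs with line N of the German file (an assumption about how the files were prepared, not a documented Custom Translator rule); read_parallel is a hypothetical helper:

```python
from pathlib import Path


def read_parallel(source_path, target_path):
    """Return (source, target) sentence pairs from two line-aligned text files."""
    # "utf-8-sig" also strips a byte order mark if the file starts with one.
    source = Path(source_path).read_text(encoding="utf-8-sig").splitlines()
    target = Path(target_path).read_text(encoding="utf-8-sig").splitlines()
    if len(source) != len(target):
        raise ValueError(
            f"line count mismatch: {len(source)} source vs {len(target)} target lines")
    pairs = []
    for line_no, (src, tgt) in enumerate(zip(source, target), start=1):
        # A line that is empty on only one side usually means the files drifted apart.
        if bool(src.strip()) != bool(tgt.strip()):
            raise ValueError(f"line {line_no}: only one side is empty")
        if src.strip():
            pairs.append((src.strip(), tgt.strip()))
    return pairs
```

For example, read_parallel("sample-English-German-Training-en.txt", "sample-English-German-Training-de.txt") returns the aligned pairs, or raises ValueError pointing at the first misalignment it finds.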

Train your model

Now you're ready to train your English-to-German model.

  1. Select Train model from the left navigation menu.

  2. Type en-de with sample data for Model name.

  3. Keep Full training checked.

  4. Under Select documents, check sample-English-German and review the training cost associated with the selected number of sentences.

  5. Select Train now.

  6. Select Train to confirm.

    Note

    Notifications display the model training progress, for example, the Submitting data state. Training a model can take a few hours, depending on the number of selected sentences.

    Screenshot illustrating how to create a model.

  7. After successful model training, select Model details from the left navigation menu.

  8. Select the model name en-de with sample data to review the training date and time, total training time, number of sentences used for training, tuning, testing, and dictionary, and whether the system generated the test and tuning sets. You'll use the Category ID to make translation requests.

  9. Evaluate the model BLEU score. The test set BLEU score is the score of the custom model, and Baseline BLEU is the score of the pre-trained baseline model used for customization. A higher BLEU score means higher translation quality, so a custom model score above the baseline indicates that customization improved the translations.

    Note

    If you train with our shared customer sample datasets, your BLEU score will differ from the one shown in the image.

    Screenshot illustrating model details.
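BLEU compares n-grams (runs of 1 to 4 words) in the model's output with those in the reference translations, then applies a brevity penalty to output that is shorter than the references. The sketch below is a plain corpus-level BLEU with whitespace tokenization, one reference per sentence, and no smoothing. Custom Translator's own tokenization and scoring details may differ, so don't expect it to reproduce the portal's numbers; corpus_bleu is a hypothetical helper, not a Custom Translator API.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per sentence, no smoothing)."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams in the hypotheses, per n-gram order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clip each n-gram's count by how often it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or 0 in matches:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

For example, corpus_bleu(["the quick brown fox jumps"], ["the quick brown fox jumped"]) returns about 66.87, while a hypothesis that shares no 4-gram with its reference scores 0 because this sketch applies no smoothing.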

Test your model

Once your training has completed successfully, inspect the translated sentences from the test set.

  1. Select Test model from the left navigation menu.
  2. Select en-de with sample data.
  3. Manually evaluate the translations from New model (custom model) and Baseline model (our pre-trained baseline used for customization) against Reference (the target translation from the test set).

Publish your model

Publishing your model makes it available for use with the Translator API. A project might have one or many successfully trained models. You can only publish one model per project; however, you can publish a model to one or multiple regions depending on your needs. For more information, see Translator pricing.

  1. Select Publish model from the left navigation menu.

  2. Select en-de with sample data and select Publish.

  3. Check the desired region(s).

  4. Select Publish. The status should transition from Deploying to Deployed.

    Screenshot illustrating how to deploy a trained model.

Translate text

  1. Developers should pass the Category ID in the category parameter when making translation requests with Microsoft Translator Text API V3. For more information about the Translator Text API, see the API Reference webpage.

  2. Business users may want to download and install our free DocumentTranslator app for Windows.
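For developers, a request to a published custom model is an ordinary Translator Text API V3 translate call with the Category ID in the category query parameter. A minimal sketch using only the Python standard library, assuming the global endpoint https://api.cognitive.microsofttranslator.com; the category ID, key, and region are placeholders for the values from Model details and the Azure portal, and build_translate_request and translate are hypothetical helper names. Building the request separately from sending it lets you inspect the exact URL, headers, and body.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_translate_request(texts, category_id, key, region=None,
                            source="en", target="de", endpoint=ENDPOINT):
    """Build a Translator V3 /translate request that targets a custom model."""
    query = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": source,
        "to": target,
        "category": category_id,  # Category ID from the Model details page
    })
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    if region:
        # Needed for regional or multi-service resources; omitted when region is None.
        headers["Ocp-Apim-Subscription-Region"] = region
    body = json.dumps([{"Text": t} for t in texts]).encode("utf-8")
    return urllib.request.Request(f"{endpoint.rstrip('/')}/translate?{query}",
                                  data=body, headers=headers, method="POST")


def translate(texts, category_id, key, region=None, **kwargs):
    """Send the request and return the translated strings in input order."""
    request = build_translate_request(texts, category_id, key, region, **kwargs)
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    # Each result holds a "translations" list with one entry per target language.
    return [item["translations"][0]["text"] for item in results]
```

For example, translate(["Hello, world"], "<your-category-id>", "<your-key>") sends one sentence to the custom English-to-German model and returns a one-element list with its German translation.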

Next steps