Quickstart: Build, publish, and translate with custom models
Translator is a cloud-based neural machine translation service. It is part of the Azure Cognitive Services family of REST APIs and can be used with any operating system. Translator powers many Microsoft products and services used by thousands of businesses worldwide to perform language translation and other language-related operations. In this quickstart, you'll learn to build custom solutions for your applications across all supported languages.
To use the Custom Translator portal, you'll need the following resources:
Azure subscription - Create one for free
Once you have an Azure subscription, create a Translator resource in the Azure portal to get your key and endpoint. After it deploys, select Go to resource.
You'll need the key and endpoint from the resource to connect your application to the Translator service. You'll use these values later in the quickstart, when you create a workspace and send translation requests. You can find them on the resource's Keys and Endpoint page in the Azure portal.
For more information, see how to create a Translator resource.
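As a minimal sketch of keeping the key and endpoint out of source code, the Python snippet below reads them from environment variables. The variable names (TRANSLATOR_KEY, TRANSLATOR_ENDPOINT, TRANSLATOR_REGION) and the helper name are hypothetical, and the default endpoint assumes the global Translator text endpoint.

```python
import os

# Assumed default: the global Translator text endpoint.
DEFAULT_ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def load_translator_settings(env=os.environ):
    """Read the Translator key, endpoint, and region from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    key = env.get("TRANSLATOR_KEY")
    if not key:
        raise RuntimeError(
            "Set TRANSLATOR_KEY to KEY 1 or KEY 2 from the Keys and Endpoint page."
        )
    endpoint = env.get("TRANSLATOR_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    region = env.get("TRANSLATOR_REGION", "global")
    return {"key": key, "endpoint": endpoint, "region": region}
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise without touching your real environment.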
Custom Translator portal
Custom Translator doesn't support creating a workspace for a Translator Text API resource that was created inside an enabled virtual network (VNet).
Once you have the above prerequisites, sign in to the Custom Translator portal to create workspaces, build projects, upload files, train models, and publish your custom solution.
You can read an overview of translation and custom translation, learn some tips, and watch a getting started video in the Azure AI technical blog.
Create a workspace. A workspace is a work area for composing and building your custom translation system. A workspace can contain multiple projects, models, and documents. All the work you do in Custom Translator is done inside a specific workspace.
Create a project. A project is a wrapper for models, documents, and tests. Each project includes all documents that are uploaded into that workspace with the correct language pair. For example, if you have both an English-to-Spanish project and a Spanish-to-English project, the same documents will be included in both projects.
Upload parallel documents. Parallel documents are pairs of documents where one (target) is the translation of the other (source). One document in the pair contains sentences in the source language and the other document contains sentences translated into the target language. It doesn't matter which language is marked as "source" and which language is marked as "target"—a parallel document can be used to train a translation system in either direction.
Train your model. A model is the system that provides translation for a specific language pair. The outcome of a successful training is a model. When you train a model, three mutually exclusive document types are required: training, tuning, and testing. If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing data. It will use a random subset of sentences from your training documents, and exclude these sentences from the training data itself. A minimum of 10,000 parallel sentences is required to train a model.
Publish (deploy) your trained model. Your custom model is made available for runtime translation requests.
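The automatic assembly of tuning and testing data described in the training step can be pictured as a random hold-out split. The following Python sketch is only an illustration of that idea, not Custom Translator's implementation: the function name, the subset sizes, and the choice to apply the 10,000-sentence minimum to the full input are all assumptions.

```python
import random

MIN_PARALLEL_SENTENCES = 10_000  # minimum stated in this quickstart


def split_parallel_sentences(pairs, tune_size=2500, test_size=2500, seed=0):
    """Split (source, target) pairs into disjoint training, tuning, and testing sets.

    tune_size and test_size are illustrative values, not the service's.
    """
    if len(pairs) < MIN_PARALLEL_SENTENCES:
        raise ValueError(
            f"Need at least {MIN_PARALLEL_SENTENCES} parallel sentences, got {len(pairs)}."
        )
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Held-out sentences are removed from the training data itself.
    tuning = shuffled[:tune_size]
    testing = shuffled[tune_size:tune_size + test_size]
    training = shuffled[tune_size + test_size:]
    return training, tuning, testing
```

Because the three slices come from one shuffled list, no sentence pair lands in more than one set, which mirrors the "mutually exclusive" requirement above.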
Create a workspace
After you sign in to Custom Translator, you'll be asked for permission to read your profile from the Microsoft identity platform to request your user access token and refresh token. Both tokens are needed for authentication and to ensure that you aren't signed out during your live session or while training your models. Select Yes.
Select My workspaces.
Select Create a new workspace.
Type Contoso MT models for Workspace name and select Next.
Select "Global" for Select resource region from the dropdown list.
Copy/paste your Translator Services key.
The region must match the region that was selected when the resource was created. You can use KEY 1 or KEY 2.
Create a project
Once the workspace is created successfully, you'll be taken to the Projects page.
You'll create an English-to-German project to train a custom model with only a training document type.
Select Create project.
Type English-to-German for Project name.
Select English (en) as Source language from the dropdown list.
Select German (de) as Target language from the dropdown list.
Select General for Domain from the dropdown list.
Select Create project.
Upload documents
In this quickstart, you'll upload training documents for customization.
For this quickstart, you can use our sample dataset, Customer sample English-to-German datasets, which includes training documents and phrase and sentence dictionaries. However, for production, it's better to upload your own training dataset.
Select the English-to-German project name.
Select Manage documents from the left navigation menu.
Select Add document set.
Check the Training set box and select Next.
Keep Parallel documents checked and type sample-English-German.
Under the Source (English - EN) file, select Browse files and select sample-English-German-Training-en.txt.
Under the Target (German - DE) file, select Browse files and select sample-English-German-Training-de.txt.
You can upload the sample phrase and sentence dictionaries dataset. This step is left for you to complete.
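Before uploading a parallel document pair, it can help to confirm that the two files line up. The Python sketch below assumes plain-text files with one sentence per line, where line N of the source file is translated by line N of the target file. That layout is an assumption about your data, not something this quickstart states, and the function name is hypothetical.

```python
from pathlib import Path


def read_parallel_pairs(source_path, target_path):
    """Return (source, target) sentence pairs from two line-aligned text files."""
    source_lines = Path(source_path).read_text(encoding="utf-8").splitlines()
    target_lines = Path(target_path).read_text(encoding="utf-8").splitlines()
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"Line counts differ: {len(source_lines)} source vs {len(target_lines)} target."
        )
    # Skip pairs where either side is blank after trimming whitespace.
    return [
        (source.strip(), target.strip())
        for source, target in zip(source_lines, target_lines)
        if source.strip() and target.strip()
    ]
```

For example, read_parallel_pairs("sample-English-German-Training-en.txt", "sample-English-German-Training-de.txt") would return the sentence pairs from the sample files, provided they follow the assumed layout.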
Train your model
Now you're ready to train your English-to-German model.
Select Train model from the left navigation menu.
Type en-de with sample data for Model name.
Keep Full training checked.
Under Select documents, check sample-English-German and review the training cost associated with the selected number of sentences.
Select Train now.
Select Train to confirm.
Notifications displays the model training progress, for example, the Submitting data state. Training a model takes a few hours, depending on the number of selected sentences.
After successful model training, select Model details from the left navigation menu.
Select the model name en-de with sample data. Review training date/time, total training time, number of sentences used for training, tuning, testing, and dictionary. Check whether the system generated the test and tuning sets. You'll use the Category ID to make translation requests.
Evaluate the model BLEU score. The test set BLEU score is the score of the custom model, and Baseline BLEU is the score of the pre-trained baseline model used for customization. A BLEU score higher than the baseline indicates that the custom model translates the test set better than the baseline model does.
If you train with our shared customer sample datasets, your BLEU score may differ from the one shown in the image.
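To give a feel for what a BLEU score measures, here is a simplified Python sketch of corpus-level BLEU: clipped n-gram precision up to 4-grams combined with a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing. The exact configuration Custom Translator uses is not described in this quickstart, so scores from this sketch should not be expected to match the portal.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU on a 0-100 scale (one reference per hypothesis)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hypothesis, reference in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hypothesis.split(), reference.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any n-gram order with zero matches gives 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Scoring the custom model's output and the baseline model's output against the same reference translations gives two numbers that can be compared in the same way as the test set BLEU and Baseline BLEU values.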
Test your model
Once your training has completed successfully, inspect the translated sentences from the test set.
- Select Test model from the left navigation menu.
- Select "en-de with sample data".
- Manually evaluate the translations from New model (the custom model) and Baseline model (our pre-trained baseline used for customization) against Reference (the target translation from the test set).
Publish your model
Publishing your model makes it available for use with the Translator API. A project might have one or many successfully trained models. You can only publish one model per project; however, you can publish a model to one or multiple regions depending on your needs. For more information, see Translator pricing.
Select Publish model from the left navigation menu.
Select en-de with sample data and select Publish.
Check the desired region(s).
Select Publish. The status should transition from Deploying to Deployed.
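Once the model is deployed, translation requests can select it by passing the Category ID from the Model details page. The Python sketch below builds such a request for the Translator v3 translate operation using only the standard library. The function name and the placeholder values are hypothetical, the default endpoint assumes the global Translator text endpoint, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_translate_request(text, category_id, key, region,
                            endpoint="https://api.cognitive.microsofttranslator.com"):
    """Build a POST request that translates English to German with a custom model."""
    query = urllib.parse.urlencode({
        "api-version": "3.0",
        "from": "en",
        "to": "de",
        "category": category_id,  # Category ID of the published custom model
    })
    body = json.dumps([{"Text": text}]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # KEY 1 or KEY 2
        "Ocp-Apim-Subscription-Region": region,  # region of the Translator resource
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        f"{endpoint}/translate?{query}", data=body, headers=headers, method="POST"
    )


# To send the request (requires a valid key and a deployed model):
# request = build_translate_request("Hello", "<your-category-id>", "<your-key>", "global")
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)[0]["translations"][0]["text"])
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any call is made against your resource.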
Business users may want to download and install our free DocumentTranslator app for Windows.