Build and manage training documents

Custom Translator enables you to build translation models that reflect your business, industry, and domain-specific terminology and style. Training and deploying a custom model is easy and doesn't require any programming skills. Custom Translator allows you to upload parallel files, translation memory files, or zip files.

Parallel documents are pairs of documents where one (target) is a translation of the other (source). One document in the pair contains sentences in the source language and the other document contains those sentences translated into the target language.
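Before uploading a parallel pair, a quick local sanity check can catch obvious mismatches. The sketch below is only an illustration and assumes both documents are UTF-8 plain-text files that are already aligned with one sentence per line. The file names, sample sentences, and helper names (`read_sentences`, `check_parallel_pair`) are hypothetical and aren't part of Custom Translator.

```python
import tempfile
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_parallel_pair(source_path, target_path):
    """Compare sentence counts of a source/target pair.

    Returns (source_count, target_count, counts_match).
    Assumption: one sentence per line in both files.
    """
    source = read_sentences(source_path)
    target = read_sentences(target_path)
    return len(source), len(target), len(source) == len(target)


def demo_parallel_check():
    # Hypothetical file names and contents, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "manual_en.txt"
        tgt = Path(tmp) / "manual_de.txt"
        src.write_text("Open the valve.\nClose the lid.\n", encoding="utf-8")
        tgt.write_text(
            "Öffnen Sie das Ventil.\nSchließen Sie den Deckel.\n",
            encoding="utf-8",
        )
        return check_parallel_pair(src, tgt)


if __name__ == "__main__":
    print(demo_parallel_check())
```

A count mismatch doesn't necessarily mean the pair is unusable, but it's a cheap signal that a file may be truncated or paired with the wrong counterpart.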

Before uploading your documents, review the document formats and naming convention guidance to make sure Custom Translator supports your file format.

How to create document sets

Finding high-quality in-domain data is often challenging, and the best approach varies by the type of user. Here are some questions to ask yourself as you evaluate what data may be available to you:

  • Does your company have previous translation data available that you can use? Enterprises often have a wealth of translation data accumulated over many years of using human translation.

  • Do you have a vast amount of monolingual data? Monolingual data is data in only one language. If so, can you get translations for this data?

  • Can you crawl online portals to collect source sentences and synthesize target sentences?

Training material for each document type

Document type | What it does | Rules to follow
Bilingual training documents | Teaches the system your terminology and style. | Be liberal. Any in-domain human translation is better than machine translation. Add and remove documents as you go and try to improve the BLEU score.
Tuning documents | Trains the Neural Machine Translation parameters. | Be strict. Compose them to be optimally representative of what you are going to translate in the future.
Test documents | Calculates the BLEU score. | Be strict. Compose test documents to be optimally representative of what you plan to translate in the future.
Phrase dictionary | Forces the given translation 100% of the time. | Be restrictive. A phrase dictionary is case-sensitive, and any word or phrase listed is translated in the way you specify. In many cases, it's better to not use a phrase dictionary and let the system learn.
Sentence dictionary | Forces the given translation 100% of the time. | Be strict. A sentence dictionary is case-insensitive and good for common in-domain short sentences. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry doesn't match.
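The difference between the two dictionary types can be illustrated with a small toy model. This sketch isn't how the service is implemented. It only mirrors the matching rules described above: phrase entries match case-sensitively anywhere in a sentence, and sentence entries match case-insensitively but only against the entire sentence. The dictionary entries, the function names, and the whole-word boundary check for phrases are assumptions made for illustration.

```python
import re

# Hypothetical dictionary entries, for illustration only.
PHRASES = {"Contoso Cloud": "Contoso Cloud"}
SENTENCES = {"Save changes?": "Änderungen speichern?"}


def find_phrase_matches(sentence, phrase_dict):
    """Return the phrase entries that occur in the sentence.

    Matching is case-sensitive. The whole-word boundary check is an
    assumption of this sketch, not documented service behavior.
    """
    matches = {}
    for source, target in phrase_dict.items():
        pattern = r"(?<!\w)" + re.escape(source) + r"(?!\w)"
        if re.search(pattern, sentence):
            matches[source] = target
    return matches


def apply_sentence_dictionary(sentence, sentence_dict):
    """Return the forced translation if the entire sentence matches, else None.

    Matching is case-insensitive and must cover the whole sentence.
    """
    lookup = {src.strip().lower(): tgt for src, tgt in sentence_dict.items()}
    return lookup.get(sentence.strip().lower())


if __name__ == "__main__":
    print(find_phrase_matches("Sign in to Contoso Cloud.", PHRASES))
    print(find_phrase_matches("sign in to contoso cloud.", PHRASES))
    print(apply_sentence_dictionary("SAVE CHANGES?", SENTENCES))
    print(apply_sentence_dictionary("Save changes? Yes.", SENTENCES))
```

In this model, the lowercase sentence finds no phrase match because of case sensitivity, and "Save changes? Yes." finds no sentence match because only part of it equals the dictionary entry.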

How to upload documents

Document types are associated with the language pair selected when you create a project.

  1. Sign in to the Custom Translator portal. Your default workspace is loaded and a list of previously created projects is displayed.

  2. Select the desired project Name. By default, the Manage documents blade is selected and a list of previously uploaded documents is displayed.

  3. Select Add document set and choose the document type:

    • Training set
    • Testing set
    • Tuning set
    • Dictionary set:
      • Phrase Dictionary
      • Sentence Dictionary
  4. Select Next.

    Screenshot illustrating the document upload link.

    Note

    Choosing Dictionary set launches the Choose type of dictionary dialog. Choose one and select Next.

  5. Select your document format from the radio buttons.

    Screenshot illustrating the upload document page.

    • For Parallel documents, fill in the Document set name and select Browse files to select source and target documents.
    • For Translation memory (TM) file or Upload multiple sets with ZIP, select Browse files to select the file.
  6. Select Upload.

At this point, Custom Translator processes your documents and attempts to extract sentences, as indicated in the upload notification. When processing finishes, the upload successful notification appears.

Screenshot illustrating the upload document processing dialog window.
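If you plan to use the Upload multiple sets with ZIP option, the archive can be prepared locally before you open the portal. The following is a minimal sketch using Python's standard `zipfile` module. The flat archive layout, the `bundle_documents` helper, and the file names are assumptions for illustration. Real file names must follow the document formats and naming convention guidance for Custom Translator.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_documents(file_paths, zip_path):
    """Write the given files into a ZIP archive and return the stored names.

    Assumption: a flat archive (no folders) is what you want to upload.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            path = Path(path)
            # arcname=path.name stores only the file name, not its directory.
            archive.write(path, arcname=path.name)
    # Reopen the archive to confirm what was actually stored.
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


def demo_bundle():
    # Hypothetical file names and contents, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source = tmp / "faq_en.txt"
        target = tmp / "faq_de.txt"
        source.write_text("How do I reset my password?\n", encoding="utf-8")
        target.write_text("Wie setze ich mein Passwort zurück?\n", encoding="utf-8")
        return bundle_documents([source, target], tmp / "upload.zip")


if __name__ == "__main__":
    print(demo_bundle())
```

Listing the archive contents after writing is a simple way to confirm that every intended file made it into the ZIP before you select it with Browse files.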

View upload history

On the workspace page, you can view the history of all document uploads, with details such as document type, language pair, and upload status.

  1. From the Custom Translator portal workspace page, select the Upload history tab to view the history.

    Screenshot showing the upload history tab.

  2. This page shows the status of all of your past uploads, from most recent to least recent. Each entry shows the document name, created by, upload status, upload date, number of files uploaded, type of file uploaded, and language pair. You can use the filter to quickly find documents by name, status, language, and date range.

    Screenshot showing the upload history page.

  3. The upload history details page shows the files uploaded as part of the upload, the upload status of each file, the language of each file, and the error message (if there's an error in upload).

Next steps