What is a dictionary?
A dictionary is an aligned pair of documents that specifies a list of phrases or sentences and their corresponding translations. Use a dictionary in your training, when you want Translator to translate any instances of the source phrase or sentence, using the translation you've provided in the dictionary. Dictionaries are sometimes called glossaries or term bases. You can think of the dictionary as a brute force "copy and replace" for all the terms you list. Furthermore, Microsoft Custom Translator service builds and makes use of its own general purpose dictionaries to improve the quality of its translation. However, a customer provided dictionary takes precedent and will be searched first to look up words or sentences.
Dictionaries only work for projects in language pairs that have a fully supported Microsoft general neural network model behind them. View the complete list of languages.
A phrase dictionary is case-sensitive. It's an exact find-and-replace operation. When you include a phrase dictionary in training your model, any word or phrase listed is translated in the way specified. The rest of the sentence is translated as usual. You can use a phrase dictionary to specify phrases that shouldn't be translated by providing the same untranslated phrase in the source and target files.
A sentence dictionary is case-insensitive. The sentence dictionary allows you to specify an exact target translation for a source sentence. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If the source dictionary entry ends with punctuation, it's ignored during the match. If only a portion of the sentence matches, the entry won't match. When a match is detected, the target entry of the sentence dictionary will be returned.
You can train a model using only dictionary data. To do so, select only the dictionary document (or multiple dictionary documents) that you wish to include and select Create model. Since this training is dictionary-only, there's no minimum number of training sentences required. Your model will typically complete training much faster than a standard training. The resulting models will use the Microsoft baseline models for translation with the addition of the dictionaries you've added. You won't get a test report.
Custom Translator doesn't sentence align dictionary files, so it is important that there are an equal number of source and target phrases/sentences in your dictionary documents and that they are precisely aligned.
Dictionaries aren't a substitute for training a model using training data. For better results, we recommended letting the system learn from your training data. However, when sentences or compound nouns must be translated verbatim, use a dictionary.
The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context of that sentence is lost or limited for translating the rest of the sentence. The result is that, while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence often suffers.
The phrase dictionary works well for compound nouns like product names ("Microsoft SQL Server"), proper names ("City of Hamburg"), or product features ("pivot table"). It doesn't work as well for verbs or adjectives because those words are typically highly contextual within the source or target language. The best practice is to avoid phrase dictionary entries for anything but compound nouns.
If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries are case- and punctuation-sensitive. Custom Translator will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation marks as specified in the source dictionary file. Also, translations will reflect the capitalization and punctuation provided in the target dictionary file.
- If you're training an English-to-Spanish system that uses a phrase dictionary and you specify "SQL server" in the source file and "Microsoft SQL Server" in the target file. When you request the translation of a sentence that contains the phrase "SQL server", Custom Translator will match the dictionary entry and the translation will contain "Microsoft SQL Server."
- When you request translation of a sentence that includes the same phrase but doesn't match what is in your source file, such as "sql server", "sql Server" or "SQL Server", it won't return a match from your dictionary.
- The translation follows the rules of the target language as specified in your phrase dictionary.
If you're using a sentence dictionary, end-of-sentence punctuation is ignored.
- If your source dictionary contains "This sentence ends with punctuation!", then any translation requests containing "This sentence ends with punctuation" will match.
Your dictionary should contain unique source lines. If a source line (a word, phrase, or sentence) appears more than once in a dictionary file, the system will always use the last entry provided and return the target when a match is found.
Avoid adding phrases that consist of only numbers or are two- or three-letter words, such as acronyms, in the source dictionary file.