Deploy a user-managed glossary
Microsoft Translator containers let you run several features of the Translator service in your own environment, which suits deployments with specific security and data governance requirements.
You might be running a container with a multi-layered ingestion process when you discover that the sentence or phrase files need an update. Because the standard phrase and sentence files are encrypted and read directly into memory at runtime, a dynamic update calls for a quick-fix engineering solution. You can implement this update with the user-managed glossary feature:
- To deploy the phrasefix solution, create a phrasefix glossary file that specifies how a listed phrase is translated.
- To deploy the sentfix solution, create a sentfix glossary file that specifies an exact target translation for a source sentence.
The phrasefix and sentfix files are then included with your translation request and read directly into memory at runtime.
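The difference between the two solutions can be illustrated with a toy model. This sketch is not the container's implementation; it only mirrors the matching rules this article describes: a phrasefix entry applies wherever the listed phrase occurs, while a sentfix entry applies only when the whole submitted sentence equals the entry. The function names, the sample entries, and the choice of case-sensitive comparison are assumptions made for illustration.

```python
from typing import Dict, Optional


def sentfix_lookup(sentence: str, sentfix: Dict[str, str]) -> Optional[str]:
    """Return the fixed translation only if the entire sentence matches an entry."""
    return sentfix.get(sentence)


def phrasefix_matches(sentence: str, phrasefix: Dict[str, str]) -> Dict[str, str]:
    """Return every phrasefix entry whose source phrase occurs in the sentence."""
    return {src: tgt for src, tgt in phrasefix.items() if src in sentence}


if __name__ == "__main__":
    # Hypothetical English-to-Spanish entries, for illustration only.
    sentfix = {"Good morning.": "Buenos días."}
    phrasefix = {"service desk": "mesa de ayuda"}

    # Whole-sentence match: the sentfix entry applies.
    print(sentfix_lookup("Good morning.", sentfix))
    # Partial match only: no sentfix entry applies.
    print(sentfix_lookup("Good morning. How are you?", sentfix))
    # The listed phrase occurs inside the sentence: the phrasefix entry applies.
    print(phrasefix_matches("Call the service desk today.", phrasefix))
```

The model shows why sentfix is the more conservative choice: an entry cannot affect any sentence other than the one it lists.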
Managed glossary workflow
Important
UTF-16 LE is the only accepted file encoding for the managed-glossary folders. For more information about encoding your files, see Encoding.
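As a minimal sketch of the encoding requirement, the following Python helpers write a text file as UTF-16 LE with a byte order mark and check that an existing file starts with one. The helper names are hypothetical, and the use of a plain `\n` line ending is an assumption; the article doesn't specify line endings.

```python
import codecs
from pathlib import Path
from typing import List


def write_utf16_le_bom(path: Path, lines: List[str]) -> None:
    """Write the given lines as UTF-16 LE, preceded by a byte order mark."""
    text = "".join(line + "\n" for line in lines)
    # The "utf-16-le" codec does not emit a BOM, so prepend it explicitly.
    path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


def has_utf16_le_bom(path: Path) -> bool:
    """Return True if the file begins with the UTF-16 LE BOM (bytes FF FE)."""
    return path.read_bytes()[:2] == codecs.BOM_UTF16_LE
```

Writing the BOM explicitly avoids relying on Python's generic `utf-16` codec, which picks the byte order of the machine it runs on.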
To get started manually creating the folder structure, you need to create and name your folder. The managed-glossary folder nests phrasefix or sentfix source and target language files, each encoded in UTF-16 LE BOM format. Let's name our folder customhotfix. Each folder can have phrasefix and sentfix files. You provide the source (src) and target (tgt) language codes with the following naming convention:

| Glossary file name format | Example file name |
|---|---|
| {src}.{tgt}.{container-glossary}.{phrasefix}.src.snt | en.es.container-glossary.phrasefix.src.snt |
| {src}.{tgt}.{container-glossary}.{phrasefix}.tgt.snt | en.es.container-glossary.phrasefix.tgt.snt |
| {src}.{tgt}.{container-glossary}.{sentfix}.src.snt | en.es.container-glossary.sentfix.src.snt |
| {src}.{tgt}.{container-glossary}.{sentfix}.tgt.snt | en.es.container-glossary.sentfix.tgt.snt |

Note
- The phrasefix solution is an exact find-and-replace operation. Any word or phrase listed is translated in the way specified.
- The sentfix solution is more precise and allows you to specify an exact target translation for a source sentence. For a sentence match to occur, the entire submitted sentence must match the sentfix entry. If only a portion of the sentence matches, the entry won't match.
- If you're hesitant about making sweeping find-and-replace changes, we recommend, at the outset, solely using the sentfix solution.
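The naming convention can be sketched in Python as follows. The helper names are hypothetical. The sketch also assumes that the src and tgt files are line-aligned, with one entry per line, so that line N of the src file pairs with line N of the tgt file; the article doesn't state this. The sample entry in the usage note is illustrative only.

```python
import codecs
from pathlib import Path
from typing import List, Tuple


def glossary_file_name(src: str, tgt: str, kind: str, side: str) -> str:
    """Build a file name such as en.es.container-glossary.phrasefix.src.snt."""
    if kind not in ("phrasefix", "sentfix"):
        raise ValueError("kind must be 'phrasefix' or 'sentfix'")
    if side not in ("src", "tgt"):
        raise ValueError("side must be 'src' or 'tgt'")
    return f"{src}.{tgt}.container-glossary.{kind}.{side}.snt"


def write_glossary_pair(folder: Path, src: str, tgt: str, kind: str,
                        entries: List[Tuple[str, str]]) -> List[Path]:
    """Write the src and tgt files for one glossary kind as UTF-16 LE with BOM."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for side, column in (("src", 0), ("tgt", 1)):
        # Assumption: one entry per line, aligned between the two files.
        text = "".join(entry[column] + "\n" for entry in entries)
        path = folder / glossary_file_name(src, tgt, kind, side)
        path.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
        paths.append(path)
    return paths
```

For example, `write_glossary_pair(Path("customhotfix"), "en", "es", "sentfix", [("Good morning.", "Buenos días.")])` would create `en.es.container-glossary.sentfix.src.snt` and `en.es.container-glossary.sentfix.tgt.snt` inside the customhotfix folder.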
Next, to dynamically reload glossary entry updates, create a version.json file within the customhotfix folder. The version.json file should contain the following parameters:
- VersionId. An integer value.
Sample version.json file
{ "VersionId": 5 }
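A small helper can keep version.json in step with glossary edits. The sketch below rests on two assumptions that the article doesn't confirm: that a changed VersionId is what signals a new glossary version to the container, and that incrementing by one is an acceptable way to change it. The article also doesn't say whether version.json must follow the UTF-16 LE rule, so the encoding is left as a parameter. The function name is hypothetical.

```python
import json
from pathlib import Path


def bump_version(folder: Path, encoding: str = "utf-8") -> int:
    """Increment VersionId in version.json, creating the file if it is missing."""
    path = folder / "version.json"
    current = 0
    if path.exists():
        current = int(json.loads(path.read_text(encoding=encoding))["VersionId"])
    new_id = current + 1
    # VersionId must be an integer value.
    path.write_text(json.dumps({"VersionId": new_id}), encoding=encoding)
    return new_id
```

Calling `bump_version(Path("customhotfix"))` after each glossary edit keeps the version number moving forward without hand-editing the file.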
Tip
Reload can be controlled by setting the following environment variables when starting the container:
- HotfixReloadInterval=. Default value is 5 minutes.
- HotfixReloadEnabled=. Default value is true.
Use the docker run command
Docker run command required options
docker run --rm -it -p 5000:5000 \
  -e eula=accept \
  -e billing={ENDPOINT_URI} \
  -e apikey={API_KEY} \
  -e Languages={LANGUAGES_LIST} \
  -e HotfixDataFolder={path to glossary folder} \
  {image}
Example docker run command
docker run --rm -it -p 5000:5000 \
  -v /mnt/d/models:/usr/local/models \
  -v /mnt/d/customhotfix:/usr/local/customhotfix \
  -e EULA=accept \
  -e billing={ENDPOINT_URI} \
  -e apikey={API_KEY} \
  -e Languages=en,es \
  -e HotfixDataFolder=/usr/local/customhotfix \
  mcr.microsoft.com/azure-cognitive-services/translator/text-translation:latest
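If you launch the container from a script, building the command as an argument list avoids the spacing and line-continuation mistakes that are easy to make in a long shell command. The following sketch only assembles the arguments and doesn't start Docker. The function and parameter names are hypothetical, while the options themselves are the ones shown in the commands above.

```python
import shlex
from typing import List


def docker_run_args(host_glossary: str, container_glossary: str,
                    endpoint: str, api_key: str,
                    languages: List[str], image: str) -> List[str]:
    """Assemble docker run arguments that mount and register a glossary folder."""
    return [
        "docker", "run", "--rm", "-it",
        "-p", "5000:5000",
        # Mount the host glossary folder into the container.
        "-v", f"{host_glossary}:{container_glossary}",
        "-e", "eula=accept",
        "-e", f"billing={endpoint}",
        "-e", f"apikey={api_key}",
        "-e", f"Languages={','.join(languages)}",
        # HotfixDataFolder must be the path as seen inside the container.
        "-e", f"HotfixDataFolder={container_glossary}",
        image,
    ]


if __name__ == "__main__":
    args = docker_run_args(
        "/mnt/d/customhotfix", "/usr/local/customhotfix",
        "{ENDPOINT_URI}", "{API_KEY}", ["en", "es"],
        "mcr.microsoft.com/azure-cognitive-services/translator/text-translation:latest",
    )
    # Print a shell-quoted version of the command for review.
    print(shlex.join(args))
```

Passing the list to `subprocess.run(args)` would start the container. Using one variable for both the mount target and HotfixDataFolder keeps the two paths from drifting apart.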