Custom translator

Manuel 41 Reputation points
2023-03-30T20:28:18.3733333+00:00

A Microsoft video describes a useful feature: https://azure.microsoft.com/en-us/products/cognitive-services/translator#content-card-list-oc803c It says a dictionary can be used to build a custom translator with little code. Could you provide the steps? Which file formats can be used for the dictionary, and how do I upload it? Do I actually need to scan a dictionary?

Thanks for any help!

Azure AI Translator
An Azure service to easily conduct machine translation with a simple REST API call.

1 answer

  1. YutongTie-MSFT 53,971 Reputation points Moderator
    2023-03-31T01:37:37.9533333+00:00

    Hello @Manuel

    Thanks for reaching out to us. To train a model with a dictionary, upload your dictionary document in Language Studio by following these steps:

    1. Go to Language Studio (https://language.cognitive.azure.com/) and select "Translate Text", then "Customize Translation".

    2. Create a new project and follow the guidance for training a custom model: https://learn.microsoft.com/en-us/azure/cognitive-services/Translator/custom-translator/how-to/train-custom-model#when-to-select-dictionary-only-training

    3. Add your dictionary document to the project as a "Dictionary set".

    For an explanation of dictionary sets, see https://learn.microsoft.com/en-us/azure/cognitive-services/Translator/custom-translator/concepts/dictionaries

    Recommendations

    • Dictionaries aren't a substitute for training a model using training data. For better results, we recommend letting the system learn from your training data. However, when sentences or compound nouns must be translated verbatim, use a dictionary.
    • The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context of that sentence is lost or limited for translating the rest of the sentence. The result is that, while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence often suffers.
    • The phrase dictionary works well for compound nouns like product names ("Microsoft SQL Server"), proper names ("City of Hamburg"), or product features ("pivot table"). It doesn't work as well for verbs or adjectives because those words are typically highly contextual within the source or target language. The best practice is to avoid phrase dictionary entries for anything but compound nouns.
    • If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries are case- and punctuation-sensitive. Custom Translator will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation marks as specified in the source dictionary file. Also, translations will reflect the capitalization and punctuation provided in the target dictionary file. Example:
      • Suppose you're training an English-to-Spanish system that uses a phrase dictionary, and you specify "SQL server" in the source file and "Microsoft SQL Server" in the target file. When you request the translation of a sentence that contains the phrase "SQL server", Custom Translator will match the dictionary entry and the translation will contain "Microsoft SQL Server".
      • When you request translation of a sentence that includes the same phrase but doesn't match what is in your source file, such as "sql server", "sql Server" or "SQL Server", it won't return a match from your dictionary.
      • The translation follows the rules of the target language as specified in your phrase dictionary.
    • If you're using a sentence dictionary, end-of-sentence punctuation is ignored. Example:
      • If your source dictionary contains "This sentence ends with punctuation!", then any translation requests containing "This sentence ends with punctuation" will match.
    • Your dictionary should contain unique source lines. If a source line (a word, phrase, or sentence) appears more than once in a dictionary file, the system will always use the last entry provided and return the target when a match is found.
    • Avoid adding phrases that consist of only numbers or are two- or three-letter words, such as acronyms, in the source dictionary file.
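
The recommendations above can be checked locally before a dictionary is uploaded. The following is a minimal sketch, not part of Custom Translator: the function names `lint_dictionary` and `build_phrase_lookup`, the in-memory list of (source, target) pairs, and the sample Spanish targets are all hypothetical and chosen only for illustration. It flags duplicate source lines, number-only entries, and very short words, and it mimics the exact-match, last-entry-wins behavior described for phrase dictionaries. For the file formats the service actually accepts, refer to the dictionary documentation linked above.

```python
import re


def lint_dictionary(pairs):
    """Return (line_number, message) warnings for a list of (source, target) entries."""
    warnings = []
    last_seen = {}  # source text -> line number of its most recent occurrence
    for lineno, (source, target) in enumerate(pairs, start=1):
        src = source.strip()
        if not src or not target.strip():
            warnings.append((lineno, "empty source or target"))
            continue
        if src in last_seen:
            # Per the recommendations, the last entry is the one that is used.
            warnings.append((lineno, f"duplicate of line {last_seen[src]}; the last entry wins"))
        last_seen[src] = lineno
        if re.fullmatch(r"[\d\s.,]+", src):
            warnings.append((lineno, "source consists only of numbers"))
        elif " " not in src and len(src) <= 3:
            warnings.append((lineno, "source is a single word of three letters or fewer"))
    return warnings


def build_phrase_lookup(pairs):
    """Local stand-in for phrase matching: exact, case-sensitive keys; later duplicates overwrite earlier ones."""
    return {source.strip(): target.strip() for source, target in pairs}


# Hypothetical sample entries (illustrative only).
entries = [
    ("SQL server", "Microsoft SQL Server"),
    ("pivot table", "tabla dinámica"),
    ("SQL", "SQL"),
    ("2023", "2023"),
    ("pivot table", "tabla pivote"),
]

for lineno, message in lint_dictionary(entries):
    print(f"line {lineno}: {message}")

lookup = build_phrase_lookup(entries)
print(lookup.get("SQL server"))  # same capitalization as the source entry: found
print(lookup.get("sql server"))  # different capitalization: not found (None)
```

A plain dictionary keyed on the exact source string is used here because it reproduces both rules at once: lookups are case- and punctuation-sensitive, and a repeated key keeps only the last value. This only simulates the documented behavior; actual matching happens inside the trained model.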

    I hope this helps! Please let me know if you need further help with any of the above.

    Regards

    Yutong

    Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.

