Lots of bugs in PDF translation

Bigfatball 1 Reputation point

Microsoft has a unique opportunity to dominate the machine translation market because Google made stupid mistakes.

I'm happy to commit my time to help Microsoft achieve this goal. However, Microsoft must improve the product quickly. I've found many bugs in Microsoft's document translation API, especially in PDF translation. They include:

  1. Bullet points (the big black dots) are translated into "%a".
  2. Some text is left untranslated and words are concatenated together. For example, in a title with the text "U N P A R A L L E L E D A C C E S S", Microsoft Translator simply removes the white space and translates it into "U N P A R A L L E L E D A C C E S S". Google Translate doesn't have this problem. This seems to be a simple PDF parsing bug.
  3. The generated PDF file is huge. The English source PDF is 1.6 MB. The PDF produced by Google Translate is 8 MB, but the PDF produced by Microsoft Translator is 24 MB, about 15 times the size of the original.

I don't think the above issues are translation problems. But obviously your PDF reading and writing code is junk and requires an immediate fix.

I'm happy to provide a test document so you can see the problems. I can also provide the Google-translated result so you can compare.
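In the meantime, here is a rough sketch of how the three symptoms could be checked automatically once the text has been extracted from the source and translated PDFs. It is only an illustration: the function names, the list of bullet characters, and the `max_growth` threshold are my own assumptions, and it does not call the Translator API or parse PDFs itself.

```python
# Hypothetical helper for spotting the three symptoms described above.
# Inputs are plain text already extracted from the PDFs plus the file sizes
# in bytes; nothing here talks to the Translator service.

BULLET_CHARS = "•●▪◦"  # assumed set of bullet glyphs, extend as needed


def is_letter_spaced(line: str) -> bool:
    """Return True for lines made only of single characters separated by
    spaces, such as a decorative title like "A C C E S S"."""
    tokens = line.split()
    return len(tokens) >= 4 and all(len(token) == 1 for token in tokens)


def check_translation(source_text, translated_text,
                      source_bytes, translated_bytes, max_growth=10.0):
    """Return a list of human-readable findings (empty if nothing is flagged).

    max_growth is an arbitrary threshold chosen for this sketch.
    """
    findings = []

    # Symptom 1: the source has bullets and the output contains a literal "%a".
    if any(ch in source_text for ch in BULLET_CHARS) and "%a" in translated_text:
        findings.append("bullet characters may have been replaced by '%a'")

    # Symptom 2: a letter-spaced line shows up in the output with the same
    # letters, with or without the spaces, which suggests it was not translated.
    squeezed_output = translated_text.replace(" ", "")
    for line in source_text.splitlines():
        stripped = line.strip()
        if is_letter_spaced(stripped) and stripped.replace(" ", "") in squeezed_output:
            findings.append(f"letter-spaced line left untranslated: {stripped!r}")

    # Symptom 3: the translated file is much larger than the source file.
    growth = translated_bytes / source_bytes
    if growth > max_growth:
        findings.append(f"output is {growth:.1f}x the size of the source")

    return findings
```

With the sizes from my document (1.6 MB source, 24 MB output) the growth check reports a 15.0x increase, while the 8 MB Google result (5x) stays under the assumed threshold.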

Azure Translator