Lots of bugs in PDF translation
Microsoft has a unique opportunity to dominate the machine translation market because Google made stupid mistakes.
I'm happy to commit my time to help Microsoft achieve this goal. However, Microsoft must improve the product quickly. I've found many bugs in Microsoft's document translation API, especially in PDF translation. They include:
- Bullet points (the big black dots) are translated into "%a".
- Some text is left untranslated and words are concatenated. For example, for a title with the text "U N P A R A L L E L E D A C C E S S", Microsoft Translator simply removes the white space and translates it into "U N P A R A L L E L E D A C C E S S". Google Translate doesn't have this problem. This seems to be a simple PDF parsing bug.
- The generated PDF file is huge. The original English PDF is 1.6 MB. The PDF produced by Google Translate is 8 MB, but the one produced by Microsoft Translator is 24 MB.
I don't think the above issues are problems with the translator itself. But obviously your PDF reading and writing code is junk and requires an immediate fix.
I'm happy to provide a test document so you can see the problems. I can also provide the Google-translated result so you can compare.
I'm happy to provide the original document, the Azure-translated document, and the Google-translated document. Because the files are large and sensitive, please let me know how to send them to you. Thanks!