Hello YM Set Sophy,
Thank you for reaching out to Microsoft Q&A.
You’re not missing anything here. What you’re seeing is exactly how the service behaves today, especially with documents like garment tech packs that mix text, layouts, logos, and drawings.
Let me walk through this in a straightforward way.
Why all of this is happening
Document Translation isn’t editing your PDF in place. It’s more like:
- extract text (either from PDF text layer or OCR)
- translate it
- then redraw the document
That last step, re-rendering, is where most of these issues come from. The service tries to recreate the layout, but it doesn’t preserve everything exactly the way it was originally designed.
That’s why layout-heavy documents (like tech packs) tend to show more problems compared to plain text PDFs.
On your specific points
You’re correct on all three, and here’s how they play out in practice.
Mask padding / bounding boxes (the “black dots” issue)
What you’re seeing in scanned PDFs is a side effect of OCR. The service detects text regions and places a white mask over them before writing translated text. Those masks are very tightly fitted to the detected text.
There’s currently no way to tell it:
- “make the mask slightly bigger”
- or “clean the background more aggressively”
So any tiny part of the original glyph that falls outside that box shows up as those residual dots.
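To make the idea concrete, here is a minimal sketch of the kind of mask padding the service does not expose. It only applies if you run your own OCR and clean-up step before translation. The function name `pad_box`, the `(left, top, right, bottom)` pixel box format, and the padding value are my own assumptions for illustration, not anything from the Translator API.

```python
def pad_box(box, padding, page_width, page_height):
    """Grow an OCR bounding box by `padding` pixels on every side.

    `box` is assumed to be (left, top, right, bottom) in pixels.
    The result is clamped so the mask never extends past the page edges.
    """
    left, top, right, bottom = box
    return (
        max(0, left - padding),
        max(0, top - padding),
        min(page_width, right + padding),
        min(page_height, bottom + padding),
    )


# A tight box around a word, widened by 3 pixels on a 100x100 page.
padded = pad_box((10, 10, 50, 20), 3, 100, 100)
print(padded)
```

Painting the padded box white on your own copy of the scan, before the file goes to translation, is what would cover the stray glyph edges that otherwise show up as dots.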
No way to exclude logos or drawings
Right now, everything that looks like text is treated the same way.
There’s no option to say:
- “ignore this area”
- “don’t translate this logo”
- “skip this diagram”
So if your logo or branding contains text, the service will:
- detect it
- translate it
- redraw it
Which is why you lose the original styling.
Glossary does not preserve fonts or styling
Glossary only controls what text becomes, not how it looks.
Even if you map a term to itself (for example, a brand name to the same brand name), the service still removes the original text and redraws it using available fonts. It doesn’t reuse the embedded font from the PDF.
So typography loss is expected.
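As a conceptual toy model only (this is not the service’s actual code, and `TextRegion`, `redraw`, and the `FallbackSans` font name are all made up for illustration), the extract, translate, redraw flow can be pictured like this:

```python
from dataclasses import dataclass


@dataclass
class TextRegion:
    text: str
    bbox: tuple  # (left, top, right, bottom)
    font: str    # typography of the drawn text


def translate(text, glossary):
    # Toy translation: a glossary lookup, falling back to the original text.
    return glossary.get(text, text)


def redraw(regions, glossary, output_font="FallbackSans"):
    # Every region is rebuilt from its text and position only.
    # The original font is not carried over, even when the text is unchanged.
    return [
        TextRegion(translate(r.text, glossary), r.bbox, output_font)
        for r in regions
    ]


page = [
    TextRegion("ACME", (10, 10, 60, 24), "AcmeLogoBold"),
    TextRegion("Sleeve length", (10, 40, 120, 52), "Helvetica"),
]
# The brand name is mapped to itself, as in the glossary workaround.
glossary = {"ACME": "ACME", "Sleeve length": "[translated text]"}

result = redraw(page, glossary)
print(result[0])
```

The brand name comes out with the same characters, but in the fallback font. That is the point of the sketch: the glossary sits inside the translate step, so it has no say in how the text is drawn afterwards.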
The other things you mentioned
Khmer font inconsistency
This is usually font substitution during rendering. If the exact font isn’t available or supported in the output pipeline, it falls back to something close, which can change size and spacing.
Short text being detected as English
Also expected. Very short strings don’t give the language detector enough context. Adding punctuation or surrounding context helps, which is why your workaround improves it.
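One more thing that may be worth trying, offered as a suggestion rather than a confirmed fix for your files: the batch request body has an optional `language` field on the source, so you can state the source language instead of relying on auto-detection. The sketch below only builds the request body. The helper name `build_batch_request` and the placeholder URLs are assumptions, and the field names follow my reading of the Document Translation batch request format, so please check them against the current REST reference before use.

```python
import json


def build_batch_request(source_url, target_url, source_language, target_language):
    """Build a Document Translation batch body with an explicit source language."""
    return {
        "inputs": [
            {
                "source": {
                    "sourceUrl": source_url,
                    # Stating the language here skips auto-detection.
                    "language": source_language,
                },
                "targets": [
                    {
                        "targetUrl": target_url,
                        "language": target_language,
                    }
                ],
            }
        ]
    }


# Placeholder container URLs; replace with your own SAS URLs.
body = build_batch_request(
    "https://example.blob.core.windows.net/source",
    "https://example.blob.core.windows.net/target",
    "en",
    "km",
)
print(json.dumps(body, indent=2))
```

Whether this helps depends on the document. If a tech pack genuinely mixes several source languages, forcing a single one may not be what you want.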
So what does this mean in reality?
There are a few hard limitations today:
- You can’t control OCR masking behavior
- You can’t exclude specific regions (like logos or sketches)
- You can’t preserve original fonts or styling through glossary
So this isn’t something you can fix by tweaking settings. It’s just how the service is designed right now.
What people usually do for documents like yours
For garment tech packs, most teams don’t rely on a single “translate the whole PDF” step. They break it into a small pipeline.
Before translation
- Flatten or outline logos (so they become shapes, not text)
- Remove or isolate drawings/branding if possible
During translation
- Use Document Translation for the actual content (descriptions, specs, tables)
After translation
- Put logos back
- Reapply fonts if needed
- Or overlay translated text onto a clean version of the original
For scanned documents specifically, some teams also run their own OCR first (with better control over masking), clean the image, and then pass it to translation.
For reference, please see:
Azure Translator document translation overview https://learn.microsoft.com/azure/ai-services/translator/document-translation/overview
Azure Translator in Foundry Tools Transparency Note (limitations section) https://learn.microsoft.com/azure/foundry/responsible-ai/translator/transparency-note
Known issues for Azure AI Translator https://learn.microsoft.com/en-us/azure/ai-services/translator/reference/known-issues
I hope this helps. Do let me know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for “Was this answer helpful”.
Thank you!