Form Recognizer: PDFs with custom temporary fonts are not recognized correctly

Benedikt Schmaler 6 Reputation points
2021-05-25T08:09:00.02+00:00

It seems that Form Recognizer does not correctly recognize text in PDF files created with custom temporary fonts.

For example, I have a file that was created with a custom font. In the PDF file, the text looks like this:
[Image: 99417-grafik.png]

But the recognition returns this result:
?hZkd]’ Jej[dj_WbWki]b[Y^ kdZ iedij][ MY^kjpcWydW^c[d

I get the same result when I copy this text from the PDF file and paste it into a text editor.
As far as I can tell, the service does not run OCR on the rendered page in this case. Instead it reads the plain text layer embedded in the PDF file, which is decoded incorrectly because of the font.

Are there any plans for future releases to recognize text set in unknown fonts when those fonts are embedded in the PDF file?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.