@YutongTie-MSFT Yes, the document is in English.
For this particular one-page PDF document, all of the returned text is incorrectly encoded. Sending the same page converted to a JPEG gives me the correct text, so it looks to me like there is a problem with the OCR on the PDF document.
PDF text not extracted
Paul Pawletta
21
Reputation points
Hi, I have one PDF document for which my custom neural model returns garbled, incorrectly encoded text. The entity bounding boxes seem correct; only the text content is wrong, both in the JSON response and in the Form Recognizer Studio visualization. Sending the same PDF converted to a JPEG gives me the correct text entities!
Are there any requirements for PDF documents? Unfortunately, I can't share the original document here because it contains customer information.
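For anyone who wants to put a number on the difference between the two responses, here is a rough sketch. It assumes both JSON responses (PDF and JPEG) have been saved and parsed into dicts, and that the extracted text sits under `analyzeResult.content` as in the REST response. The function names `suspicious_ratio` and `compare_results` are made up for illustration and are not part of any SDK.

```python
import difflib
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that look like encoding debris:
    control, private-use, unassigned, or surrogate code points, or U+FFFD."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in {"Cc", "Co", "Cn", "Cs"}
    )
    return bad / len(chars)


def compare_results(pdf_result: dict, jpeg_result: dict) -> dict:
    """Compare the full-text content of two parsed analyze responses.

    Assumption: each dict has the shape {"analyzeResult": {"content": "..."}}.
    """
    pdf_text = pdf_result["analyzeResult"]["content"]
    jpeg_text = jpeg_result["analyzeResult"]["content"]
    return {
        "pdf_suspicious": suspicious_ratio(pdf_text),
        "jpeg_suspicious": suspicious_ratio(jpeg_text),
        # Low similarity suggests the two inputs produced different text,
        # even when the garbled characters are ordinary letters.
        "similarity": difflib.SequenceMatcher(None, pdf_text, jpeg_text).ratio(),
    }


if __name__ == "__main__":
    # Hypothetical stand-ins for the two saved responses.
    pdf = {"analyzeResult": {"content": "\ue001\ue002\ue003 \ue004"}}
    jpeg = {"analyzeResult": {"content": "Invoice 42"}}
    print(compare_results(pdf, jpeg))
```

A high `pdf_suspicious` value together with a low `similarity` would be consistent with what I am seeing, but the character check is only a heuristic: text that is mis-mapped to ordinary letters will only show up in the similarity score.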
Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Paul Pawletta 21 Reputation points
2022-11-02T08:22:28.673+00:00