PDF with Korean standard font not recognized

Martin P 0 Reputation points
2024-01-30T10:58:14.41+00:00

Hi, I am having trouble with PDFs containing Korean text. The text uses two different fonts. One is embedded in the PDF; the other appears to be a well-known standard font family called "Gulim", which is not embedded. To even see that text locally, I had to install the supplemental Korean font package (a Windows optional feature).

Both the prebuilt-read and prebuilt-invoice models (in all API versions) read the text in the embedded font just fine, but the text using the standard font is completely missing. Is there anything I can do about this? Do I have to enable CJK fonts somewhere for this to work?

As a workaround, I have found that a rendered version of the PDF does work, but I would prefer to use PDFs because of the smaller file size and faster processing time. I also use location data in physical dimensions, so with PDFs I do not have to convert all points from pixels to inches.
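For anyone who ends up using the rendered-image workaround, the pixel-to-inch conversion mentioned above could be sketched roughly as follows. This is only an illustration: it assumes the polygon arrives as a flat list of x/y values in pixels and that you know the DPI used when rendering the page. The function name `pixels_to_inches` and the sample values are hypothetical, not part of any SDK.

```python
from typing import List


def pixels_to_inches(polygon: List[float], dpi: float) -> List[float]:
    """Convert a flat [x1, y1, x2, y2, ...] polygon from pixels to inches.

    Assumes the same DPI applies to both axes, i.e. the value used when
    the PDF page was rendered to an image.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return [value / dpi for value in polygon]


# Hypothetical bounding polygon from a page rendered at 300 DPI.
polygon_px = [300, 600, 900, 600, 900, 750, 300, 750]
polygon_in = pixels_to_inches(polygon_px, dpi=300)
print(polygon_in)
```

The result only matches the coordinates of the original PDF if the render DPI is known exactly and the page is not cropped or scaled during rendering.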

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.