Text content missing from the returned PageResult in Java SDK

Question

Hi, I am testing the Computer Vision Java SDK with two similar PDF files (one English, one French). Some strings that are visible in the PDF files are not reported in PageResult for one version of the file, although they are read correctly from the other version. I assumed that all visible strings would be reported in the PageResult object. For example, at the end of each page there is a file version string, 3885A (11/20), in both PDF files. The Computer Vision Java SDK returns this string when testing with v1.pdf, but not with v2.pdf. Could someone help find out why some strings are missing even though they are visible in the PDF? v1.pdf and v2.pdf are attached for reference.

Thanks,
Jonathan

[78822-v1.pdf][1] [78823-v2.pdf][2]

[1]: /api/attachments/78822-v1.pdf?platform=QnA
[2]: /api/attachments/78823-v2.pdf?platform=QnA
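
For anyone trying to reproduce this, the comparison between the two files can be sketched roughly as follows. This is only a minimal sketch: it assumes the text lines from each file's results have already been collected into plain lists of strings, and it does not call the SDK at all. The class name `MissingTextCheck`, the methods `normalize` and `missingLines`, and the "Page heading" sample line are hypothetical placeholders, not part of the SDK or the attached files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingTextCheck {

    // Trim and collapse whitespace runs so spacing differences do not count as mismatches.
    static String normalize(String line) {
        return line.trim().replaceAll("\\s+", " ");
    }

    // Return the lines of `reference` that have no normalized match in `candidate`,
    // preserving the order in which they appear in `reference`.
    static List<String> missingLines(List<String> reference, List<String> candidate) {
        Set<String> seen = new HashSet<>();
        for (String line : candidate) {
            seen.add(normalize(line));
        }
        List<String> missing = new ArrayList<>();
        for (String line : reference) {
            if (!seen.contains(normalize(line))) {
                missing.add(line);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the text collected from each file's results.
        List<String> v1Lines = List.of("Page heading", "3885A (11/20)");
        List<String> v2Lines = List.of("Page heading");

        // Lists every line reported for v1.pdf that was not reported for v2.pdf.
        System.out.println(missingLines(v1Lines, v2Lines));
    }
}
```

Because one file is in English and the other in French, most lines would differ anyway, so a check like this is mainly useful for language-independent strings such as the version string in the footer.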

Answer

Thanks for reaching out to us, but I cannot open your PDF files. Could you please upload them again?

Also, there are two products I would recommend if you are trying to extract text from a PDF.

Thanks.

Regards,
Yutong
