Data fetching issue in Azure OCR

Mayank Arora 0 Reputation points
2024-07-11T05:38:30.79+00:00

4-f940-chromeprinted.pdf

The 5-digit PIN is not being fetched from this PDF when using Form Recognizer with a custom extraction model.

4-f940 (1) (1).pdf

It is fetched correctly from the second PDF I provided.

The issue occurs when PDFs are printed as images from WPS Office or from Chrome.

Screenshot from 2024-07-11 10-59-05 (1)

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. santoshkc 6,545 Reputation points Microsoft Vendor
    2024-07-11T08:56:21.08+00:00

    Hi @Mayank Arora,

    Thank you for reaching out to Microsoft Q&A forum!

    When tested with the prebuilt model, the 5-digit PIN is fetched successfully (see the image below), which indicates that the issue may lie with the custom extraction model.

    User's image

    So, in this case, I recommend training the custom extraction model on a larger variety of documents, especially ones similar to the problematic PDF. This should improve the model’s ability to recognize the 5-digit PIN accurately in different formats. Alternatively, you can use the prebuilt Layout model.
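
If you go the prebuilt Layout route, the PIN has to be picked out of the extracted text yourself. Below is a minimal sketch of one way to do that, assuming you already have the plain text returned by the model as a string and that the PIN appears as a standalone run of exactly five digits. The helper name `find_pin_candidates` and the sample text are hypothetical and are not part of the Document Intelligence API.

```python
import re
from typing import List

# Assumption: the PIN is a run of exactly five digits that is not part of a
# longer number. The lookarounds reject digits on either side of the match.
PIN_PATTERN = re.compile(r"(?<!\d)\d{5}(?!\d)")


def find_pin_candidates(content: str) -> List[str]:
    """Return every standalone 5-digit run in the extracted text, in order."""
    return PIN_PATTERN.findall(content)


# Made-up sample text standing in for the text extracted from the document.
sample = "Ref 1234567\nPIN: 40213\nDate 2024-07-11"
print(find_pin_candidates(sample))
```

If the document contains other 5-digit numbers, this returns all of them, so you would still need a rule (for example, proximity to a label) to choose the right one.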

    Here are some possible causes:

    • Image Quality: Printing as an image may reduce the quality of the page and degrade text recognition.
    • Font and Formatting Changes: Differences in font rendering and layout between the original and the printed-as-image version can affect text extraction.
    • OCR Limitations: Optical Character Recognition (OCR) may recognize text in images less reliably than vector text.

    Hope this helps. Do let us know if you have any further queries.


