Parse data from an invoice with Original and Duplicate pages in the same PDF file

Francisco Távora Seruya 20 Reputation points
2024-10-30T10:24:29.8533333+00:00

Hello MSFT,

I'd like to know whether Azure AI Document Intelligence can detect that an invoice PDF contains both an Original and a Duplicate copy of the invoice, and identify those pages.

Currently, Azure AI Document Intelligence returns each line item once for every copy of the invoice in the file, so the line items are repeated as many times as there are copies.

Although InvoiceTotal, SubTotal, and the other invoice header fields are extracted correctly, we end up with more line items than the totals account for.
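One way to confirm this symptom is to compare the sum of the extracted line-item amounts with SubTotal. The Python below is a minimal sketch, not part of the service: it assumes you have already pulled each line item's amount and the SubTotal value out of the analysis result as `Decimal` values, and the function name `duplication_factor` is hypothetical. A whole-number ratio of 2 or more suggests the items were read once per copy of the invoice.

```python
from decimal import Decimal
from typing import Optional, Sequence


def duplication_factor(item_amounts: Sequence[Decimal], subtotal: Decimal) -> Optional[int]:
    """Return k if the line items sum to exactly k * subtotal (k >= 1), else None.

    Hypothetical helper: a result of 2 or more suggests the same line items
    were extracted once per copy (e.g. Original + Duplicate) of the invoice.
    """
    if subtotal <= 0:
        return None
    total = sum(item_amounts, Decimal("0"))
    ratio = total / subtotal
    # Only an exact whole-number ratio is treated as a duplication signal.
    if ratio >= 1 and ratio == ratio.to_integral_value():
        return int(ratio)
    return None


if __name__ == "__main__":
    # Hypothetical amounts: two items, each extracted twice (Original + Duplicate).
    amounts = [Decimal("100.00"), Decimal("50.00"), Decimal("100.00"), Decimal("50.00")]
    print(duplication_factor(amounts, Decimal("150.00")))  # 2
```

A non-integer ratio (or `None`) means the mismatch has some other cause and is not a simple repetition of the whole invoice.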

Thanks

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. kothapally Snigdha (Quadrant Resource LLC) 260 Reputation points Microsoft Vendor
    2024-10-30T17:18:29.9633333+00:00

    Hi Francisco Távora Seruya

    Thanks for reaching out on the Microsoft Q&A forum.

    You can train a custom model to recognize and differentiate between the original and duplicate pages. This involves labeling a set of sample invoices to teach the model how to identify these pages correctly.

    Ensure that text extraction is enabled in your API configuration, so that labels such as "Original" or "Duplicate" printed within the invoice are captured.

    After receiving the extracted data, check for the presence of Original or Duplicate in the extracted text and adjust your data structure accordingly to avoid duplicating line items.
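    A minimal Python sketch of this check, under stated assumptions: the pages and line items have already been copied out of the analysis result into plain dictionaries (the `page_number`, `lines`, `description`, and `amount` keys are hypothetical, not the SDK's object model), each line item is tagged with the page it came from, and every duplicate page carries the word "Duplicate". The regular expression is a placeholder to adapt to the wording actually printed on your invoices.

```python
import re
from typing import Dict, List, Set

# Hypothetical marker pattern; adjust to the exact wording printed on your invoices.
DUPLICATE_MARKER = re.compile(r"\bduplicate\b", re.IGNORECASE)


def duplicate_pages(pages: List[Dict]) -> Set[int]:
    """Return the page numbers whose extracted text lines contain the duplicate marker."""
    return {
        page["page_number"]
        for page in pages
        if any(DUPLICATE_MARKER.search(line) for line in page["lines"])
    }


def keep_original_items(items: List[Dict], pages: List[Dict]) -> List[Dict]:
    """Drop line items that were extracted from pages marked as duplicates."""
    skip = duplicate_pages(pages)
    return [item for item in items if item["page_number"] not in skip]


# Hypothetical two-page file: page 1 is the Original, page 2 the Duplicate.
pages = [
    {"page_number": 1, "lines": ["INVOICE 1001", "ORIGINAL"]},
    {"page_number": 2, "lines": ["INVOICE 1001", "DUPLICATE"]},
]
items = [
    {"description": "Widget", "amount": "100.00", "page_number": 1},
    {"description": "Widget", "amount": "100.00", "page_number": 2},
]

if __name__ == "__main__":
    print(keep_original_items(items, pages))
    # [{'description': 'Widget', 'amount': '100.00', 'page_number': 1}]
```

    If the marker is printed only on the first page of each copy, you would need to propagate it to the following pages of that copy before filtering.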

    Collect samples of invoices that clearly indicate which pages are originals and which are duplicates. Use these labeled examples to train your model, improving its ability to recognize these distinctions.

    • After extracting data from invoices, implement post-processing in your application to handle potential duplicates.
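    Another post-processing option, which does not rely on page markers, is to collapse repeated line items when you know how many copies the file contains (for example, 2 for Original + Duplicate). The Python below is a sketch under assumptions: `LineItem` and `collapse_copies` are hypothetical names, and you would map the extracted item fields into this shape yourself. It keeps 1/copies of each distinct item, so an item legitimately listed twice on the original still survives, and it returns `None` rather than guessing when the counts do not divide evenly.

```python
from collections import Counter
from decimal import Decimal
from typing import List, NamedTuple, Optional


class LineItem(NamedTuple):
    # Hypothetical simplified line item; map your extracted fields into this shape.
    description: str
    quantity: Decimal
    amount: Decimal


def collapse_copies(items: List[LineItem], copies: int) -> Optional[List[LineItem]]:
    """Keep 1/copies of each distinct line item, preserving first-seen order.

    Returns None if some item's count is not a multiple of `copies`, meaning
    the items are not a clean repetition and need manual review.
    """
    counts = Counter(items)
    if copies < 1 or any(n % copies for n in counts.values()):
        return None
    remaining = {item: n // copies for item, n in counts.items()}
    result = []
    for item in items:
        if remaining[item] > 0:
            result.append(item)
            remaining[item] -= 1
    return result


# Hypothetical extraction result: both items read once from each of two copies.
items = [
    LineItem("Widget", Decimal("2"), Decimal("100.00")),
    LineItem("Bolt", Decimal("5"), Decimal("50.00")),
] * 2

if __name__ == "__main__":
    print(collapse_copies(items, 2))
```

    After collapsing, checking that the remaining amounts add up to SubTotal is a cheap way to validate the result.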

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful?".

    Thank you! 

    1 person found this answer helpful.
