Extract data from a multi-page PDF

Rahul Sharma 0 Reputation points
2024-11-10T14:31:28.8+00:00

Hi Experts,

I am using a Document Intelligence custom model to extract data from invoices.

The file can be a multi-page PDF in which each page is a separate invoice. I want to extract data from each page. For example, the amount field is present on page 1 as well as on page 2.

I am unable to label this in Document Intelligence Studio, because a field can be labelled in only one place.

Please suggest how I can label and extract data for my use case.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Saideep Anchuri 415 Reputation points Microsoft Vendor
    2024-11-11T06:36:07.33+00:00

    Hi Rahul Sharma,

    Welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    Use the Layout API to examine the structure of each PDF page; this helps identify the region where the amount field appears on each page. Then train a custom neural model, which supports overlapping fields and can handle documents that carry the same information in different page structures. The model can be trained to identify and extract the amount field from each page.
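
    As a rough illustration of the two steps above, here is a minimal Python sketch. It assumes the `azure-ai-formrecognizer` package, whose `DocumentAnalysisClient.begin_analyze_document` accepts a `pages` keyword, and it assumes the custom model was trained on single-page invoice samples so that each field is labelled once per document. The helper names `locate_keyword` and `extract_fields_per_page`, and the `page_count` argument, are made up for this example and are not part of the SDK.

```python
from typing import Any, Dict, List


def locate_keyword(layout_result: Any, keyword: str) -> Dict[int, List[str]]:
    """Map each page number to the layout lines that mention `keyword`.

    `layout_result` is expected to be the result of analyzing the PDF with
    the "prebuilt-layout" model.
    """
    hits: Dict[int, List[str]] = {}
    for page in layout_result.pages:
        matches = [
            line.content
            for line in (page.lines or [])
            if keyword.lower() in line.content.lower()
        ]
        if matches:
            hits[page.page_number] = matches
    return hits


def extract_fields_per_page(
    client: Any, model_id: str, pdf_bytes: bytes, page_count: int
) -> List[Dict[str, Any]]:
    """Run the custom model once per page, so each invoice gets its own fields.

    `page_count` is supplied by the caller (assumption); it could be taken
    from len(layout_result.pages).
    """
    invoices: List[Dict[str, Any]] = []
    for page in range(1, page_count + 1):
        # `pages` limits the analysis to a 1-based page selection such as "2".
        poller = client.begin_analyze_document(
            model_id, document=pdf_bytes, pages=str(page)
        )
        result = poller.result()
        fields: Dict[str, Any] = {}
        for doc in result.documents or []:
            for name, field in (doc.fields or {}).items():
                if field is not None:
                    fields[name] = {
                        "content": field.content,
                        "confidence": field.confidence,
                    }
        invoices.append({"page": page, "fields": fields})
    return invoices


# Possible usage (endpoint, key, model ID and file name are placeholders):
#
#   from azure.ai.formrecognizer import DocumentAnalysisClient
#   from azure.core.credentials import AzureKeyCredential
#
#   client = DocumentAnalysisClient("<endpoint>", AzureKeyCredential("<key>"))
#   pdf_bytes = open("invoices.pdf", "rb").read()
#   layout = client.begin_analyze_document("prebuilt-layout", document=pdf_bytes).result()
#   print(locate_keyword(layout, "amount"))
#   print(extract_fields_per_page(client, "<your-model-id>", pdf_bytes, len(layout.pages)))
```

    Analyzing one page per call keeps the amount from page 1 separate from the amount on page 2, at the cost of one request per page; splitting the PDF into single-page files before analysis would be an alternative.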

    Please see the following documentation for reference:

    https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-neural?view=doc-intel-3.1.0&viewFallbackFrom=doc-intel-4.0.0

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    Thank You.
