Form Recognizer Handle Multiple of Same Form in PDF

Eden Corbin 66 Reputation points
2021-02-28T13:19:47.727+00:00

I'm having good success with Form Recognizer, but I don't understand how pages work. I have a one-page form, and I want to let my users upload a PDF containing multiple scans of that form. The service accepts a 2-page document (the same form with different values on page 1 and page 2) and detects both pages, but I only get the values from the first page. I'm considering using a tool to split the PDF into separate pages, but I'm wondering whether that is necessary. It would be great if Form Recognizer could analyze each page as a separate copy of the form so that I could parse the results per page.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. Ramr-msft 17,641 Reputation points
    2021-03-01T18:06:37.22+00:00

    @Eden Corbin Thanks for the question. Currently this feature does not exist, and we have forwarded this feedback to our product team. You can also raise a user voice request here so the community can vote and provide feedback. The product team then reviews this feedback and implements features in future releases.

    For input requirements, see:
    https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/overview?tabs=v2-1#input-requirements


6 additional answers

  1. NickFos80 1 Reputation point
    2021-04-26T08:42:13.287+00:00

    I've hit the same issue as the original poster. We have a lot of multi-page tables: header labels sit at the top of the first page, followed by a table that continues onto a second or third page. The column headers are the same on the continuation pages, but the values obviously change. It's a lot of extra work to split out the pages, call the API once per page, and then recombine the outputs into one file that represents the original input data.
    It would be good to have an option when training the model, such as 'add more rows from next page', so that all the table data in a multi-page PDF table can be extracted together.
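
The recombining step described above might look roughly like the following sketch. It assumes each per-page result has already been reduced to a plain list of rows whose first row may repeat the column headers. That shape and the name `merge_page_tables` are hypothetical and are not the format Form Recognizer returns.

```python
from typing import Dict, List

Row = List[str]
Table = List[Row]


def merge_page_tables(page_tables: List[Table]) -> Dict[str, list]:
    """Merge per-page tables into one table with a single header row.

    Assumption: each table is a list of rows, and continuation pages may
    repeat the header row of the first page as their first row.
    """
    # Ignore pages where no table rows were extracted.
    non_empty = [table for table in page_tables if table]
    if not non_empty:
        return {"header": [], "rows": []}

    # Treat the first row of the first non-empty page as the header.
    header = non_empty[0][0]
    rows: List[Row] = []
    for table in non_empty:
        # Drop the header row when this page repeats it; otherwise keep
        # every row, since the page starts directly with data.
        body = table[1:] if table[0] == header else table
        rows.extend(body)
    return {"header": header, "rows": rows}


if __name__ == "__main__":
    page_1 = [["Item", "Qty"], ["Bolt", "4"], ["Nut", "9"]]
    page_2 = [["Item", "Qty"], ["Washer", "2"]]
    print(merge_page_tables([page_1, page_2]))
```

Comparing each page's first row with the header keeps the merge simple, but it would drop a genuine data row that happens to match the header exactly.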


  2. Søren Lyck Jensen 1 Reputation point
    2021-06-15T09:55:19.64+00:00

    I have the same problem; this feature would be highly desirable.


  3. George Gregory 1 Reputation point
    2021-06-24T14:49:50.637+00:00

    Looking for the same thing.


  4. Christian Wellnitz 61 Reputation points
    2021-09-14T20:38:37.713+00:00

    If you know there is one form on each page, you can simply use a PDF splitter and recognize the pages individually.

    But it would be nice to have that split functionality built into Form Recognizer :)
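
A minimal sketch of that split-then-recognize workaround, assuming the third-party `pypdf` package is used for splitting. The `analyze_form` callable is a hypothetical stand-in for whatever Form Recognizer call you already make for a single-page form; it is not an SDK function.

```python
import io
from typing import Any, Callable, Dict, List


def split_pdf_pages(pdf_bytes: bytes) -> List[bytes]:
    """Return one single-page PDF (as bytes) per page of the input PDF."""
    # pypdf is a third-party package (pip install pypdf). It is imported
    # here so the rest of the sketch can be loaded without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(io.BytesIO(pdf_bytes))
    single_pages: List[bytes] = []
    for page in reader.pages:
        writer = PdfWriter()
        writer.add_page(page)
        buffer = io.BytesIO()
        writer.write(buffer)
        single_pages.append(buffer.getvalue())
    return single_pages


def analyze_each_page(
    single_pages: List[bytes],
    analyze_form: Callable[[bytes], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Run the caller's analysis function once per page.

    analyze_form is hypothetical: it should wrap your existing
    single-form request and return the extracted fields as a dict.
    """
    results = []
    for page_number, page_bytes in enumerate(single_pages, start=1):
        fields = analyze_form(page_bytes)
        # Keep the page number so results can be traced to the upload.
        results.append({"page": page_number, "fields": fields})
    return results
```

With this shape, `analyze_each_page(split_pdf_pages(uploaded_bytes), my_analyze)` would yield one result per scanned form, where `uploaded_bytes` and `my_analyze` are your own upload data and wrapper function.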
