14286944 asked ramr-msft edited

Form Recognizer: how to analyze a multipage form that is already split into one file per page

Hello, I have a form to analyze that is provided as three .jpg files, each corresponding to a single page of a multipage form. I was wondering whether it is possible to feed these three separate files to Form Recognizer, to check whether they match a label-based model and extract data from them. Alternatively, I was considering merging the .jpg files into a single PDF with a Java library, but that would be a last resort. Thank you.

azure-form-recognizer

@14286944 Thanks for the question. Can you please add more details about the labels you are trying to match across the three separate files? Please also describe the type of model you are using.


1 Answer

ramr-msft answered ramr-msft edited

@14286944 Thanks. If you are extracting only text, tables, and selection marks from documents, you should use Layout. If you also need to extract key-value pairs, you can train a custom model or use a pre-built model (Invoice, Receipts, Business Cards).

Layout results (text, tables and selection marks) are included in all the Analyze outputs (custom and pre-built) in the readResults (text) and pageResults (tables) of the JSON output.

• Layout – extracts text, tables, and selection marks; no training required
• Pre-built – Invoice, Receipts, Business Cards – extracts values of interest from these types of documents
• Custom – extracts key-value pairs with a model trained on your own documents
All of the above also include the text, tables, and selection marks in the results.
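As a rough illustration of where the layout data sits in the JSON output, the following Python sketch reads the readResults and pageResults sections mentioned above. The sample response is hypothetical and heavily trimmed (a real response carries bounding boxes, words, confidence values, and more), and the helper names lines_by_page and tables_by_page are made up for this example.

```python
import json

# Hypothetical, trimmed-down Analyze response used only for illustration.
SAMPLE = """
{
  "analyzeResult": {
    "readResults": [
      {"page": 1, "lines": [{"text": "Invoice 123"}, {"text": "Total: 42.00"}]}
    ],
    "pageResults": [
      {"page": 1, "tables": [
        {"rows": 2, "columns": 2, "cells": [
          {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
          {"rowIndex": 0, "columnIndex": 1, "text": "Price"},
          {"rowIndex": 1, "columnIndex": 0, "text": "Widget"},
          {"rowIndex": 1, "columnIndex": 1, "text": "42.00"}
        ]}
      ]}
    ]
  }
}
"""


def lines_by_page(result):
    """Return {page number: [line text, ...]} taken from readResults."""
    pages = {}
    for page in result["analyzeResult"].get("readResults", []):
        pages[page["page"]] = [line["text"] for line in page.get("lines", [])]
    return pages


def tables_by_page(result):
    """Return {page number: [table, ...]}; each table is a list of rows of cell text."""
    pages = {}
    for page in result["analyzeResult"].get("pageResults", []):
        tables = []
        for table in page.get("tables", []):
            # Build an empty grid first, then place each cell by its indices.
            grid = [[""] * table["columns"] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            tables.append(grid)
        pages[page["page"]] = tables
    return pages


result = json.loads(SAMPLE)
print(lines_by_page(result))
print(tables_by_page(result))
```

Because both sections are keyed by page number, the same helpers work whether the input was a single image or a multipage PDF.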

Check out the Knowledge Extraction Recipes resource for post-processing approaches for different use cases: https://github.com/microsoft/knowledge-extraction-recipes-forms



Hello,
I am working on a custom model with labels. My question is simply whether I need to join the separate files, or whether I can feed the multiple files to Form Recognizer to be analyzed with the model.

Thank you


@14286944 Thanks for the details. Form Recognizer custom models are trained on your own data, and you need only five sample input forms to start. A trained document processing model can output structured data that includes the relationships in the original form document. After you train the model, you can test and retrain it, and eventually use it to reliably extract data from more forms according to your needs.
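On the separate-files question itself, the thread does not confirm whether a labeled model trained on the full multipage form gives useful results for a single page. If it does, one possible approach is to send each .jpg as its own Analyze request and combine the labeled fields afterwards. The Python sketch below covers only that combining step, under these assumptions: each result is already parsed JSON with a documentResults section, fields the model did not find come back as null, and the file names, field names, and the helper merge_fields are hypothetical.

```python
def merge_fields(results):
    """Combine labeled fields from several Analyze results.

    `results` is a list of (source name, parsed JSON) pairs. For each field
    name, the candidate with the highest confidence is kept.
    """
    merged = {}
    for source, result in results:
        for doc in result["analyzeResult"].get("documentResults", []):
            for name, field in doc.get("fields", {}).items():
                if field is None:
                    # Field not found on this page (assumed to be returned as null).
                    continue
                confidence = field.get("confidence", 0.0)
                best = merged.get(name)
                if best is None or confidence > best["confidence"]:
                    merged[name] = {
                        "text": field.get("text"),
                        "confidence": confidence,
                        "source": source,
                    }
    return merged


# Hypothetical per-page results, trimmed to the parts used above.
PAGE_RESULTS = [
    ("page1.jpg", {"analyzeResult": {"documentResults": [{"fields": {
        "Name": {"text": "Jane Doe", "confidence": 0.98},
        "Total": None}}]}}),
    ("page2.jpg", {"analyzeResult": {"documentResults": [{"fields": {
        "Name": {"text": "Jane", "confidence": 0.21},
        "Total": {"text": "42.00", "confidence": 0.95}}}]}}),
    ("page3.jpg", {"analyzeResult": {"documentResults": [{"fields": {
        "Name": None,
        "Total": None}}]}}),
]

merged = merge_fields(PAGE_RESULTS)
for name, value in merged.items():
    print(name, value)
```

If per-page analysis turns out to be unreliable for a given model, merging the pages into one PDF before analysis, as the question suggests, avoids this step entirely.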

