Doubts when training a custom model

ricardoyepez 20 Reputation points


I am training a custom model in Form Recognizer to process PDF files with more than 20 pages each. Each document contains at least 3 different types of structures.

My plan is to label every page in at least 5 different documents. Since that is a fairly extensive task, I want to confirm beforehand that this is the best approach. Or would you recommend another solution?

Thanks in advance.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Ramr-msft 17,621 Reputation points

    @ricardoyepez Thanks for the question. The documentation recommends building a balanced dataset that represents all the typical variations you expect to see for the document. Since your documents contain at least 3 different types of structures, consider splitting the dataset into folders, one per structure, and training a separate model on each folder. You can then compose the individual models into a single composed model.
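
The folder-per-structure approach can be sketched roughly as follows. This is a minimal illustration, not an official recipe: the folder names, the `model-<folder>` naming scheme, the `composed-model` ID, the `"template"` build mode, and the helper names `plan_training` and `build_and_compose` are all assumptions made for the example. The threshold of 5 samples is taken from the question. `build_and_compose` assumes `admin_client` behaves like `DocumentModelAdministrationClient` from the `azure-ai-formrecognizer` Python package; check the call signatures against the SDK version you have installed.

```python
from collections import defaultdict
from pathlib import PurePosixPath

MIN_SAMPLES = 5  # figure taken from the question; adjust to your own requirement


def plan_training(blob_names, min_samples=MIN_SAMPLES):
    """Group training PDFs by top-level folder (one folder per structure)."""
    groups = defaultdict(list)
    for name in blob_names:
        parts = PurePosixPath(name).parts
        # Count only PDFs that sit inside a variation folder; label and OCR
        # sidecar files (e.g. *.labels.json) are ignored for the sample count.
        if len(parts) < 2 or not name.lower().endswith(".pdf"):
            continue
        groups[parts[0]].append(name)

    plan = {}
    for variation in sorted(groups):
        count = len(groups[variation])
        plan[variation] = {
            "prefix": f"{variation}/",          # folder used as training prefix
            "model_id": f"model-{variation}",   # hypothetical naming scheme
            "sample_count": count,
            "ready": count >= min_samples,
        }
    return plan


def build_and_compose(admin_client, container_url, plan,
                      composed_id="composed-model"):
    """Train one model per variation, then compose them into a single model.

    Assumption: admin_client exposes begin_build_document_model and
    begin_compose_document_model as in azure-ai-formrecognizer.
    """
    not_ready = [v for v, info in plan.items() if not info["ready"]]
    if not_ready:
        # Fail before training anything so no partial set of models is built.
        raise ValueError(f"Not enough labeled samples for: {not_ready}")

    component_ids = []
    for info in plan.values():
        poller = admin_client.begin_build_document_model(
            "template",  # assumed build mode; "neural" is the other option
            blob_container_url=container_url,
            prefix=info["prefix"],
            model_id=info["model_id"],
        )
        component_ids.append(poller.result().model_id)

    poller = admin_client.begin_compose_document_model(
        component_ids, model_id=composed_id
    )
    return poller.result()
```

Running `plan_training` over the blob names in the training container first shows which structures still need more labeled samples before any training call is made. Nothing is sent to the service until `build_and_compose` is called with a real client.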
