Questions about training a custom model

ricardoyepez 20 Reputation points


I am training a custom model in Form Recognizer to process PDF files of more than 20 pages each. Each document contains at least 3 different types of structure.

My plan is to label every page in at least 5 different documents. Since that is a fairly extensive task, I want to confirm beforehand that this is the best approach. Would you recommend another solution?

Thanks in advance.

Azure Form Recognizer
An Azure service that applies machine learning to extract text, key/value pairs, tables, and structures from documents.

1 answer

  1. Ramr-msft 10,911 Reputation points

    @ricardoyepez Thanks. According to the documentation, the recommendation is to create a balanced dataset that represents all the typical variations you expect to see in the document. Since your document has at least 3 different types of structure, consider splitting the dataset into folders, one per variation, and training a model for each. You can then compose the individual models into a single composed model.
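
To make the folder split concrete, here is a minimal sketch of how the pages could be grouped by structure type before training. It does not call Form Recognizer at all; it only plans the folders locally. The function name `plan_training_folders`, the `train/<layout>` folder naming, the layout names, and the file names are all hypothetical, and the threshold of 5 samples is simply the figure from the question, not a confirmed service limit.

```python
from collections import defaultdict

# Assumption: 5 is the "at least 5 documents" figure from the question,
# used here only as a sanity threshold for each per-structure folder.
MIN_SAMPLES_PER_MODEL = 5


def plan_training_folders(page_layouts, min_samples=MIN_SAMPLES_PER_MODEL):
    """Group page files by structure type.

    page_layouts maps a page file name to the structure type it shows.
    Returns (folders, underfilled): folders maps a hypothetical folder
    path to the sorted page files that belong in it, and underfilled
    lists the folders that hold fewer than min_samples pages.
    """
    folders = defaultdict(list)
    for page_file, layout in page_layouts.items():
        folders[f"train/{layout}"].append(page_file)
    for files in folders.values():
        files.sort()
    underfilled = sorted(
        folder for folder, files in folders.items() if len(files) < min_samples
    )
    return dict(folders), underfilled


if __name__ == "__main__":
    # Hypothetical pages split out of two source PDFs.
    pages = {
        "doc1_p01.pdf": "cover",
        "doc1_p02.pdf": "table",
        "doc1_p03.pdf": "summary",
        "doc2_p01.pdf": "cover",
        "doc2_p02.pdf": "table",
    }
    folders, underfilled = plan_training_folders(pages)
    for folder, files in sorted(folders.items()):
        print(folder, len(files), "page(s)")
    print("Folders still needing more samples:", underfilled)
```

Each resulting folder would then be the training set for one custom model, and the trained models would be combined into the composed model as described above. Splitting the PDFs into per-page files is assumed to have been done beforehand.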