Doubts when training custom model

Question

Hello,

I am training a custom model in Form Recognizer to process PDF files of more than 20 pages each. Each document contains at least 3 different types of structures.

My plan is to label every page in at least 5 different documents. Since that is a fairly extensive task, I want to confirm beforehand that this is the best approach. Would you recommend another solution?

Thanks in advance.

Answer

@ricardoyepez Thanks for the question. According to the documentation, the recommendation is to build a balanced dataset that represents all the typical variations you expect to see for the document. Since your document contains at least 3 different types of structures, consider splitting the dataset into one folder per structure and training a separate model on each folder. You can then combine the individual models into a single composed model.
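
As a rough illustration of the folder split, here is a minimal sketch that groups labeled sample files into one folder per structure type and flags folders that still have too few samples. It uses only the Python standard library and does not call the Form Recognizer service. The function names, the `training` root, and the file and structure names are all hypothetical, and the threshold of 5 samples is an assumption taken from the number mentioned in the question.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: at least 5 labeled samples per structure type,
# matching the number mentioned in the question.
MIN_SAMPLES_PER_MODEL = 5


def plan_training_folders(labels, root="training"):
    """Group sample files into one folder per structure type.

    `labels` maps a file name to the structure type it contains.
    Returns a dict mapping each folder path to its list of files.
    """
    folders = defaultdict(list)
    for filename, structure in sorted(labels.items()):
        folder = str(PurePosixPath(root) / structure)
        folders[folder].append(filename)
    return dict(folders)


def underfilled(folders, minimum=MIN_SAMPLES_PER_MODEL):
    """Return the folders that have fewer than `minimum` samples."""
    return sorted(f for f, files in folders.items() if len(files) < minimum)
```

Each resulting folder would then serve as the training set for one model, and the trained models would be combined into a composed model as described above. If you use the Python SDK (`azure-ai-formrecognizer`), the folder path would typically be passed as the `prefix` of the build call, but please check the SDK reference for the exact parameters in your version.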
