How to improve results for custom extraction model?

OmniRob · 2024-05-07

Hi there,

We've been using Azure Document Intelligence for the past few weeks to recognize answers in scanned survey documents and are still getting accustomed to it. At the moment we're running into problems that we simply don't know how to get rid of. The main problems are:

  1. text fields not being recognized properly
  2. draw regions pretty much being a blackbox for us
  3. checkbox recognition being inconsistent

1 - Text fields

This is the weirdest one. We have a date-of-birth question that we use two labels for: dobm for the month and doby for the year.

The custom model recognizes dobm 100% of the time. With doby, however, it sometimes doesn't recognize the field at all and then, for some reason, fills the value with seemingly random text from three pages later.

Other times it fills the value with random text from the same page. But it's always text that was never labeled or part of a draw region anywhere in the training data.
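For reference, this is roughly how we dump each field's value, confidence, and page number from the analyze result, so a doby value that lands pages away from where it should be stands out. It's a minimal sketch using the azure-ai-formrecognizer Python SDK; the endpoint, key, model ID, file name, and expected-page table are placeholders for our real values:

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholders: substitute your own endpoint, key, and custom model ID.
client = DocumentAnalysisClient(
    endpoint="https://<resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<key>"),
)

with open("survey.pdf", "rb") as f:
    result = client.begin_analyze_document(model_id="<custom-model-id>", document=f).result()

# The page each label was drawn on in the training documents (placeholder values).
EXPECTED_PAGE = {"dobm": 1, "doby": 1}

for doc in result.documents:
    for name, field in doc.fields.items():
        pages = [region.page_number for region in (field.bounding_regions or [])]
        off_page = name in EXPECTED_PAGE and EXPECTED_PAGE[name] not in pages
        conf = field.confidence if field.confidence is not None else 0.0
        print(f"{name}: {field.content!r} (confidence {conf:.2f}, pages {pages})"
              + ("  <-- outside the expected page" if off_page else ""))
```

At least this lets us drop or flag the stray doby values instead of passing them downstream.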

2 - Draw regions

How do they work? We put a draw region down in an area that frequently contains text in the training data, and the model either doesn't recognize the text at all or recognizes only parts of it. What is the point of creating a draw region over a big text area if the model then chooses to read only one line of it? This one is just confusing.

3 - Checkbox recognition

Overall the checkbox recognition is actually pretty good, but the inconsistencies are mind-boggling. You can have eight questions on one page, each with around six checkboxes (or radio buttons), and the model recognizes the boxes for seven questions but then, seemingly at random, fails to recognize any box of the remaining question. It doesn't even realize a box is supposed to be there. And, similar to the text fields, it sometimes recognizes arbitrary content anywhere in the document as one or more of these boxes.

That's not all, though. Sometimes a user has clearly filled out the survey with an almost-empty pen and you can barely make out the X they drew in a box. Other users miss the box completely and draw an X, Y, \ or check mark next to it. In both situations the model still correctly recognizes those fields.

Then, in another instance, a user has such impeccable penmanship that it's almost a joy to read through and see how perfectly they checked several boxes, only for the model to consider a clearly and perfectly checked box "unselected".
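For what it's worth, here is a sketch of how we list every selection mark the service found, with its state and confidence, so the ambiguous ones can be routed to manual review. It continues from the result object in the snippet above, and the 0.8 threshold is an arbitrary cut-off we'd tune against our own documents:

```python
REVIEW_THRESHOLD = 0.8  # arbitrary cut-off; tune against your own documents

# Selection marks live on the page objects, independent of the labeled fields,
# so a box the model "didn't even realize was there" simply won't appear here.
for page in result.pages:
    for mark in page.selection_marks or []:
        flag = "  <-- send to manual review" if mark.confidence < REVIEW_THRESHOLD else ""
        print(f"page {page.page_number}: {mark.state} (confidence {mark.confidence:.2f}){flag}")
```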

All of these situations are just confusing and we really don't know what else we can do. It doesn't help that the official documentation is lacking in that regard.

Further confusion

The documentation at https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/build-a-custom-model?view=doc-intel-2.1.0&preserve-view=true#training-data-tips-1 recommends the following for training:

  • Use text-based PDF documents instead of image-based documents.
  • Use examples that have all of their fields filled in for completed forms.
  • Use forms with different values in each field.

We've actually tried that and found that the recognition was absolutely abysmal and the results were unusable.

Furthermore, we are confused by the input requirements.

"For custom extraction model training, the total size of training data is 50 MB for template model and 1G-MB for the neural model."

What does that mean? What is "1G-MB"? Is 50 MB the maximum, the minimum, or the recommended size, or is it simply always 50 MB no matter what?
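In the meantime we sanity-check the size of our training set with a small script, on the assumption that "1G-MB" is a typo for 1 GB and that both figures are upper limits (the folder name is a placeholder):

```python
from pathlib import Path

TEMPLATE_LIMIT_MB = 50    # documented total for template models
NEURAL_LIMIT_MB = 1024    # assuming "1G-MB" is meant to be 1 GB for neural models

def total_size_mb(folder: str) -> float:
    # Sum the size of every file under the training folder, in megabytes.
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()) / (1024 * 1024)

size = total_size_mb("training-data")
print(f"Training set: {size:.1f} MB "
      f"(template limit {TEMPLATE_LIMIT_MB} MB, neural limit {NEURAL_LIMIT_MB} MB)")
```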

Is the answer to all these questions just "more training data"? How can we improve and move on from this? And why is there so little information online about more complex documents? Every video tutorial you find is just a barebones "how to start" or a simple one-page document example. Starting is easy. The advanced uses are way more interesting.
