Azure form recognizer custom template model is not extracting the content properly from the scanned PDF.

Question

My Azure Form Recognizer custom template model is not extracting content properly from scanned PDFs. My training data set contains 10 PDFs in total. Each PDF is in a different language, but they all look visually the same. When I tested with a single custom model, the output was not correct. In this case, which model should I choose, and what steps should I follow in Azure Form Recognizer for better content extraction?

Accepted Answer

Thameemul Ansari, ideally, if the results from a template model are not as expected, you could add more files for training and train a new model for better accuracy. In this case, though, since you are using only a small dataset spread across multiple languages, the model might not have enough data to train on.

I would recommend training multiple models, each on forms in a single language, and then using these models under one composed model ID. This gives each individual model more training data per language, so it can extract the text according to the OCR regions it detects during training. With a single composed model, the analyze operation returns the result from the component model that scores the highest.

Also, the language code is optional in the latest version of the API, so it should be fine to omit it from the request. If the language is supported and a model has been trained on forms in that language, the extraction results should improve. I hope this helps!
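The approach above (one template model per language, then a composed model) could look roughly like this with the Python SDK. This is only a sketch, assuming the `azure-ai-formrecognizer` package version 3.2 or later. The `form-<language>` model IDs, the `containers` mapping, and the function names are hypothetical, and each container is assumed to hold the labeled training files for one language.

```python
from typing import Dict, List


def per_language_model_ids(languages: List[str], prefix: str = "form") -> Dict[str, str]:
    """Map each language code to a hypothetical model ID, e.g. 'form-en'."""
    return {lang: f"{prefix}-{lang}" for lang in languages}


def build_and_compose(endpoint: str, key: str,
                      containers: Dict[str, str], composed_id: str) -> str:
    """Train one template model per language, then compose them.

    `containers` maps a language code to the SAS URL of a blob container
    with that language's labeled training files (an assumption).
    """
    # Imported lazily so the helper above works without the SDK installed.
    from azure.ai.formrecognizer import (
        DocumentModelAdministrationClient,
        ModelBuildMode,
    )
    from azure.core.credentials import AzureKeyCredential

    admin = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))
    ids = per_language_model_ids(list(containers))

    for lang, sas_url in containers.items():
        poller = admin.begin_build_document_model(
            ModelBuildMode.TEMPLATE,
            blob_container_url=sas_url,
            model_id=ids[lang],
        )
        poller.result()  # block until this model has finished training

    # Group the per-language models under a single composed model ID.
    composed = admin.begin_compose_document_model(
        list(ids.values()), model_id=composed_id
    ).result()
    return composed.model_id


def analyze(endpoint: str, key: str, composed_id: str, pdf_path: str) -> None:
    """Analyze one PDF with the composed model, without passing a language code."""
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        result = client.begin_analyze_document(composed_id, document=f).result()

    for doc in result.documents:
        # doc_type indicates which component model produced the result.
        print(doc.doc_type, doc.confidence)
        for name, field in doc.fields.items():
            print(" ", name, field.content, field.confidence)
```

Checking `doc_type` on a few test files per language is a quick way to confirm that the composed model is routing each form to the model trained on that language.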

