KarimKhelifi-0642 asked romungi-MSFT commented

How to improve table recognition in FormRecognizer?

Hello, I want to use FormRecognizer with a custom template model to handle this table:

FormRecognizer Studio's table tool recognizes this:

We can see that some cells are correctly recognized, while others are joined together. Some empty cells are recognized as checkboxes, but this is not an issue currently.

I'm trying to find ways to improve the recognition and split the B's so that each is recognized in its own cell. What I have tried so far is labeling each cell individually by drawing a region for each one (as suggested in improve-table-recognition). Besides being very tedious (my table has 20 rows x 28 columns), this did not help. FormRecognizer does not seem to use the regions and still joins the B's together.

My next move would be to do some image preprocessing with an image processing tool such as OpenCV to present the page in a more convenient way to help FormRecognizer. But this, imho, would defeat its very purpose!

Any ideas on how I could improve the recognition?


image1.png (460.1 KiB)
image2.png (43.1 KiB)

@KarimKhelifi-0642 If a form contains only a table that needs to be extracted, I would prefer the Layout API. With the same image, the Layout API gives slightly better results, but they still need improvement. I think this is down to image quality: the highlighting (OCR) already applied to the image may be interfering, and a slightly higher resolution should pick up the headers more accurately. Could you try the Layout API with a better-quality image and check whether the results improve?


image.png (291.4 KiB)

@KarimKhelifi-0642 Did you get a chance to try the above suggestion with a different image?


0 Answers