Recognizing multi-row content without horizontal separators

Question

Hi,

I am training a custom model to detect table contents from documents. Even after training, the model is unable to detect multi-row content. Some rows also contain empty spaces, and I want Form Recognizer to detect them as blank cells. In my case, each row is split into a maximum of 3 sub-rows; in some cases all the sub-rows have content, and in others only some of them do. A sample figure is shown below.

I have tried labeling blank cells using the draw region option in Form Recognizer, but with this I am getting very poor performance. How can I improve the performance of the model?

Answer

Hi, following up. What structure are you labeling these tables into? From a quick test, Form Recognizer seems to extract these tables nicely out of the box, with no training required. It might be easier to use the out-of-the-box table extraction and add a post-processing step to standardize the data into the format you need for the downstream workflow. Have you tried that?
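
To make the post-processing idea more concrete, here is a minimal sketch in Python. It does not call Form Recognizer itself; it assumes you have already copied the extracted table cells into plain dictionaries with `row_index`, `column_index` and `content` keys (these key names, and both helper functions, are hypothetical and only for illustration). It also assumes that a new logical row starts wherever a chosen key column is non-empty, which may not hold for your documents.

```python
from typing import Dict, List


def cells_to_grid(cells: List[Dict], row_count: int, column_count: int) -> List[List[str]]:
    """Build a dense grid so that cells with no extracted text become ""."""
    # Start with every cell blank; positions without content simply stay blank.
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell["row_index"]][cell["column_index"]] = cell["content"].strip()
    return grid


def group_sub_rows(grid: List[List[str]], key_column: int = 0) -> List[List[List[str]]]:
    """Group physical rows into logical rows.

    Assumption: a non-empty value in key_column marks the start of a new
    logical row, and rows with a blank key column are sub-rows of the
    previous logical row.
    """
    groups: List[List[List[str]]] = []
    for row in grid:
        if row[key_column] or not groups:
            groups.append([row])      # start a new logical row
        else:
            groups[-1].append(row)    # continue the current logical row
    return groups
```

You would call `cells_to_grid` with the cell list and the table's row and column counts, then pass the result to `group_sub_rows`. Each returned group then holds the sub-rows of one logical row, with the blank cells preserved as empty strings. If your tables have no reliable key column, you would need a different grouping rule, for example one based on the vertical position of the cells.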
