Recognizing multi-row content without horizontal separators

Rahul Kottath 6 Reputation points
2022-01-25T14:01:24.37+00:00

Hi,

I am training a custom model to detect table contents in documents. Even after training, the model is unable to detect multi-row content. Some rows also contain empty spaces, and I want Form Recognizer to detect them as blank cells. In my case, each row is split into at most 3 sub-rows: in some rows every sub-row has content, while in others only some of the sub-rows do. A sample figure is shown below.

I have tried labeling the blank cells using the draw region option in Form Recognizer, but with this I am getting very poor performance. How can I improve the performance of the model?

[Image: sample figure (168230-snapshot.png)]
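Blank cells do not necessarily have to be labeled: an empty cell is simply a grid position that no extracted cell occupies, so it can be recovered from cell coordinates. A minimal sketch, assuming the table's cells have already been exported as hypothetical `(row_index, column_index, content)` tuples together with the table's row and column counts (Form Recognizer's Layout output carries similar fields, but the tuple shape and `to_dense_grid` are illustrative, not the SDK's own API):

```python
from typing import List, Tuple

# Hypothetical cell shape for this sketch: (row_index, column_index, content).
# Form Recognizer's Layout output has similar fields, but this is NOT the SDK type.
Cell = Tuple[int, int, str]


def to_dense_grid(cells: List[Cell], row_count: int, column_count: int) -> List[List[str]]:
    """Place each extracted cell into a row_count x column_count grid.

    Any position that no extracted cell occupies is left as "", so empty
    areas of the table come back as blank cells without being labeled.
    """
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for row, col, content in cells:
        grid[row][col] = content.strip()
    return grid


# Hypothetical example: four extracted cells in a 3 x 3 table.
cells = [(0, 0, "1"), (0, 1, "Widget"), (1, 1, "spare part"), (2, 2, "5")]
print(to_dense_grid(cells, row_count=3, column_count=3))
# [['1', 'Widget', ''], ['', 'spare part', ''], ['', '', '5']]
```

This sketch ignores cells that span several rows or columns; if the extracted table reports spans, those cells would need to be copied into every position they cover.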

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. GiftA-MSFT 11,161 Reputation points
    2022-01-31T17:15:23.513+00:00

    Hi, following up. What structure are you labeling these tables into? From a quick test, Form Recognizer seems to extract these tables well out of the box, with no training required. It might be easier to use the out-of-the-box table extraction and add a post-processing step that standardizes the data into the format your downstream workflow needs. Have you tried that?

    [Image: 169925-image.png]
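The post-processing step suggested above could look roughly like the following sketch. It assumes the extracted table has already been read into a list of rows of strings, and it uses a hypothetical grouping rule: a row whose first column (`key_column`) is non-empty starts a new logical row, and rows with an empty first column are its sub-rows. Each logical row is padded to the 3 sub-rows described in the question, with blank cells as `""`. The function name and the rule are illustrative assumptions, not part of Form Recognizer.

```python
from typing import List

Row = List[str]

# From the question: each logical row is split into at most 3 sub-rows.
MAX_SUB_ROWS = 3


def group_sub_rows(grid: List[Row], key_column: int = 0,
                   max_sub_rows: int = MAX_SUB_ROWS) -> List[List[Row]]:
    """Group physical table rows into logical records of max_sub_rows sub-rows.

    Hypothetical rule: a row with a non-empty key_column starts a new record,
    and rows with an empty key_column continue the current one. A continuation
    row that would exceed max_sub_rows starts a new record instead.
    """
    records: List[List[Row]] = []
    for row in grid:
        starts_new = not records or bool(row[key_column].strip())
        if starts_new or len(records[-1]) >= max_sub_rows:
            records.append([])
        records[-1].append(row)

    # Pad every record with blank sub-rows so all records have the same shape.
    width = len(grid[0]) if grid else 0
    for record in records:
        while len(record) < max_sub_rows:
            record.append([""] * width)
    return records


# Hypothetical example table with one continuation sub-row.
grid = [
    ["1", "Widget", "10"],
    ["", "spare part", ""],
    ["2", "Bolt", "5"],
]
for record in group_sub_rows(grid):
    print(record)
# [['1', 'Widget', '10'], ['', 'spare part', ''], ['', '', '']]
# [['2', 'Bolt', '5'], ['', '', ''], ['', '', '']]
```

Whether the first column is a reliable "new row" marker depends on the actual documents; if it is not, another column or a positional cue would have to drive the grouping instead.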