Form Recognizer Studio - Auto Label Multipage tables

Paul Andrews 15 Reputation points
2023-06-01T18:23:57.7366667+00:00

I am enjoying the power of the auto-label functionality for automatically labeling large tables in my training data. However, one shortcoming I have noticed, with no obvious solution or workaround, is that it doesn't seem possible to auto-label a table that continues across multiple pages of a document into a single table in the template.

My training documents contain tables that often span multiple pages. However, Form Recognizer sees these tables as separate from each other when it does its initial scan. When performing the table auto label operation, you are required to click a checkbox that says "I understand that the entire table will be overwritten by the above content" before submitting the operation. Why is this the case? I think there should be some additional functionality to append the results to an existing table. The way it currently works only allows me to auto-label one page at a time, deleting any progress I've already made on other pages. Is there currently a way to achieve this effect?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. VasaviLankipalle-MSFT 15,851 Reputation points
    2023-06-02T01:18:42.2033333+00:00

    Hi @Paul Andrews, thanks for using the Microsoft Q&A platform.

    Unfortunately, the auto-label table functionality in Form Recognizer currently only supports single-page tables.

    Generally, if the table extracted by the layout model already gives the result you need, you can skip the labeling process. If it isn't exactly what you need, select the auto-label button and then edit the values as needed. https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-custom-label-tips?view=form-recog-3.0.0#auto-label-tables

    Regarding the checkbox that says, "I understand that the entire table will be overwritten by the above content", this is because the auto-label functionality is designed to label the entire table on a single page. If you have already labeled part of the table on another page, selecting this checkbox will overwrite that progress.

    I believe there is currently no way to append the results to an existing table. Instead, you can label each page of the table separately.

    To accurately extract data from tables that span multiple pages, label them as a single table, and make sure the training dataset includes both documents where the table fits on a single page and documents where it spans multiple pages, with all rows labeled. Refer to this link: https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-custom-label?view=form-recog-3.0.0#create-a-balanced-dataset

    I hope this helps! Let me know if you have any other questions.

    Regards,
    Vasavi
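
When the layout output already captures each page's portion of the table correctly, one possible workaround outside the Studio is to stitch the per-page tables together in post-processing. The sketch below is only an illustration of that idea, not a feature of Form Recognizer: the function name `merge_page_tables` and the simplified fragment shape (`page` plus `rows`, with the header as the first row) are assumptions made for this example, and real layout results would first need to be converted into that shape. It treats a fragment as a continuation when it starts on the next page and has the same column count, which is a heuristic that may not fit every document.

```python
from typing import Dict, List

# Hypothetical, simplified shape for one table fragment taken from a layout
# result: {"page": int, "rows": [[cell_text, ...], ...]}, rows[0] = header row.
Table = Dict[str, object]


def merge_page_tables(fragments: List[Table]) -> List[Table]:
    """Stitch table fragments on consecutive pages into one logical table.

    A fragment counts as a continuation when it sits on the page right after
    the previous fragment's last page and has the same number of columns.
    """
    merged: List[Table] = []
    for frag in sorted(fragments, key=lambda f: f["page"]):
        rows = [list(r) for r in frag["rows"]]
        if not rows:
            continue  # nothing to merge from an empty fragment
        prev = merged[-1] if merged else None
        is_continuation = (
            prev is not None
            and frag["page"] == prev["last_page"] + 1
            and len(rows[0]) == len(prev["rows"][0])
        )
        if is_continuation:
            # Drop a header row that is repeated at the top of the new page.
            if rows[0] == prev["rows"][0]:
                rows = rows[1:]
            prev["rows"].extend(rows)
            prev["last_page"] = frag["page"]
        else:
            merged.append(
                {"page": frag["page"], "last_page": frag["page"], "rows": rows}
            )
    return merged


fragments = [
    {"page": 2, "rows": [["Item", "Qty"], ["C", "3"]]},
    {"page": 1, "rows": [["Item", "Qty"], ["A", "1"], ["B", "2"]]},
    {"page": 4, "rows": [["Name", "Total"], ["X", "9"]]},
]
result = merge_page_tables(fragments)
```

Here the fragments on pages 1 and 2 are combined into one table with the repeated header removed, while the table on page 4 stays separate because it does not start on the page immediately following the previous fragment.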
