Slow performance - size limitation?

MeghanZiegler-1345 0 Reputation points

Hi, I'm using a custom neural model to extract tables from PDFs that are about 50 pages long. One of the tables is quite large and spans 20+ pages, so it has well over 1,000 rows. The problem is that the labeling process becomes very slow once the table reaches 500 or so rows; past 1,000 rows, it can take several seconds, sometimes 20, to insert, edit, or label a single row of data in the table.

Is there a limit to the number of rows a table can have? Or is there a best practice for this type of issue? I have tried editing the PDF to limit the number of pages, but this leads to poor test accuracy because each page can list different information in the table.

I am using the latest API version (2024-02-29) and have tried the 2023-10-31 preview as well.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. VasaviLankipalle-MSFT 15,851 Reputation points

    Hello @MeghanZiegler-1345 , there is an auto-label feature for table fields that can reduce the manual effort. Can you try this feature and see whether it improves your labeling efficiency?

    1. For each page, add/create the "Table" field first as usual to define the field info.
    2. After "Run layout" finishes, a "table" icon/mark will appear. Click on it.
    3. A table with the recognized content will appear. Scroll to the bottom, click "Auto label", and choose the table field accordingly.
    4. A "Table labeling" wizard will open; follow its steps to edit and review the rows/columns accordingly.

    Here are the Label tips and tricks for Document Intelligence Studio:
