Need help on table extraction from a document

Question

Hi All,

We have tables in a document that we need to extract. We are invoking Form Recognizer through the REST API, and the OCR response contains an element named tables with all the cell details. Is parsing the tables element cell by cell the only way to extract the data, or is there another method to read and extract a table directly?

When the cell borders are not clearly visible, the extracted results sometimes lose the cell separation, and all the data comes back as a single cell element. Please let me know if there is any way to avoid this issue. Thanks

Accepted Answer

Hi @Sriramsubramaniyan Nadarajan,

Thank you for your query regarding table extraction. To extract data from tables in the OCR response of Form Recognizer (now Document Intelligence), you typically need to parse the tables element cell by cell. Form Recognizer provides detailed information for each cell, but there isn't a direct method to extract an entire table at once without parsing the cells. You can simplify the process by using prebuilt models or by training custom models for your specific structured content, which can help map headers and organize the data more efficiently.
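The cell-by-cell parsing described above can be sketched in a few lines of Python. This is a minimal sketch, not an official sample: it assumes the v3.x REST response shape, where each entry of analyzeResult.tables carries rowCount, columnCount, and a cells list with rowIndex, columnIndex, content, and optional rowSpan/columnSpan. Older v2.1 responses use different field names, so adjust accordingly. The helper names and the sample payload are made up for illustration.

```python
from typing import Any


def table_to_grid(table: dict[str, Any]) -> list[list[str]]:
    """Turn one entry of analyzeResult.tables into a row-major grid of strings."""
    grid = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        row, col = cell["rowIndex"], cell["columnIndex"]
        # rowSpan/columnSpan are assumed to appear only on merged cells.
        for r in range(row, row + cell.get("rowSpan", 1)):
            for c in range(col, col + cell.get("columnSpan", 1)):
                # Design choice: repeat the text in every position a merged cell covers.
                grid[r][c] = cell.get("content", "")
    return grid


def extract_tables(response: dict[str, Any]) -> list[list[list[str]]]:
    """Return one grid per table found in an analyze response."""
    tables = response.get("analyzeResult", {}).get("tables", [])
    return [table_to_grid(t) for t in tables]


# Hand-written sample payload (illustrative only, not real service output).
sample_response = {
    "analyzeResult": {
        "tables": [
            {
                "rowCount": 2,
                "columnCount": 2,
                "cells": [
                    {"rowIndex": 0, "columnIndex": 0, "columnSpan": 2, "content": "Items"},
                    {"rowIndex": 1, "columnIndex": 0, "content": "Pen"},
                    {"rowIndex": 1, "columnIndex": 1, "content": "2"},
                ],
            }
        ]
    }
}

for grid in extract_tables(sample_response):
    for row in grid:
        print(row)
```

Repeating the text of a merged cell across every position it spans keeps the grid rectangular, which makes it easy to write out as CSV; if you would rather leave the spanned positions empty, fill only grid[row][col].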

For more info see: Analyze document table extraction.

To avoid issues with unclear cell borders causing data to be extracted as a single cell, ensure your tables have clearly defined borders and sufficient spacing. Preprocessing documents to enhance table lines or using higher resolution scans can help. Training a custom model with Form Recognizer tailored to your document layout can also improve table detection and data extraction accuracy.

See the page: Input requirements

For best practices to achieve higher accuracy, see: Ensure high model accuracy.

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

Answer

Hi @santoshkc ,

Thanks a lot for sharing all the details. It is very helpful.

Can you please share sample code to extract the cells from the table, if possible? Thanks
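As a rough idea of what such sample code could look like, here is a minimal Python sketch that reads the cells of one table and maps each data row to its column headers. It assumes the v3.x REST response shape, where header cells are marked with kind set to "columnHeader" and data cells either omit kind or set it to "content". The function name, the column_N fallback, and the sample table are hypothetical.

```python
from typing import Any


def table_to_records(table: dict[str, Any]) -> list[dict[str, str]]:
    """Map each data row of a table to a {header: value} dictionary."""
    headers: dict[int, str] = {}
    rows: dict[int, dict[int, str]] = {}
    for cell in table["cells"]:
        text = cell.get("content", "")
        if cell.get("kind") == "columnHeader":
            headers[cell["columnIndex"]] = text
        else:
            rows.setdefault(cell["rowIndex"], {})[cell["columnIndex"]] = text

    records = []
    for row_index in sorted(rows):
        row = rows[row_index]
        records.append(
            {
                # Fall back to a generated name when no header cell was detected.
                headers.get(col, f"column_{col}"): row.get(col, "")
                for col in range(table["columnCount"])
            }
        )
    return records


# Hand-written sample table (illustrative only, not real service output).
sample_table = {
    "rowCount": 3,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "kind": "columnHeader", "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "kind": "columnHeader", "content": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "content": "Pen"},
        {"rowIndex": 1, "columnIndex": 1, "content": "2"},
        {"rowIndex": 2, "columnIndex": 0, "content": "Ink"},
        {"rowIndex": 2, "columnIndex": 1, "content": "5"},
    ],
}

for record in table_to_records(sample_table):
    print(record)
```

If the service does not mark any header cells for your documents, every key falls back to column_0, column_1, and so on, and you may prefer to treat the first row as the header yourself.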
