Document Intelligence Layout mode table recognition

Question

Jiaping Zhang 20 Microsoft Employee

Hello, I encountered an issue while using azure.ai.documentintelligence to parse a PDF that contains tables and text. If I parse the original PDF directly, as shown in the screenshot, the third and fourth data items in the first column of the table, "香肠丁" (diced sausage) and "炭烤鸡腿肉片" (charcoal-grilled chicken thigh slices), are always recognized as a single row. However, if I extract only the table and parse that, the result is correct and the two rows are separated. The behavior is reproducible. What might be causing this? Is it a limitation of the logic behind the API, or something else? How should I handle this situation when using azure.ai.documentintelligence in the future?

Accepted answer

Answer 1

Azar 29,520 MVP Volunteer Moderator

Hi Jiaping Zhang,

Thanks for using the Q&A platform.

When the entire document is parsed, table detection may struggle to differentiate rows, especially if the table is embedded in text or has inconsistent formatting, which can cause rows to merge. When only the table portion is extracted, parsing works correctly because the model can focus on a cleaner, more clearly defined structure.

So I believe the problem is related to the complexity of the layout rather than a limitation of the service. Try preprocessing the document to separate the table from the surrounding text.
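To check whether the preprocessing actually fixes the merge, one option is to compare the table from the full-document parse with the table from the table-only parse. The sketch below is only an illustration under assumptions: it works on plain dicts shaped like the `tables` entries of the service's JSON response (`rowCount`, `columnCount`, and `cells` with `rowIndex`, `columnIndex`, `content`), the helper names `table_to_rows` and `find_merged_cells` are hypothetical, and the sample data is made up to mirror the case in the question rather than taken from a real response.

```python
from typing import Dict, List, Tuple


def table_to_rows(table: Dict) -> List[List[str]]:
    """Rebuild a row-by-row grid from one entry of the result's `tables` list."""
    rows = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    for cell in table["cells"]:
        rows[cell["rowIndex"]][cell["columnIndex"]] = cell["content"]
    return rows


def find_merged_cells(
    full_rows: List[List[str]], cropped_rows: List[List[str]], column: int = 0
) -> List[Tuple[int, str]]:
    """Flag cells in `full_rows` that contain two or more values which appear
    as separate cells in the same column of `cropped_rows`.

    This is a simple substring heuristic, not an exact comparison.
    """
    reference = [row[column] for row in cropped_rows if row[column]]
    suspects = []
    for index, row in enumerate(full_rows):
        hits = [value for value in reference if value in row[column]]
        if len(hits) >= 2:
            suspects.append((index, row[column]))
    return suspects


# Hypothetical sample data: the full-document parse merged two items into one row.
full_table = {
    "rowCount": 2,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "content": "香肠丁 炭烤鸡腿肉片"},
        {"rowIndex": 1, "columnIndex": 1, "content": "2"},
    ],
}

# Hypothetical sample data: the table-only parse kept the two items apart.
cropped_table = {
    "rowCount": 3,
    "columnCount": 2,
    "cells": [
        {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
        {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
        {"rowIndex": 1, "columnIndex": 0, "content": "香肠丁"},
        {"rowIndex": 1, "columnIndex": 1, "content": "1"},
        {"rowIndex": 2, "columnIndex": 0, "content": "炭烤鸡腿肉片"},
        {"rowIndex": 2, "columnIndex": 1, "content": "1"},
    ],
}

merged = find_merged_cells(table_to_rows(full_table), table_to_rows(cropped_table))
print(merged)
```

With this sample data, the check flags row 1 of the full-document table, because its first cell contains both item names that the table-only parse returned as separate rows. Because it relies on substring matching, it can produce false positives when one cell value is contained in another.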

If this helps, kindly accept the answer. Thanks very much.

NAGENDRAPPA, SAHANA 0 Reputation points

2024-12-02T03:48:22.96+00:00

Hi Azar, I see an issue with the "invoice" model where the results differ between the Studio and the SDK: the SDK returns fewer tables than the Studio. In the SDK, I am also serializing the returned object and storing it as JSON. Any help with this would be appreciated. Thank you.
