Document Intelligence Layout mode table recognition

Jiaping Zhang 20 Reputation points Microsoft Employee
2024-11-22T08:15:43.84+00:00

Hello, I encountered an issue while using azure.ai.documentintelligence to parse a PDF that contains tables and text. If I parse the original PDF directly, as shown in the screenshot, the third and fourth data items in the first column of the table, "香肠丁" and "炭烤鸡腿肉片", are always recognized as a single row. However, if I extract only the table part, the parsing is correct and the two rows are separated. The issue is reproducible every time. What might be causing this? Is it a limitation in the logic behind the interface, or something else? How should I handle this situation when using azure.ai.documentintelligence in the future?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. Azar 29,520 Reputation points MVP Volunteer Moderator
    2024-11-22T08:46:04.46+00:00

    Hi there Jiaping Zhang,

    Thanks for using the Q&A platform.

    When the entire document is parsed, table detection might struggle to differentiate between rows, especially if the table is embedded in text or has inconsistent formatting, which may cause rows to merge. When only the table portion is extracted, the parsing works correctly because the system can focus on a cleaner, more defined structure.

    So I guess the problem is related to the complexity of the layout rather than a limitation of the service. Try preprocessing the document to separate the table from the surrounding text before analyzing it.

    If this helps, kindly accept the answer. Thanks much.

