Hi,
Thanks for reaching out to Microsoft Q&A.
Yes, this is still a known limitation of Azure Document Intelligence (formerly Form Recognizer) when using Custom Extraction (Neural) models, especially with semi-structured tables like the one in your screenshot. Here is a breakdown of the situation and potential workarounds:
Root of the Problem
Document Intelligence merges adjacent cells when:
- Table borders are unclear or inconsistent.
- The model misinterprets rows or columns due to layout variations.
- Slight misalignment or text proximity is mistaken for a single cell.
Even though you trained on consistent templates, predictions at inference time can still be wrong because of:
- Minor formatting variations.
- OCR-level misreadings.
- Inherent table understanding limitations of the current neural model.
Current Status
- The GitHub/Microsoft Q&A thread you referred to is accurate; Microsoft has acknowledged this behavior.
- The 2024-11-30 API version does not yet resolve this behavior.
- No public documentation or roadmap currently confirms if or when fine-grained table cell delineation improvements will be released.
Suggested Strategy
Since your use case is solution-focused and logic-heavy:
- Add postprocessing heuristics to your pipeline to validate the expected number of columns in each row.
- When merged cells are found, split them using text-content rules (for example, date patterns or status keywords).
- Train an auxiliary ML model (using spaCy or custom NER) to tag known fields from text blocks, instead of relying on strict table fidelity.
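As a rough illustration of the first two suggestions, here is a minimal Python sketch. It assumes you have already turned each extracted table row into a plain list of cell strings; how you build those rows from the service response is not shown. The date format, the status keywords (Approved, Pending, Rejected), and the function names `split_merged_cell` and `repair_row` are hypothetical placeholders, so adjust them to your own documents.

```python
import re
from typing import List, Optional

# Hypothetical rules: a dd/mm/yyyy-style date or one of a few status keywords.
# Replace these with the patterns that actually occur in your tables.
TOKEN_RE = re.compile(
    r"(\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b(?:Approved|Pending|Rejected)\b)",
    re.IGNORECASE,
)


def split_merged_cell(text: str) -> List[str]:
    """Split one cell's text around every date or status token it contains."""
    # re.split keeps the matched tokens because the pattern has a capturing group.
    parts = TOKEN_RE.split(text)
    return [p.strip() for p in parts if p and p.strip()]


def repair_row(cells: List[str], expected_cols: int) -> Optional[List[str]]:
    """Return a row with expected_cols cells, or None if it cannot be repaired."""
    if len(cells) == expected_cols:
        return list(cells)  # column count already matches, nothing to do

    repaired: List[str] = []
    for cell in cells:
        # Keep the cell as-is when it is empty or contains no splittable token.
        repaired.extend(split_merged_cell(cell) or [cell])

    # Only accept the repair if it yields exactly the expected column count;
    # otherwise signal that the row needs manual review.
    return repaired if len(repaired) == expected_cols else None


if __name__ == "__main__":
    row = ["Invoice A 12/03/2024", "Approved"]  # first two columns were merged
    print(repair_row(row, expected_cols=3))
```

Rows for which `repair_row` returns `None` could be logged or routed to a manual review queue rather than silently passed downstream.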
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.