Azure Document Intelligence custom extraction model merging adjacent cells and glitching

Question

Rahees Khan 25

I am building a solution that uses a custom extraction neural model to extract data and then process it.

I have trained the model on more than twenty consistent documents and am using the latest API version, 2024-11-30.

However, I still see adjacent cells being merged, which breaks the logic of the solution. Please see the attached screenshot.

Screenshot 2025-04-15 at 10.36.52 AM.png

I am aware that the Microsoft team has already acknowledged this limitation and that the product team is aligned on it. However, the thread referenced below is about a year old. It gave an assurance that the issue would be addressed, yet there has been no update since.

https://learn.microsoft.com/en-us/answers/questions/1661735/document-intelligence-custom-extraction-model-merg?page=1&orderby=helpful&translated=false&source=docs

Any help or suggestions are greatly appreciated.

Anonymous

2025-04-16T04:14:28.51+00:00

Hello Rahees Khan,

I hope you had the chance to review the response provided by Vinodh247 and found it effective in addressing your concern.

Thank you!

Answer accepted by question author

Answer 1

Hi,

Thanks for reaching out to Microsoft Q&A.

Yes, this is still a known limitation of Azure Document Intelligence (formerly Form Recognizer) when using custom extraction (neural) models, especially with semi-structured tables like the one in your screenshot. Here is a breakdown of the situation and potential workarounds:

Root of the Problem

Document Intelligence merges adjacent cells when:

Table borders are unclear or inconsistent.
The model misinterprets rows or columns due to layout variations.
Slight misalignment or text proximity is mistaken for a single cell.

Even though you trained on consistent templates, runtime predictions can still glitch due to:

Minor formatting variations.
OCR-level misreadings.
Inherent table understanding limitations of the current neural model.

Current Status

The Microsoft Q&A thread you referred to is accurate: Microsoft has acknowledged this limitation.
The 2024-11-30 API version does not yet resolve this behavior.
No public documentation or roadmap currently confirms if or when fine-grained table cell delineation improvements will be released.

Suggested Strategy

Since your use case is solution-focused and logic-heavy:

Add postprocessing heuristics to your pipeline to validate the expected number of columns in each row.
When merged cells are found, split them using text content rules (for example, date patterns or status keywords).
Train an auxiliary ML model (using spaCy or custom NER) to tag known fields from text blocks, instead of relying on strict table fidelity.
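
The first two suggestions can be sketched roughly as follows. This is only a minimal illustration, not part of the Document Intelligence SDK. It assumes each extracted table row has already been flattened into a list of cell strings, and the date format, the status keywords, and the helper names split_merged_cell and repair_row are all hypothetical placeholders to adapt to your own documents.

```python
import re
from typing import List

# Assumed patterns: replace with the date format and status vocabulary
# that actually appear in your documents.
DATE_RE = re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")
STATUS_KEYWORDS = ("Approved", "Pending", "Rejected")  # hypothetical keywords
STATUS_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, STATUS_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def split_merged_cell(text: str) -> List[str]:
    """Split one cell's text at the boundaries of any date or status token."""
    # Collect the spans of every recognised token, in order of appearance.
    spans = sorted(
        m.span() for rx in (DATE_RE, STATUS_RE) for m in rx.finditer(text)
    )
    parts: List[str] = []
    cursor = 0
    for start, end in spans:
        if start < cursor:
            continue  # skip a token that overlaps one already emitted
        before = text[cursor:start].strip()
        if before:
            parts.append(before)
        parts.append(text[start:end])
        cursor = end
    tail = text[cursor:].strip()
    if tail:
        parts.append(tail)
    # A cell with no content stays a single (empty) cell.
    return parts or [text]


def repair_row(cells: List[str], expected_columns: int) -> List[str]:
    """Return a row with exactly expected_columns cells, or raise ValueError."""
    if len(cells) == expected_columns:
        return list(cells)
    repaired: List[str] = []
    for cell in cells:
        repaired.extend(split_merged_cell(cell))
    if len(repaired) != expected_columns:
        # Do not guess: surface the row so it can be reviewed manually.
        raise ValueError(
            f"expected {expected_columns} columns, got {len(repaired)}: {repaired}"
        )
    return repaired
```

For example, calling repair_row(["INV-001", "12/03/2025 Approved", "450.00"], 4) splits the merged middle cell into a separate date cell and status cell. Rows that still do not match the expected column count raise an error instead of being silently passed on, so the downstream logic never runs on a misaligned row.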

Rahees Khan 25 Reputation points

2025-04-16T05:51:37.1033333+00:00

Thanks @Vinodh247-1375 for the well-informed and structured answer.
