Azure Document Intelligence custom extraction model merges adjacent cells and glitches

Rahees Khan 25 Reputation points
2025-04-15T06:53:14.49+00:00

I am building a solution that extracts data with a custom extraction neural model and then uses that data downstream.

I have trained the model on more than twenty consistent documents and am using the latest API version, 2024-11-30.

However, I still see the cell-merge issue, and it breaks the logic of my solution. Please see the attached screenshot.

Screenshot 2025-04-15 at 10.36.52 AM.png

I am aware that the Microsoft team has already acknowledged this limitation and that the product team is aware of it. However, the thread referenced below is about a year old; it gave an assurance that the issue would be addressed, but there has been no update since.

https://learn.microsoft.com/en-us/answers/questions/1661735/document-intelligence-custom-extraction-model-merg?page=1&orderby=helpful&translated=false&source=docs

Any help or suggestions are greatly appreciated.

Azure Document Intelligence in Foundry Tools

Answer accepted by question author

Vinodh247-1375 43,021 Reputation points Volunteer Moderator
2025-04-15T17:18:20.12+00:00

Hi,

Thanks for reaching out to Microsoft Q&A.

Yes, this is still a known limitation of Azure Document Intelligence (formerly Form Recognizer) when using custom extraction (neural) models, especially with semi-structured tables like the one in your screenshot. Here is a breakdown of the situation and potential workarounds:

Root of the Problem

Document Intelligence merges adjacent cells when:

  • Table borders are unclear or inconsistent.
  • The model misinterprets rows or columns due to layout variations.
  • Slight misalignment or text proximity is mistaken for a single cell.

Even though you trained on consistent templates, runtime predictions can still glitch due to:

  • Minor formatting variations.
  • OCR-level misreadings.
  • Inherent table understanding limitations of the current neural model.

Current Status

  • The GitHub/Microsoft Q&A thread you referred to is accurate; Microsoft has acknowledged this limitation.
  • The 2024-11-30 API version does not yet resolve this behavior.
  • No public documentation or roadmap currently confirms if or when fine-grained table cell delineation improvements will be released.

Suggested Strategy

Since your use case is solution-focused and logic-heavy:

  • Add postprocessing heuristics into your pipeline to validate the expected number of columns.
  • When merged cells are found, split them using text content rules (for example, date patterns or status keywords).
  • Train an auxiliary ML model (using spaCy or custom NER) to tag known fields from text blocks, instead of relying on strict table fidelity.
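
The first two suggestions can be sketched roughly as follows. This is only an illustration in plain Python, not part of the Document Intelligence SDK: it assumes you have already flattened each extracted table row into a list of cell strings, and the column count, date format, and status keywords (`EXPECTED_COLUMNS`, `DATE_RE`, `STATUS_KEYWORDS`) are hypothetical values you would replace with your own schema.

```python
import re
from typing import List

# Hypothetical rules for illustration only; adapt them to your own table schema.
EXPECTED_COLUMNS = 3
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
STATUS_KEYWORDS = ("Approved", "Pending", "Rejected")
STATUS_RE = re.compile(r"\b(?:" + "|".join(STATUS_KEYWORDS) + r")\b")


def split_merged_cell(text: str) -> List[str]:
    """Split a cell at the boundaries of any date or status token it contains."""
    cuts = set()
    for pattern in (DATE_RE, STATUS_RE):
        for match in pattern.finditer(text):
            cuts.update((match.start(), match.end()))
    bounds = [0, *sorted(cuts), len(text)]
    pieces = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [piece for piece in pieces if piece]


def repair_row(cells: List[str], expected: int = EXPECTED_COLUMNS) -> List[str]:
    """Return the row as-is if it has the expected width; otherwise try to split cells."""
    if len(cells) == expected:
        return cells
    repaired: List[str] = []
    for cell in cells:
        # Keep empty or unsplittable cells in place so column positions are preserved.
        repaired.extend(split_merged_cell(cell) or [cell])
    if len(repaired) != expected:
        # Do not guess: surface the row for manual review or a fallback path.
        raise ValueError(f"could not repair row: {cells!r}")
    return repaired
```

For example, `repair_row(["INV-001", "2025-04-15 Approved"])` splits the second cell into a date and a status. Rows that still do not match the expected width raise an error rather than being silently accepted, so they can be routed to review.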

