Hi,
Thanks for reaching out to Microsoft Q&A.
Content Understanding (CU) will not reliably infer parent-child relationships when the grouping key (the delivery note) is embedded inside the description or appears across page breaks. Prebuilt models handle layout reasonably well, but custom schemas break far sooner because they enforce structure that simply is not present in the document.
Expanded:
- Grouping by delivery note inside CU is unreliable. The model cannot reliably segment table “sub-groups” when the only signal is free text inside descriptions, particularly spanning pages. Labeled data helps detect fields, but it does not teach the model to interpret implicit grouping logic.
- Extracting delivery notes separately and using coordinates is valid. Using line bounding boxes and Y-position logic server-side is the most reliable method today. Essentially:
  - Run OCR and extract the lines with their positions.
  - Track the last seen delivery note above the row.
  - Assign it to subsequent rows until another note appears.
  This works even across page boundaries.
- Prebuilt + custom is a real trade-off. The prebuilt invoice model will scale better on long documents because it leans on layout heuristics and dynamic tables instead of strict schemas. Running a second custom model for the additional fields is pragmatic. Yes, it costs more, but it saves you from schema failures.
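The coordinate-based grouping described above (track the last seen delivery note, then assign it to the rows that follow) can be sketched in Python roughly as below. This is a minimal sketch under stated assumptions, not an SDK sample: the `Line` record, the `DELIVERY_NOTE_RE` pattern, and `assign_delivery_notes` are hypothetical names, and it assumes the OCR output has already been flattened into lines carrying a page number and a top Y coordinate. It also assumes each delivery note sits on its own line above the rows it covers, so matching lines are treated as separators rather than item rows.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed pattern, e.g. "Delivery note 12345" or "Delivery Note No: DN-778".
# It requires at least one digit in the reference; adapt it to your documents.
DELIVERY_NOTE_RE = re.compile(
    r"delivery\s+note\s*(?:no\.?|number|#)?\s*[:.]?\s*([A-Za-z\-/]*\d[\w\-/]*)",
    re.IGNORECASE,
)


@dataclass
class Line:
    """Hypothetical flattened OCR line; not an SDK type."""
    page: int    # 1-based page number
    top: float   # Y coordinate of the line's top edge, increasing downwards
    text: str


def assign_delivery_notes(lines: List[Line]) -> List[Tuple[Line, Optional[str]]]:
    """Pair each non-note line with the last delivery note seen above it."""
    current: Optional[str] = None
    assigned: List[Tuple[Line, Optional[str]]] = []
    # Reading order: page first, then vertical position on the page.
    for line in sorted(lines, key=lambda l: (l.page, l.top)):
        match = DELIVERY_NOTE_RE.search(line.text)
        if match:
            # A new note starts a new group; the note line is not an item row.
            current = match.group(1)
            continue
        # The note is deliberately not reset at a page change, so rows at the
        # top of the next page inherit the note from the previous page.
        assigned.append((line, current))
    return assigned


if __name__ == "__main__":
    sample = [
        Line(1, 100.0, "Delivery note DN-1001"),
        Line(1, 120.0, "Widget A"),
        Line(2, 50.0, "Widget B (continued on next page)"),
    ]
    for line, note in assign_delivery_notes(sample):
        print(note, "|", line.text)
```

Rows that appear before any delivery note are paired with `None`, which makes them easy to flag for review instead of silently attaching them to the wrong group.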
Alternative options if scale becomes painful:
- Split pages and process in batches, then recombine logic yourself
- Switch to Azure Document Intelligence with the layout model + your own post-processing
- Build a small ML/NLP classifier to detect lines containing delivery note references
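For the first option, the split-and-recombine bookkeeping could look roughly like the sketch below. It is illustrative only: `page_batches` and `recombine` are hypothetical helpers, no Azure call is shown, and it assumes each batch was analyzed as its own document, so the page numbers in its results restart at 1 and must be shifted back to the original numbering.

```python
from typing import Iterator, List, Tuple

# (page, top, text) tuples stand in for whatever per-line output you keep.
LineTuple = Tuple[int, float, str]


def page_batches(total_pages: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (first_page, last_page) ranges covering the document."""
    for start in range(1, total_pages + 1, batch_size):
        yield start, min(start + batch_size - 1, total_pages)


def recombine(batch_results: List[Tuple[int, List[LineTuple]]]) -> List[LineTuple]:
    """Merge per-batch lines into one list with original page numbers.

    Each entry is (first_page_of_batch, lines), where the page in every line
    is 1-based within its batch.
    """
    merged: List[LineTuple] = []
    for first_page, lines in batch_results:
        for page_in_batch, top, text in lines:
            merged.append((first_page + page_in_batch - 1, top, text))
    # Restore global reading order: page first, then vertical position.
    merged.sort(key=lambda item: (item[0], item[1]))
    return merged
```

Any grouping pass should then run once over the merged list rather than per batch. Otherwise a delivery note near the end of one batch will not carry over to the rows at the start of the next.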
You are exactly on the right track. Treat CU like a text extractor + entity detector, and handle grouping logic yourself. Trying to force the model to infer grouping rules the document does not explicitly encode is a dead end.
Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.