Hello jum,
Yes. When every form family has its own layout quirks, the recommended pattern is to train a separate custom neural model for each family and then compose them into a single endpoint. The Compose feature performs an automatic “best fit” classification at run-time and currently supports up to 200 child models per composed model.
Why the table with “numbers or check-boxes” is tricky
- Custom neural v4.0 understands three data types: key-value pairs (string/number/date), selection-marks, and tabular fields. It does not let the same field switch type between samples.
- Inside a table every cell value is ultimately treated as text. If a user draws a check-box the Layout engine emits the Unicode symbols ☒ / ☐ (selected / unselected).
When your fields.json says the column is string, cells that contain only a selection-mark end up as empty strings, so the model gets no signal and accuracy drops.
Pattern that works well in production
Step | What to do | Why it helps |
---|---|---|
1 – Pick a single field type per column | Either keep the column as string and treat “☒” as “1” in post-processing, or split the logic into two separate fields (qtyNumber, qtyChecked). | Keeps the label schema consistent so the network converges. |
2 – Label both variants | Include at least 5 docs with numbers and 5 docs with check-boxes in the same training set. | Lets the neural model learn both visual patterns. |
3 – If you must keep the table semantic but also detect the mark | Use the new overlapping-fields capability (v4.0, 2024-11-30 API): label the cell once as part of the table and overlay a selection field on the same tokens. Remember the limits: two overlapping fields max, and they can’t span multiple rows. | Provides the best of both worlds: structured rows plus a boolean field you can map to “quantity = 1”. |
4 – Fixed vs. dynamic table | If the column/row count never changes, choose Fixed Table; otherwise label as Dynamic Table so variable row counts don’t hurt recall. | Gives the model the right structural prior. |
5 – Post-process for business meaning | Convert: number ⇒ quantity; selected mark ⇒ quantity = 1; empty cell ⇒ quantity = 0. | Keeps the extraction model generic while business rules live in code. |
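
As a rough illustration of step 5, the mapping could be sketched in Python like this. The function name cell_to_quantity is hypothetical, and two things are assumptions on my side rather than documented behaviour: that a mark may also show up as a ":selected:" / ":unselected:" token, and that an unselected mark should count as 0 just like an empty cell. Adjust both to what your extracted cells actually contain.

```python
import re

# Assumption: marks arrive either as the Unicode symbols mentioned above
# or as ":selected:" / ":unselected:" tokens. Adjust to your real output.
SELECTED_MARKS = {"☒", ":selected:"}
UNSELECTED_MARKS = {"☐", ":unselected:"}


def cell_to_quantity(cell_text):
    """Map the text of one extracted table cell to an integer quantity."""
    text = (cell_text or "").strip()
    if text in SELECTED_MARKS:
        return 1  # selected mark => quantity = 1
    if text == "" or text in UNSELECTED_MARKS:
        return 0  # empty cell (or, by assumption, unselected mark) => 0
    if re.fullmatch(r"\d+", text):
        return int(text)  # plain number => quantity
    # Anything else is surfaced instead of being silently guessed.
    raise ValueError(f"unexpected cell content: {text!r}")
```

For example, `[cell_to_quantity(c) for c in ["3", "☒", "", "☐"]]` gives `[3, 1, 0, 0]`. Raising on unknown content is a deliberate choice here, so that a new visual variant gets noticed rather than being counted as zero.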
Does adding more labels help?
Only if they follow the rules above. Simply mixing “123” and “☒” in the same field without consistent typing will keep the network confused and you will continue to see empty cells for the mark variant.
Quick checklist
- Schema – verify every field in fields.json has exactly one type.
- Samples – balanced dataset: ≥ 5 pages per visual variant (numbers vs. marks).
- API version – train and run with 2024-11-30 (v4.0 GA) so that tabular fields, cell-confidence and overlapping-fields are all enabled.
- Compose rebuilt – after re-training a child model, remember to re-compose it so the new extractor is available through the composed model.
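
If it helps, the first two checklist items can be checked with a small script before you train. This is only a sketch: it does not parse the real fields.json or label files. It assumes you have already loaded your labels into plain Python structures, and the function names, the type strings and the variant names "number" / "mark" are all hypothetical.

```python
from collections import Counter

MIN_SAMPLES_PER_VARIANT = 5  # threshold taken from the checklist above


def find_mixed_type_fields(labelled_docs):
    """Return the fields that were labelled with more than one type.

    labelled_docs: list of dicts mapping field name -> type string,
    one dict per labelled document (hypothetical in-memory shape).
    """
    seen = {}
    for doc in labelled_docs:
        for name, field_type in doc.items():
            seen.setdefault(name, set()).add(field_type)
    return {n: sorted(t) for n, t in seen.items() if len(t) > 1}


def underrepresented_variants(variant_per_doc,
                              expected=("number", "mark"),
                              minimum=MIN_SAMPLES_PER_VARIANT):
    """Return expected variants that have fewer than `minimum` samples."""
    counts = Counter(variant_per_doc)
    # Counter returns 0 for a variant that never appears.
    return {v: counts[v] for v in expected if counts[v] < minimum}
```

An empty result from both functions means the schema is consistently typed and each visual variant has enough samples. Anything returned points at the field or variant to fix first.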
TL;DR
Your composed-model architecture is solid. For the “quantity or check-box” column either (a) treat everything as text and map “☒” later, or (b) split the information into two fields using the v4.0 overlapping-fields feature. What you cannot do is let one field randomly switch between string and selection-mark across documents — Document Intelligence will treat the mismatched samples as “missing” and accuracy will suffer.
Hope that clears things up!
Best Regards,
Jerald Felix