Issues with Auto-Labeling and Table Extraction in Custom Extraction Model for Electricity and Gas Bills in Azure Document Intelligence

Question

I am using a custom extraction model in Azure Document Intelligence to extract data from electricity and gas bills. Each supplier has many templates, so I am using auto-labeling to speed up the labeling process. The issue is that auto-labeling does not perform well on the templates I have tested. When I test the trained model on the energy charges table, it merges two columns or mistakenly treats other data as a row, so the extracted table is inaccurate. What should I do to resolve this issue? Is there a way to adjust the confidence score in the system and improve it?

Answer

Hi @Shima Yousefi,

Thank you for reaching out to the Microsoft Q&A forum!

To address the issues with auto-labeling and table extraction accuracy in your custom extraction model for electricity and gas bills in Azure Document Intelligence:

Manually review the auto-labeling results and correct any inaccuracies before training. Adjusting confidence scores directly is not currently possible. To improve the model's accuracy and its confidence scores, consider training it with more documents.
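Since the scores themselves cannot be tuned, one option is to apply a cutoff on your side and send low-confidence fields for manual review. Below is a minimal sketch of that idea. It assumes the extracted fields have already been loaded into a plain dictionary where each field has `content` and `confidence` keys; the function name, the threshold value, and the sample values are hypothetical and only for illustration.

```python
from typing import Any

# Hypothetical cutoff; choose it by checking results against a labeled validation set.
REVIEW_THRESHOLD = 0.80


def split_by_confidence(
    fields: dict[str, dict[str, Any]],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate extracted fields into accepted values and values needing manual review."""
    accepted: dict[str, Any] = {}
    needs_review: dict[str, Any] = {}
    for name, field in fields.items():
        # A missing confidence is treated as 0.0 so the field gets reviewed.
        confidence = field.get("confidence") or 0.0
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.get("content")
    return accepted, needs_review


# Illustrative values only, not real model output.
sample_fields = {
    "SupplierName": {"content": "Example Energy", "confidence": 0.97},
    "TotalAmount": {"content": "128.40", "confidence": 0.62},
}
accepted, needs_review = split_by_confidence(sample_fields)
```

Fields that land in `needs_review` are also good candidates for showing you which templates need more labeled training documents.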

I hope you understand! Thank you.

Answer

Hi Shima Yousefi,

Thanks for using the Q&A platform.

Review and correct the auto-labeled data, and make sure the column and row boundaries are clearly defined. Retrain the model with additional samples and use data augmentation. Implement post-processing logic to validate and correct the extracted data, and adjust the confidence score thresholds you apply to the results to balance precision and recall. Set up a feedback loop for continuous improvement and incorporate user feedback for manual corrections.
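As a rough illustration of the post-processing step, the sketch below checks each extracted charge row for the two symptoms described in the question: a cell that holds more than one number (a likely merged column) and a row whose figures do not add up (a likely stray row). The column names `quantity`, `unit_rate`, and `amount`, the tolerance, and the sample rows are assumptions for the example, not part of the service's output. It also assumes a dot as the decimal separator and a comma as the thousands separator.

```python
import re

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_number(text):
    """Return the single number in a cell, or None if it holds zero or several numbers."""
    matches = NUMBER.findall(text.replace(",", ""))
    return float(matches[0]) if len(matches) == 1 else None


def check_row(row, tolerance=0.02):
    """Return a list of problems found in one extracted charge row."""
    problems = []
    values = {}
    for column in ("quantity", "unit_rate", "amount"):
        cell = row.get(column) or ""
        value = parse_number(cell)
        if value is None:
            # Zero numbers suggests a missing cell; several suggests merged columns.
            problems.append(f"{column}: expected exactly one number, got {cell!r}")
        values[column] = value
    if not problems:
        expected = values["quantity"] * values["unit_rate"]
        if abs(expected - values["amount"]) > tolerance:
            problems.append(
                f"amount {values['amount']} does not match "
                f"quantity x unit_rate = {expected:.2f}"
            )
    return problems


# Illustrative rows only, not real model output.
good_row = {"quantity": "350 kWh", "unit_rate": "0.25", "amount": "87.50"}
merged_row = {"quantity": "350 0.25", "unit_rate": "", "amount": "87.50"}
wrong_row = {"quantity": "100", "unit_rate": "0.30", "amount": "35.00"}
```

Rows that come back with problems can be queued for manual correction, and those corrected documents can then be fed back in as extra training samples.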

If this helps, kindly accept the response. Thanks.

