Issues with Auto-Labeling and Table Extraction in Custom Extraction Model for Electricity and Gas Bills in Azure Document Intelligence

Shima Yousefi 0 Reputation points
2024-07-12T08:38:43.23+00:00

I am using a custom extraction model in document intelligence to train on electricity and gas bills for data extraction. However, there are many templates for each supplier, and I am using auto-labeling to speed up the process. The issue is that after testing the templates, the auto-labeling does not perform well. When I test the trained model on the table of energy charges extraction, it merges two columns or mistakenly takes other data as a row, resulting in an inaccurate table extraction. What should I do to resolve this issue? Is there a way to adjust the confidence score in the system and improve it?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

2 answers

  1. santoshkc 6,955 Reputation points Microsoft Vendor
    2024-07-12T10:28:28.6966667+00:00

    Hi @Shima Yousefi,

    Thank you for reaching out to Microsoft Q&A forum!

    To address the issues with auto-labeling and table extraction accuracy in your custom extraction model for electricity and gas bills in Azure Document Intelligence:

    Ensure you manually refine auto-labeling results to correct inaccuracies. However, adjusting confidence scores directly is not currently possible. To enhance model performance and score, consider training it with more documents.
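    Since the score itself cannot be changed, one option is to apply a threshold in your own code after analysis. The snippet below is only a sketch: it assumes each extracted field has already been reduced to a plain dict with hypothetical `value` and `confidence` keys, and the `0.8` cutoff and the field names are illustrative, not recommended values.

```python
from typing import Dict, List, Optional, Tuple

# Assumed (hypothetical) shape: {"FieldName": {"value": ..., "confidence": float or None}}
Fields = Dict[str, Dict[str, Optional[object]]]


def split_by_confidence(fields: Fields, threshold: float = 0.8) -> Tuple[Dict[str, object], List[str]]:
    """Separate fields that meet the threshold from fields that need manual review."""
    accepted: Dict[str, object] = {}
    needs_review: List[str] = []
    for name, field in fields.items():
        confidence = field.get("confidence")
        # A missing confidence is treated as "not trusted" and sent to review.
        if confidence is not None and confidence >= threshold:
            accepted[name] = field.get("value")
        else:
            needs_review.append(name)
    return accepted, needs_review


# Illustrative data only; these field names are not from a real model.
example_fields: Fields = {
    "SupplierName": {"value": "Contoso Energy", "confidence": 0.97},
    "TotalDue": {"value": "123.45", "confidence": 0.41},
    "MeterNumber": {"value": None, "confidence": None},
}
accepted, needs_review = split_by_confidence(example_fields, threshold=0.8)
```

    Fields that land in `needs_review` are good candidates for the manual label corrections mentioned above.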

    I hope you understand! Thank you.


  2. Azar 22,510 Reputation points MVP
    2024-07-12T11:35:33.68+00:00

    Hi there Shima Yousefi

    Thanks for using the Q&A platform.

    Review and correct the auto-labeled data, and make sure the column and row boundaries of each table are labeled clearly. Retrain the model with additional samples and use data augmentation. Implement post-processing logic to validate and correct the extracted data, and tune the confidence score thresholds you apply to the results to balance precision and recall. Set up a feedback loop for continuous improvement, and feed users' manual corrections back into the training data.
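    As a rough illustration of the post-processing step, the sketch below checks each extracted energy-charge row for the two failure modes described in the question (merged columns and stray data captured as a row). The column names `description`, `quantity`, `rate`, and `amount`, the currency symbols stripped, and the `0.01` tolerance are all assumptions for this example; adapt them to the actual bill layout.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional

# Hypothetical column names for an energy charges table.
EXPECTED_COLUMNS = ("description", "quantity", "rate", "amount")


def parse_number(text: Optional[str]) -> Optional[Decimal]:
    """Parse a numeric cell, returning None if it is empty or not a single number."""
    if text is None:
        return None
    cleaned = text.replace(",", "").replace("£", "").replace("$", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def validate_row(row: Dict[str, str], tolerance: Decimal = Decimal("0.01")) -> List[str]:
    """Return a list of problems found in one extracted row (empty list means it looks fine)."""
    problems: List[str] = []
    missing = [c for c in EXPECTED_COLUMNS if not (row.get(c) or "").strip()]
    if missing:
        problems.append("missing: " + ", ".join(missing))
    quantity = parse_number(row.get("quantity"))
    rate = parse_number(row.get("rate"))
    amount = parse_number(row.get("amount"))
    if quantity is None or rate is None or amount is None:
        # Two values in one cell (merged columns) fail to parse and end up here.
        problems.append("non-numeric quantity, rate, or amount")
    elif abs(quantity * rate - amount) > tolerance:
        problems.append("quantity x rate does not match amount")
    return problems


# Illustrative rows only.
good_row = {"description": "Electricity usage", "quantity": "320", "rate": "0.25", "amount": "80.00"}
merged_row = {"description": "Gas usage", "quantity": "150 0.10", "rate": "", "amount": "15.00"}
print(validate_row(good_row))
print(validate_row(merged_row))
```

    Rows that come back with problems can be routed to manual correction and then reused as additional labeled samples when retraining.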

    If this helps, kindly accept the response. Thanks.