Is there any way to improve the values extracted by Azure Document Intelligence by editing the values in a custom-built model? Can I enhance it?

Question

Johnimmanuel Vetri 20

Some of the values extracted by Document Intelligence Studio were incorrect. Can I improve the accuracy of the extracted values through training?

Johnimmanuel Vetri 20 Reputation points

2024-08-13T07:54:44.09+00:00

Okay, thanks for your reply @dupammi. What I'm concerned about is that in one of my documents some of the values are printed just above the line. As a result, many of the values returned by DI were inaccurate: for example, 4 was read as A, R as B, and 1 as I, and similarly for many other values. Will I be able to improve this with a custom model in DI Studio?

dupammi 8,615 Reputation points Microsoft External Staff

2024-08-13T08:23:52.3366667+00:00

Hi @Johnimmanuel Vetri

Thank you for following up with us.

To address this issue, you might consider using Custom models. In Azure Document Intelligence Studio, you can train a custom model to accurately recognize the specific VIN numbers that are currently being misread in your document. This will help ensure correct identification in the future.

I suggest trying the Custom extraction template model, which should be able to correctly read the capital letter 'I' based on my previous experience with this model.

Alternatively, you can explore pre-built models designed to extract information from various document types. Testing these models might also help in accurately identifying the VIN numbers.

For optimal results, consider providing more documents to train the model according to these specific input requirements.

I hope this helps!

1 answer

Answer 1

Hi @Johnimmanuel Vetri

Thank you for using the Microsoft Q&A forum.

Yes, you can improve the accuracy of the values extracted by Azure Document Intelligence by retraining your custom model. Here are some steps to enhance your model’s performance:

Edit Labeled Data:

- Open your project in Document Intelligence.
- Go to the "Label data" section to view and edit the existing labels or upload new documents with corrected labels.

Increase Training Data:

- Add more documents to your training dataset, ensuring you cover all variations of the document types you want to analyze.
- Use a mix of text-based and high-quality scanned PDFs for training.

Improve Data Quality:

- Ensure high-quality input documents and consistent formatting.
- Ensure that all fields in the training documents are correctly filled in and labeled.

Optimize Labels and Annotations:

- Ensure accurate and consistent labeling of fields in the training documents.
- Avoid extraneous labels and ensure that signature and region labeling does not include surrounding text.

Incorporate Human Review:

Implement a human review step in your workflow for critical documents to manually correct any errors and improve overall accuracy.

Retrain the Model:

- After making changes to the labels and adding more training data, retrain the model to incorporate these updates.
- Test the model thoroughly after each training session to ensure that improvements in one area do not negatively affect others.
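One way to check that a retrained model has not regressed is to score both model versions against the same hand-verified test set, field by field. The sketch below is only an illustration in plain Python: it assumes you have already exported the extracted values of each run into dictionaries, and the function names, field names, and sample values are hypothetical rather than part of Document Intelligence.

```python
from collections import defaultdict


def field_accuracy(expected_docs, extracted_docs):
    """Return {field: fraction of documents whose extracted value matches exactly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for expected, extracted in zip(expected_docs, extracted_docs):
        for name, truth in expected.items():
            total[name] += 1
            if extracted.get(name) == truth:
                correct[name] += 1
    return {name: correct[name] / total[name] for name in total}


def regressions(before, after, tolerance=0.0):
    """List fields whose accuracy dropped by more than `tolerance` between two runs."""
    return sorted(
        name for name in before
        if after.get(name, 0.0) < before[name] - tolerance
    )


# Hypothetical hand-verified values for two test documents.
expected = [
    {"vin": "2T3VRCB123456", "year": "2019"},
    {"vin": "X1", "year": "2020"},
]
# Hypothetical output of the model before and after retraining.
old_run = [
    {"vin": "273VBC8123456", "year": "2019"},
    {"vin": "X1", "year": "2020"},
]
new_run = [
    {"vin": "2T3VRCB123456", "year": "2O19"},
    {"vin": "X1", "year": "2020"},
]

old_acc = field_accuracy(expected, old_run)
new_acc = field_accuracy(expected, new_run)
print(regressions(old_acc, new_acc))
```

In this made-up data the `vin` field improves after retraining while `year` gets worse, which is the kind of side effect the per-field comparison is meant to surface.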

Utilize Confidence Scores:

- Review the confidence scores for extracted values and focus on improving low-confidence areas by refining the training data.
- For critical fields, require human review for low-confidence results.
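The confidence-based review step above can be sketched roughly as follows. This is a minimal illustration, assuming the analysis result has already been flattened into a dictionary of `{"value": ..., "confidence": ...}` entries; the function name, the field names, and the 0.8 threshold are assumptions to be tuned for your own documents, not values taken from the service.

```python
def split_by_confidence(fields, threshold=0.8):
    """Split extracted fields into auto-accepted values and values needing human review.

    `fields` maps a field name to {"value": str, "confidence": float}.
    The 0.8 default is an arbitrary example threshold.
    """
    accepted, needs_review = {}, {}
    for name, field in fields.items():
        target = accepted if field["confidence"] >= threshold else needs_review
        target[name] = field["value"]
    return accepted, needs_review


# Hypothetical extracted fields for one document.
sample = {
    "vin": {"value": "273VBC8123456", "confidence": 0.41},
    "year": {"value": "2019", "confidence": 0.97},
}

accepted, needs_review = split_by_confidence(sample)
print("accepted:", accepted)
print("needs review:", needs_review)
```

Values that land in `needs_review` can be corrected by a person, and the corrected documents are good candidates to add to the training set.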

Address Specific Issues:

- For text fields not being recognized properly, ensure that all variations of these fields are included in the training data.
- For draw regions, ensure that the training data includes sufficient examples of text within these regions.
- For checkbox recognition inconsistencies, include diverse examples of how checkboxes are marked in your training data.

Custom Neural Models:

Consider using custom neural models, as they are often less prone than custom template models to mapping unrelated text to fields.

Regular Testing and Iteration:

Continuously test the model with real-world data and iterate on the training process based on the results. It's crucial to validate improvements with each retraining cycle.

By following these steps, you can enhance the accuracy and reliability of the values extracted by Azure Document Intelligence.

For more information, please refer to the accuracy and confidence documentation (concept-accuracy-confidence.md).

I hope this helps. Thank you.

If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".

Johnimmanuel Vetri 20 Reputation points

2024-08-08T05:51:14.6233333+00:00

Your answer is about the labels, which can be edited. My question is about cases where the value returned by ADI (Document Intelligence) is itself incorrect. For example, a VIN has the value 2T3VRCB123456, but DI returns 273VBC8123456. Can we train the custom model to return the correct values? In our document there is an underline below the VIN. Reading it manually, we can tell T from 7, so can we train for this?

dupammi 8,615 Reputation points Microsoft External Staff

2024-08-08T06:04:45.5566667+00:00

Hi @Johnimmanuel Vetri

To address your concern about Azure Document Intelligence misinterpreting values like VIN numbers or confusing the letters with the numbers, you can improve the model by adding more specific and correctly labeled examples to your training dataset. Ensure that your training documents clearly distinguish between similar-looking characters, such as "T" and "7" or "I" and "1," and include diverse examples that cover all variations. By refining your dataset and retraining the model with these corrections, you can enhance the accuracy of the extracted values.

I hope you understand. Thank you.
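To decide which look-alike characters most need extra training examples, it can help to tally the character substitutions between hand-verified values and what the model returned. The sketch below is a plain-Python illustration using the VIN pair quoted in this thread; the function name is hypothetical, and pairs of different lengths are simply skipped, since aligning them would need an edit-distance approach that is not shown here.

```python
from collections import Counter


def char_confusions(pairs):
    """Count (expected_char, extracted_char) substitutions across value pairs.

    `pairs` is an iterable of (expected, extracted) strings. Pairs whose
    lengths differ are skipped because positions cannot be compared directly.
    """
    counts = Counter()
    for expected, extracted in pairs:
        if len(expected) != len(extracted):
            continue
        for want, got in zip(expected, extracted):
            if want != got:
                counts[(want, got)] += 1
    return counts


# The correct VIN and the misread VIN from the comment above.
confusions = char_confusions([("2T3VRCB123456", "273VBC8123456")])
for (want, got), n in sorted(confusions.items()):
    print(f"expected {want!r}, got {got!r}: {n} time(s)")
```

Run over a larger verified sample, the most frequent pairs indicate which character variations to cover when collecting additional training documents.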
