Azure AI Document Intelligence Auto Labeler and Mark for Upload to Training Set

Question

jake furrier 0

In Azure Document Intelligence Studio, I am experiencing an issue with the Auto Labeler. Specifically, in the Label Data section, it does not extract fields. When I test the model, it successfully returns the correct labels. However, when I upload documents to the training set, they are uploaded without the labeled data. Could you help me understand why this is happening and how to resolve it?

Answer 1

Hi jake furrier,
Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

I understand that you're encountering an issue with the Azure AI Document Intelligence Auto Labeler.

During the labeling process, the platform might not allow you to manually modify incorrect characters or text detected by the OCR engine. Even if you download the labels.json file, make corrections manually, and re-upload it, the changes might not be reflected in the system.

The quality of the labeled dataset significantly affects the accuracy of the trained model. Ensure that your dataset is diverse and well-labeled. Custom models require a labeled dataset of at least five documents to train a model effectively.

If the OCR process is not detecting certain characters or fields correctly, it might be due to the quality of the documents or the OCR engine's limitations. You can try increasing the font size or improving the document quality before uploading.

Current API versions might not support the modification of the OCR output. Updating the extracted OCR and labels JSON files is not recommended as it may corrupt the project or model training.

To resolve these issues, you can try the following steps:

Ensure that your dataset is well-labeled and diverse.
Improve the quality of the documents before uploading them.
Use Document Intelligence Studio to compare the raw image with the OCR result by opening two instances of the Studio in the browser.
Avoid manually modifying the OCR and labels JSON files.
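
To see whether labels are actually missing from the training set without editing anything, a read-only check along these lines may help. This is only a sketch under assumptions: it assumes you have downloaded the training folder from the storage container to a local directory, and that each document has sidecar files named `<document>.labels.json` and `<document>.ocr.json` next to it, which is the layout Studio projects typically use. The function name `audit_training_folder` and the list of document extensions are hypothetical choices for illustration, not part of any Azure SDK.

```python
import sys
from pathlib import Path

# Assumed set of document types; adjust to match your project.
DOC_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}
# Minimum number of labeled documents mentioned in the answer above.
MIN_LABELED_DOCS = 5


def audit_training_folder(folder):
    """Report which documents have label/OCR sidecar files. Never modifies files."""
    root = Path(folder)
    labeled, missing_labels, missing_ocr = [], [], []
    for path in sorted(root.iterdir()):
        # Skip sidecars (they end in .json), fields.json, and subfolders.
        if not path.is_file() or path.suffix.lower() not in DOC_EXTENSIONS:
            continue
        # Assumed naming convention: "<document name>.labels.json" / ".ocr.json".
        if (root / (path.name + ".labels.json")).exists():
            labeled.append(path.name)
        else:
            missing_labels.append(path.name)
        if not (root / (path.name + ".ocr.json")).exists():
            missing_ocr.append(path.name)
    return {
        "labeled": labeled,
        "missing_labels": missing_labels,
        "missing_ocr": missing_ocr,
        "enough_labeled": len(labeled) >= MIN_LABELED_DOCS,
    }


if __name__ == "__main__":
    report = audit_training_folder(sys.argv[1] if len(sys.argv) > 1 else ".")
    for key, value in report.items():
        print(f"{key}: {value}")
```

Documents listed under `missing_labels` are the ones that reached the training set without labeled data. Since the script only reads file names, it stays in line with the advice not to modify the OCR and labels JSON files by hand.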

Hope this helps. Do let us know if you have any further queries.

If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

Pavankumar Purilla 8,570 Reputation points Microsoft External Staff Moderator

2025-01-20T17:41:18.92+00:00

Hi jake furrier,
Hope you are having a great day.

Just checking in to see if you have got a chance to see my response to your question in resolving the issue.

If you are still facing any further issues, please don't hesitate to reach out to us. We are happy to assist you.

Looking forward to your response and appreciate your time on this.

If you feel that your queries have been resolved, please accept the answer by clicking "Upvote" and "Accept Answer" on the post.
Pavankumar Purilla 8,570 Reputation points Microsoft External Staff Moderator

2025-01-21T16:35:58.7166667+00:00

