Azure AI Document Intelligence Auto Labeler and Mark for Upload to Training Set

jake furrier 0 Reputation points
2025-01-17T17:09:05.9+00:00

In Azure Document Intelligence Studio, I am experiencing an issue with the Auto Labeler. Specifically, in the Label Data section, it does not extract fields. When I test the model, it successfully returns the correct labels. However, when I upload documents to the training set, they are uploaded without the labeled data. Could you help me understand why this is happening and how to resolve it?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Pavankumar Purilla 8,570 Reputation points Microsoft External Staff Moderator
    2025-01-18T02:53:54.0766667+00:00

    Hi jake furrier,
    Greetings, and welcome to the Microsoft Q&A forum! Thanks for posting your query!

    I understand that you're encountering an issue with the Azure AI Document Intelligence Auto Labeler.

    During the labeling process, the platform might not allow you to manually modify incorrect characters or text detected by the OCR engine. Even if you download the labels.json file, make corrections manually, and re-upload it, the changes might not be reflected in the system.

    The quality of the labeled dataset significantly affects the accuracy of the trained model. Ensure that your dataset is diverse and well-labeled. Custom models require a labeled dataset of at least five documents to train a model effectively.

    If the OCR process is not detecting certain characters or fields correctly, it might be due to the quality of the documents or the OCR engine's limitations. You can try increasing the font size or improving the document quality before uploading.

    Current API versions might not support the modification of the OCR output. Updating the extracted OCR and labels JSON files is not recommended as it may corrupt the project or model training.

    To resolve these issues, you can try the following steps:

    • Ensure that your dataset is well-labeled and diverse.
    • Improve the quality of the documents before uploading them.
    • Use Document Intelligence Studio to compare the raw image and the OCR image by opening two instances of the Studio in the browser.
    • Avoid manually modifying the OCR and labels JSON files.
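
If it helps to see which uploaded documents actually carry labels, here is a minimal, read-only sketch in Python. It assumes the Studio stores a `<document>.ocr.json` and a `<document>.labels.json` file next to each labeled document in the training container; please verify that naming against your own storage before relying on it. The function name `audit_training_files`, the extension list, and the sample file names are made up for illustration. The threshold of five comes from the minimum mentioned above. Nothing here modifies the OCR or labels files.

```python
from pathlib import PurePosixPath

# Assumed naming convention (check your own container):
#   invoice1.pdf, invoice1.pdf.ocr.json, invoice1.pdf.labels.json
LABEL_SUFFIX = ".labels.json"
OCR_SUFFIX = ".ocr.json"
# Illustrative list of document extensions, not an official one.
DOC_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
MIN_LABELED_DOCS = 5  # minimum number of labeled documents stated above


def audit_training_files(names):
    """Report which documents have, or lack, label and OCR sidecar files."""
    present = set(names)
    report = {"labeled": [], "missing_labels": [], "missing_ocr": []}
    for name in sorted(present):
        # Skip fields.json, the sidecar files themselves, and anything else.
        if PurePosixPath(name).suffix.lower() not in DOC_EXTENSIONS:
            continue
        if name + OCR_SUFFIX not in present:
            report["missing_ocr"].append(name)
        if name + LABEL_SUFFIX in present:
            report["labeled"].append(name)
        else:
            report["missing_labels"].append(name)
    report["enough_to_train"] = len(report["labeled"]) >= MIN_LABELED_DOCS
    return report


# Hypothetical listing of a training folder or container.
names = [
    "fields.json",
    "a.pdf", "a.pdf.ocr.json", "a.pdf.labels.json",
    "b.pdf", "b.pdf.ocr.json",
]
report = audit_training_files(names)
print(report["missing_labels"])   # ['b.pdf']
print(report["enough_to_train"])  # False
```

Any document listed under `missing_labels` was uploaded without label data, which matches the symptom described in the question. The list of names can come from any listing of the container, for example a copy downloaded with Azure Storage Explorer.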

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click "Accept Answer" and select "Yes" for "Was this answer helpful".

