Hi jake furrier,
Greetings and welcome to the Microsoft Q&A forum! Thanks for posting your query.
I understand that you're encountering an issue with the Azure AI Document Intelligence Auto Labeler.
During the labeling process, the platform might not allow you to manually modify incorrect characters or text detected by the OCR engine. Even if you download the labels.json file, make corrections manually, and re-upload it, the changes might not be reflected in the system.
The quality of the labeled dataset significantly affects the accuracy of the trained model, so ensure that your dataset is diverse and well-labeled. Custom models require a labeled dataset of at least five documents for training.
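If you keep a local copy of the training folder, a quick check like the one below can confirm that the five-document minimum is met before you train. This is only a minimal sketch: the helper names are hypothetical, and it assumes each labeled document sits next to a file named `<document name>.labels.json` and that the listed file extensions cover your documents, so please adjust both to match your project.

```python
from pathlib import Path

# Assumption: file types you use for training documents; adjust as needed.
DOC_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}
MIN_LABELED_DOCS = 5  # minimum mentioned above


def labeled_documents(folder):
    """Return names of documents that have a sibling '<name>.labels.json' file."""
    folder = Path(folder)
    labeled = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() in DOC_EXTENSIONS:
            # Assumed naming convention: invoice1.pdf -> invoice1.pdf.labels.json
            if (folder / (path.name + ".labels.json")).exists():
                labeled.append(path.name)
    return labeled


def has_enough_labeled_documents(folder, minimum=MIN_LABELED_DOCS):
    """True when the folder holds at least `minimum` labeled documents."""
    return len(labeled_documents(folder)) >= minimum
```

For example, `has_enough_labeled_documents("training_data")` returns `False` if only four documents in that (hypothetical) folder have label files.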
If the OCR process is not detecting certain characters or fields correctly, it might be due to the quality of the documents or the OCR engine's limitations. You can try increasing the font size or improving the document quality before uploading.
Current API versions might not support the modification of the OCR output. Updating the extracted OCR and labels JSON files is not recommended as it may corrupt the project or model training.
To resolve these issues, you can try the following steps:
- Ensure that your dataset is well-labeled and diverse.
- Improve the quality of the documents before uploading them.
- Use the Azure AI Studio to compare the raw image and OCR image by opening two instances of the studio in the browser.
- Avoid manually modifying the OCR and labels JSON files.
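If you want to confirm that the OCR and labels JSON files in a local copy of your dataset have not been edited by hand, one option is to record a checksum of each file and compare it later. The sketch below only reads the files; the function names are hypothetical, and it assumes the files end in `.ocr.json` and `.labels.json`.

```python
import hashlib
from pathlib import Path


def snapshot_hashes(folder):
    """Map each '*.ocr.json' / '*.labels.json' file name to its SHA-256 digest."""
    hashes = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith((".ocr.json", ".labels.json")):
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def changed_files(before, after):
    """Return names whose digest differs, or that were added or removed."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

Take one snapshot right after downloading the dataset and another before re-uploading; an empty list from `changed_files` means none of those files were touched.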
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".