Hi Marques Chacon,
Greetings & Welcome to the Microsoft Q&A forum! Thank you for sharing your query.
I understand that you are having trouble accurately labeling data for a custom extraction model.
For your first scenario regarding OCR inaccuracies, you can edit the labels in the Form Recognizer Studio. The model trains on the labeled data you provide, which means you can adjust the labels to match the correct values, even if the OCR output is incorrect. This allows you to ensure that the training data reflects the accurate values you want to extract.
In your second scenario, where the invoice number is part of a larger token (like "456123(10/29/2024)"), you would indeed use region labeling to select just the "456123" part. Region labeling lets you specify the exact area of the document you want to label, which is useful for extracting a specific value from a larger string. For details, please refer to this document: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/v21/label-tool?view=doc-intel-2.1.0
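If region labeling does not isolate the number cleanly on every document, one possible fallback is to label the whole token and split it afterwards in your own code. The snippet below is only a minimal sketch of that idea, assuming the value always looks like digits followed by a date in parentheses in MM/DD/YYYY form. The function name `split_invoice_token` and the pattern are illustrative and are not part of Document Intelligence.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed shape of the combined token: digits, then a date in parentheses.
# Adjust the pattern if your invoices use a different layout.
TOKEN_PATTERN = re.compile(
    r"^\s*(?P<number>\d+)\s*\((?P<date>\d{1,2}/\d{1,2}/\d{4})\)\s*$"
)


def split_invoice_token(raw: str) -> Tuple[str, Optional[date]]:
    """Split a combined token into (invoice number, date).

    If the token does not match the assumed shape, the stripped input
    is returned unchanged with None for the date.
    """
    match = TOKEN_PATTERN.match(raw)
    if match is None:
        return raw.strip(), None
    parsed = datetime.strptime(match.group("date"), "%m/%d/%Y").date()
    return match.group("number"), parsed


print(split_invoice_token("456123(10/29/2024)"))
# ('456123', datetime.date(2024, 10, 29))
```

You would call this on the field value returned by your trained model, so the labeling itself can stay simple.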
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".