Improving Accuracy and Handling Duplicate Data Extraction with Azure Form Recognizer

Alejandro Roman 0

Hello,

I've been using Azure's Form Recognizer for one of my projects, and while it offers great utility, I've encountered a few challenges:

Duplicate Extractions: The OCR sometimes extracts the same information twice. Is there a way to refine its accuracy in this regard?
Parsing Issues: There are instances where the OCR doesn't parse the extracted data correctly, leading to inaccuracies in the results.

I would greatly appreciate any suggestions or best practices to improve the accuracy and reduce these issues.

Additionally, is there a mechanism or feature within Azure Form Recognizer where I can provide feedback on the extraction results? I believe that a feedback loop could be beneficial in improving the model's accuracy over time, especially for the specific forms I'm working with.

So far my training data set consists of 17 documents. These are very structured tax documents.

Thank you in advance for your assistance and recommendations!

VasaviLankipalle-MSFT 17,641 Reputation points

2023-09-29T23:29:04.9+00:00

Hello @Alejandro Roman , Thanks for using Microsoft Q&A Platform.

May I know which custom model and API version you are using?
Alejandro Roman 0 Reputation points

2023-09-30T01:56:28.75+00:00

Hi @VasaviLankipalle-MSFT !

Thanks for the response. The API version I'm using is 2023-07-31

How do I find out which custom model I'm using? It is a custom extraction model.
Alejandro Roman 0 Reputation points

2023-10-03T00:02:20.77+00:00

Hi @VasaviLankipalle-MSFT , following up here. Any guidance?
VasaviLankipalle-MSFT 17,641 Reputation points

2023-10-03T03:49:16.9633333+00:00

Hello @Alejandro Roman, this is a known issue. The duplicate tokens issue appears to have been fixed recently, and the fix should be available with the latest 2022-08-31 GA from early this month. Could you please verify and let us know if you still face issues?

Also, inconsistent labeling can sometimes cause parsing issues. Try labeling your documents consistently, following the labeling guidelines, and see if that helps: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-custom-label?view=doc-intel-3.1.0#labeling-guidelines
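
If duplicates still show up while you verify the fix, one possible client-side workaround is to filter repeated tokens after extraction. The snippet below is only a sketch: the token shape (dicts with `page`, `content`, and `polygon` keys) and the `dedupe_tokens` helper are assumptions made for illustration, not the service's response schema, so adapt the keys to whatever your result actually contains.

```python
from typing import Any, Dict, List


def dedupe_tokens(tokens: List[Dict[str, Any]], precision: int = 1) -> List[Dict[str, Any]]:
    """Drop tokens that repeat the same text at the same place on the same page.

    Assumed (hypothetical) token shape:
        {"page": int, "content": str, "polygon": [float, ...]}
    """
    seen = set()
    unique = []
    for tok in tokens:
        # Two tokens count as duplicates when page, trimmed text, and
        # rounded coordinates all match.
        key = (
            tok["page"],
            tok["content"].strip(),
            tuple(round(c, precision) for c in tok["polygon"]),
        )
        if key not in seen:
            seen.add(key)
            unique.append(tok)  # keep the first occurrence, preserve order
    return unique


if __name__ == "__main__":
    sample = [
        {"page": 1, "content": "Total", "polygon": [1.0, 2.0, 3.0, 2.5]},
        {"page": 1, "content": "Total", "polygon": [1.0, 2.0, 3.0, 2.5]},  # repeat
        {"page": 2, "content": "Total", "polygon": [1.0, 2.0, 3.0, 2.5]},  # other page
    ]
    print(len(dedupe_tokens(sample)))
```

Matching on position as well as text avoids removing values that legitimately appear more than once on a form, such as the same amount in two different boxes.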
Alejandro Roman 0 Reputation points

2023-10-03T17:17:50.44+00:00

Thank you for the prompt response @VasaviLankipalle-MSFT.

How do I update to the latest GA?

Also, thank you for the resource on consistent labeling
VasaviLankipalle-MSFT 17,641 Reputation points

2023-10-05T05:11:24.1666667+00:00

@Alejandro Roman, this should work with the latest 2022-08-31 GA starting this month (October).
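
For anyone calling the REST API directly, the version is selected per request through the `api-version` query parameter, so "using the 2022-08-31 GA" amounts to sending that value. A minimal sketch of building such a request URL follows; the endpoint and model ID are placeholders, and `build_analyze_url` is a hypothetical helper, not part of any SDK.

```python
from urllib.parse import quote, urlencode


def build_analyze_url(endpoint: str, model_id: str, api_version: str = "2022-08-31") -> str:
    """Build the analyze URL for a custom model, pinned to one API version."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    # The model ID is percent-encoded in case it contains reserved characters.
    return f"{base}/formrecognizer/documentModels/{quote(model_id, safe='')}:analyze?{query}"


if __name__ == "__main__":
    # Placeholder endpoint and model ID; substitute your own resource values.
    print(build_analyze_url("https://example.cognitiveservices.azure.com/", "my-tax-model"))
```

If you use one of the client SDKs instead, check its documentation for how the API version is chosen, since that differs by SDK and release.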
