I have been using a custom document model in Azure Form Recognizer, but it gives high confidence values even for incorrectly extracted text. I wish to minimize "false positives" in its output.

Question


Bhaskar, Alankrit K

I have been using a custom document model in Azure Form Recognizer, but it reports high confidence values even for text it has extracted incorrectly. I want to minimize these "false positives" in its output, i.e. wrong field values with high confidence levels. Is there some way to achieve this, or is some customization allowed?

1 answer


Answer 1

Hello @Bhaskar, Alankrit K

Thanks for reaching out to us. Could you please share a sample document and the result so that we can look into it?

Generally, to minimize false positives in the output of your custom document model in Azure Form Recognizer, you can try the following approaches:

Improve the quality of your training data: The accuracy of your model depends on the quality of your training data. Make sure that your training data is representative of the documents you want to extract data from, and that it includes examples of common errors and variations in the data.

Adjust the confidence threshold: Each extracted field is returned with a confidence score, and you can accept only the fields whose score meets a threshold that you choose when processing the results. Setting a higher threshold reduces the number of false positives, but you may also miss some valid data. Experiment with different thresholds to find the right balance for your use case.

Use regular expressions: You can use regular expressions to define the patterns that extracted values must match, such as a date or an ID format. This helps to reduce false positives by rejecting extracted values that do not match the expected pattern, even when their confidence is high.

Use post-processing: You can use post-processing techniques to further refine the output of your model. For example, you can use rules to validate the extracted data and remove any values that do not meet certain criteria.
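As a rough illustration of the threshold, regular expression, and post-processing ideas above, here is a minimal sketch in plain Python that runs on the results after the service has returned them. It assumes you have already collected each field as a (text, confidence) pair from the analysis result. The field names, thresholds, and patterns (`FIELD_RULES`, `DEFAULT_RULE`, `filter_fields`) are hypothetical and only for illustration; they are not part of Form Recognizer.

```python
import re
from typing import Dict, Optional, Pattern, Tuple

# Hypothetical per-field rules: (minimum confidence, pattern the value must match).
# Field names, thresholds, and patterns are illustrative assumptions.
FIELD_RULES: Dict[str, Tuple[float, Optional[Pattern[str]]]] = {
    "InvoiceNumber": (0.90, re.compile(r"INV-\d{6}")),
    "InvoiceDate": (0.80, re.compile(r"\d{4}-\d{2}-\d{2}")),
    "VendorName": (0.70, None),  # threshold only, no pattern check
}
DEFAULT_RULE: Tuple[float, Optional[Pattern[str]]] = (0.80, None)


def filter_fields(
    extracted: Dict[str, Tuple[str, float]]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split extracted fields into accepted values and rejection reasons."""
    accepted: Dict[str, str] = {}
    rejected: Dict[str, str] = {}
    for name, (text, confidence) in extracted.items():
        min_conf, pattern = FIELD_RULES.get(name, DEFAULT_RULE)
        value = text.strip()
        if confidence < min_conf:
            rejected[name] = "low confidence"
        elif pattern is not None and pattern.fullmatch(value) is None:
            # High confidence but wrong shape: the case the question describes.
            rejected[name] = "pattern mismatch"
        else:
            accepted[name] = value
    return accepted, rejected


# Made-up sample values standing in for a real analysis result.
sample = {
    "InvoiceNumber": ("INV-00I234", 0.97),  # letter "I" where a digit is expected
    "InvoiceDate": ("2023-06-26", 0.95),
    "VendorName": ("Contoso", 0.55),
}
accepted, rejected = filter_fields(sample)
print(accepted)
print(rejected)
```

Rejected fields can then be routed to manual review instead of being dropped silently. Note that the pattern check catches the high-confidence wrong value here, which a confidence threshold alone would have let through.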

It is important to note that achieving high accuracy in document extraction can be challenging, especially for complex documents with varying layouts and formats. It may take some trial and error to find the right approach for your specific use case.
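One way to make the trial and error with thresholds more systematic is to check a set of extracted values by hand and then see how many wrong values each candidate threshold would let through. The sketch below assumes such a hand-labelled list of (confidence, is_correct) pairs; the function name `sweep_thresholds` and the sample numbers are made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, int, int, float, float]


def sweep_thresholds(
    results: List[Tuple[float, bool]], thresholds: List[float]
) -> List[Row]:
    """For each threshold, report (threshold, accepted, false positives,
    precision, coverage) over hand-labelled (confidence, is_correct) pairs."""
    rows: List[Row] = []
    total = len(results)
    for t in thresholds:
        kept = [ok for conf, ok in results if conf >= t]
        false_pos = sum(1 for ok in kept if not ok)
        # With nothing accepted there are no false positives; report 1.0 by convention.
        precision = (len(kept) - false_pos) / len(kept) if kept else 1.0
        coverage = len(kept) / total if total else 0.0
        rows.append((t, len(kept), false_pos, precision, coverage))
    return rows


# Made-up labelled results: (confidence reported, whether the value was right).
labelled = [
    (0.99, True), (0.97, False), (0.95, True),
    (0.90, True), (0.85, False), (0.60, True),
]
rows = sweep_thresholds(labelled, [0.5, 0.9, 0.98])
for t, kept, fp, precision, coverage in rows:
    print(f"threshold={t:.2f} accepted={kept} false_positives={fp} "
          f"precision={precision:.2f} coverage={coverage:.2f}")
```

In this made-up data the wrong value with confidence 0.97 only disappears at a threshold that also discards most correct values, which is why a threshold is usually combined with validation rules rather than used alone.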

I hope this helps!

Regards,

Yutong

-Please kindly accept the answer if you find it helpful, to support the community. Thanks a lot.

Bhaskar, Alankrit K

2023-06-26T05:40:04.9466667+00:00

Hi @YutongTie-MSFT

Can you please elaborate on how to adjust the confidence threshold and use regular expressions with Azure Form Recognizer? I couldn't find any such customizations for the model in the portal.
