I have been using a custom document model in Azure Form Recognizer, but it gives high confidence values even for incorrectly extracted text. I want to minimize "false positives" in its output.

Bhaskar, Alankrit K 0 Reputation points
2023-06-19T10:50:25.5266667+00:00

I have been using a custom document model in Azure Form Recognizer, but it gives high confidence values even for incorrectly extracted text. I want to minimize "false positives" in its output, i.e. wrong field values reported with high confidence. Is there a way to achieve this, or is some customization allowed?


1 answer

  1. YutongTie-MSFT 53,966 Reputation points Moderator
    2023-06-20T05:45:23.5066667+00:00

    Hello @Bhaskar, Alankrit K

    Thanks for reaching out to us. Could you please share a sample document and its result so that we can look into it?

    Generally, to minimize false positives in the output of your custom document model in Azure Form Recognizer, you can try the following approaches:

    Improve the quality of your training data: The accuracy of your model depends on the quality of your training data. Make sure that your training data is representative of the documents you want to extract data from, and that it includes examples of common errors and variations in the data.

    Adjust the confidence threshold: Each extracted field comes with a confidence score, and you can choose the threshold above which your application accepts a value. Setting a higher threshold reduces the number of false positives, but you may also reject some valid data. Experiment with different thresholds to find the right balance for your use case.
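As a rough illustration of thresholding, the sketch below splits extracted fields into accepted values and values routed to manual review. `ExtractedField`, `split_by_confidence`, the field names, and the 0.9 cutoff are all hypothetical; `ExtractedField` is a stand-in for whatever structure your client code builds from the analysis result, not an SDK type.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ExtractedField:
    # Hypothetical stand-in for one field taken from the analysis result.
    value: Optional[str]
    confidence: float


def split_by_confidence(
    fields: Dict[str, ExtractedField], threshold: float = 0.9
) -> Tuple[Dict[str, ExtractedField], Dict[str, ExtractedField]]:
    """Return (accepted, needs_review) based on a confidence cutoff."""
    accepted: Dict[str, ExtractedField] = {}
    needs_review: Dict[str, ExtractedField] = {}
    for name, field in fields.items():
        # Missing values are never auto-accepted, whatever their score.
        if field.value is not None and field.confidence >= threshold:
            accepted[name] = field
        else:
            needs_review[name] = field
    return accepted, needs_review


# Example data (made up for illustration).
fields = {
    "InvoiceId": ExtractedField("INV-1042", 0.97),
    "Total": ExtractedField("1,250.00", 0.62),
}
accepted, needs_review = split_by_confidence(fields, threshold=0.9)
```

A cutoff alone will not catch values that are wrong but scored highly, so it is usually combined with checks on the values themselves.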

    Use regular expressions: You can use regular expressions to define the patterns that extracted values are expected to follow. This helps to reduce false positives by rejecting any extracted value that does not match its pattern, regardless of the confidence score.
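A minimal sketch of pattern checking on extracted values, using Python's standard `re` module. The field names and formats in `FIELD_PATTERNS` are assumptions made up for this example; replace them with the formats your own documents use.

```python
import re

# Hypothetical per-field formats; adjust to your documents.
FIELD_PATTERNS = {
    "InvoiceId": re.compile(r"INV-\d{4}"),
    "InvoiceDate": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def matches_expected_pattern(name: str, value: str) -> bool:
    """Return True if the value fits the pattern registered for the field."""
    pattern = FIELD_PATTERNS.get(name)
    if pattern is None:
        # No pattern registered for this field, so nothing to check.
        return True
    # fullmatch requires the whole value to fit, not just a substring.
    return pattern.fullmatch(value.strip()) is not None
```

For example, `matches_expected_pattern("InvoiceId", "1NV-1O42")` returns `False`, which flags a typical OCR confusion between letters and digits even if the service reported a high confidence for it.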

    Use post-processing: You can use post-processing techniques to further refine the output of your model. For example, you can use rules to validate the extracted data and remove any values that do not meet certain criteria.
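Rule-based post-processing could look like the sketch below, which checks that a date parses and that amounts are consistent with each other. The function names, the field names (`InvoiceDate`, `Subtotal`, `Tax`, `Total`), and the date format are hypothetical and only illustrate the idea of validating extracted values against business rules.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse an amount such as '1,250.00'; return None if it is not numeric."""
    try:
        return Decimal(text.replace(",", ""))
    except InvalidOperation:
        return None


def validate_invoice(values: Dict[str, str]) -> List[str]:
    """Return a list of rule violations; an empty list means all rules passed."""
    problems: List[str] = []

    # Rule 1: the date must be a real calendar date in YYYY-MM-DD form.
    try:
        datetime.strptime(values.get("InvoiceDate", ""), "%Y-%m-%d")
    except ValueError:
        problems.append("InvoiceDate is not a valid YYYY-MM-DD date")

    # Rule 2: the amounts must parse and add up.
    subtotal = parse_amount(values.get("Subtotal", ""))
    tax = parse_amount(values.get("Tax", ""))
    total = parse_amount(values.get("Total", ""))
    if subtotal is None or tax is None or total is None:
        problems.append("an amount could not be parsed")
    elif subtotal + tax != total:
        problems.append("Subtotal + Tax does not equal Total")

    return problems
```

Values that fail a rule can be dropped or sent to manual review. Cross-field checks like the sum above are useful because they do not depend on the confidence score at all.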

    It is important to note that achieving high accuracy in document extraction can be challenging, especially for complex documents with varying layouts and formats. It may take some trial and error to find the right approach for your specific use case.

    I hope this helps!

    Regards,

    Yutong


