@Aravind Thanks for the question. The confidence values you see actually come from two separate operations:
- Read operation score
- Key/Value extraction score
High scores on both indicate that the document was read correctly and that the values were extracted as expected. Here are more details on these scores from our documentation:
Examine the "confidence" values for each key/value result under the "pageResults" node. You should also look at the confidence scores in the "readResults" node, which correspond to the text read operation. The confidence of the read results does not affect the confidence of the key/value extraction results, so you should check both.
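The following minimal Python sketch illustrates checking both nodes. It assumes the v2.x REST response shape, where the `analyzeResult` object holds word-level confidences under `readResults[].lines[].words[]` and key/value confidences under `pageResults[].keyValuePairs[]`. The function name `collect_confidences` and the sample data are hypothetical and exist only for this example, so please verify the field names against the API version you actually call.

```python
from typing import Any


def collect_confidences(analyze_result: dict[str, Any]) -> tuple[list[float], list[float]]:
    """Return (read_scores, key_value_scores) from a v2.x-style analyzeResult dict.

    Assumption: words live under readResults[].lines[].words[] and key/value
    pairs under pageResults[].keyValuePairs[], each with a "confidence" field.
    """
    # Scores from the text read operation (one per recognized word).
    read_scores: list[float] = []
    for page in analyze_result.get("readResults", []):
        for line in page.get("lines", []):
            for word in line.get("words", []):
                if "confidence" in word:
                    read_scores.append(word["confidence"])

    # Scores from the key/value extraction (one per extracted pair).
    kv_scores: list[float] = []
    for page in analyze_result.get("pageResults", []):
        for pair in page.get("keyValuePairs", []):
            if "confidence" in pair:
                kv_scores.append(pair["confidence"])

    return read_scores, kv_scores


# Illustrative input only: a hand-written fragment, not real service output.
sample = {
    "readResults": [
        {"page": 1, "lines": [{"text": "Total 42.00", "words": [
            {"text": "Total", "confidence": 0.98},
            {"text": "42.00", "confidence": 0.71},
        ]}]}
    ],
    "pageResults": [
        {"page": 1, "keyValuePairs": [
            {"key": {"text": "Total"}, "value": {"text": "42.00"}, "confidence": 0.86},
        ]}
    ],
}

read_scores, kv_scores = collect_confidences(sample)
print(min(read_scores), min(kv_scores))  # → 0.71 0.86
```

Keeping the two lists separate reflects the point above: a good read score says nothing about the key/value score, and vice versa.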
- If the confidence scores for the read operation are low, try to improve the quality of your input documents (see Input requirements).
- If the confidence scores for the key/value extraction operation are low, ensure that the documents being analyzed are of the same type as documents used in the training set. If the documents in the training set have variations in appearance, consider splitting them into different folders and training one model for each variation.
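Because the two checks are independent, each low score points to a different fix. Here is a minimal Python sketch of that triage. It assumes you already have both lists of scores as fractions between 0 and 1. `triage` and its messages are hypothetical, and the 0.8 default simply mirrors the general 80% target from the documentation.

```python
def triage(read_scores: list[float], kv_scores: list[float], threshold: float = 0.8) -> list[str]:
    """Suggest which remedy applies, checking read and key/value scores separately."""
    suggestions: list[str] = []
    # Low read scores point at the quality of the input documents.
    if read_scores and min(read_scores) < threshold:
        suggestions.append("improve input document quality")
    # Low key/value scores point at a mismatch with the training set.
    if kv_scores and min(kv_scores) < threshold:
        suggestions.append("check documents match the training set; consider one model per variation")
    return suggestions


print(triage([0.98, 0.71], [0.86]))  # → ['improve input document quality']
```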
The confidence scores you target will depend on your use case, but generally it's a good practice to target a score of 80% or above. For more sensitive cases, like reading medical records or billing statements, a score of 100% is recommended.
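To make the threshold choice concrete, here is a small Python sketch that flags extracted fields falling below the target. It assumes scores are fractions between 0 and 1 (so 80% is 0.8), and that you have already mapped each field name to its key/value confidence. `target_threshold` and `fields_needing_review` are hypothetical helpers, not part of any SDK.

```python
def target_threshold(sensitive: bool) -> float:
    # 80% as the general target; 100% for sensitive documents such as
    # medical records or billing statements.
    return 1.0 if sensitive else 0.8


def fields_needing_review(pairs: dict[str, float], sensitive: bool) -> list[str]:
    """Return the field names whose confidence is below the chosen target."""
    t = target_threshold(sensitive)
    return [name for name, conf in pairs.items() if conf < t]


scores = {"Total": 0.86, "Date": 0.79, "Invoice No": 1.0}
print(fields_needing_review(scores, sensitive=False))  # → ['Date']
print(fields_needing_review(scores, sensitive=True))   # → ['Total', 'Date']
```

With the 100% target, any field not scored at exactly 1.0 is flagged.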
I hope this helps you understand the scores you are seeing in your results. Also note that the formulas used to compute these scores are not published, because they are updated as the service itself is updated.