Prediction scores indicate prediction accuracy for intent and entities

Article
08/28/2024

Important

LUIS will be retired on October 1st 2025 and starting April 1st 2023 you will not be able to create new LUIS resources. We recommend migrating your LUIS applications to conversational language understanding to benefit from continued product support and multilingual capabilities.

A prediction score indicates the degree of confidence LUIS has for prediction results of a user utterance.

A prediction score is between zero (0) and one (1). An example of a highly confident LUIS score is 0.99. An example of a score of low confidence is 0.01.

Score value	Confidence
1	definite match
0.99	high confidence
0.01	low confidence
0	definite failure to match

Top-scoring intent

Every utterance prediction returns a top-scoring intent. This prediction is a numerical comparison of prediction scores.

Proximity of scores to each other

The top 2 scores can have a very small difference between them. LUIS doesn't indicate this proximity other than returning the top score.

Return prediction score for all intents

A test or endpoint result can include all intents. This configuration is set on the endpoint using the correct querystring name/value pair.

Prediction API	Querystring name
V3	`show-all-intents=true`
V2	`verbose=true`

Review intents with similar scores

Reviewing the score for all intents is a good way to verify that not only is the correct intent identified, but that the next identified intent's score is significantly and consistently lower for utterances.

If multiple intents have close prediction scores, based on the context of an utterance, LUIS may switch between the intents. To fix this situation, continue to add utterances to each intent with a wider variety of contextual differences or you can have the client application, such as a chat bot, make programmatic choices about how to handle the 2 top intents.

The 2 intents, which are too-closely scored, may invert due to non-deterministic training. The top score could become the second top and the second top score could become the first top score. In order to prevent this situation, add example utterances to each of the top two intents for that utterance with word choice and context that differentiates the 2 intents. The two intents should have about the same number of example utterances. A rule of thumb for separation to prevent inversion due to training, is a 15% difference in scores.

You can turn off the non-deterministic training by training with all data.

Differences with predictions between different training sessions

When you train the same model in a different app, and the scores are not the same, this difference is because there is non-deterministic training (an element of randomness). Secondly, any overlap of an utterance to more than one intent means the top intent for the same utterance can change based on training.

If your chat bot requires a specific LUIS score to indicate confidence in an intent, you should use the score difference between the top two intents. This situation provides flexibility for variations in training.

You can turn off the non-deterministic training by training with all data.

E (exponent) notation

Prediction scores can use exponent notation, appearing above the 0-1 range, such as 9.910309E-07. This score is an indication of a very small number.

E notation score	Actual score
9.910309E-07	.0000009910309

Application settings

Use application settings to control how diacritics and punctuation impact prediction scores.

Next steps

See Add entities to learn more about how to add entities to your LUIS app.

Share via