Confusion Matrix Accuracy in Conversation Understanding Service

Zhang, Jayden 20 Reputation points
2023-11-11T08:37:15.5966667+00:00

Hi, I have three questions and hope to get help. I am using the Conversational Language Understanding service in Language Studio.

  1. What does the "matrix view" option in the confusion matrix section mean? In particular, when I select error values, how should I interpret the diagonal values, which are supposed to be true positives?
  2. If I select all values in the confusion matrix, the diagonal values are much larger than the real true-positive counts. For example, one diagonal value shows 64, but when I click into it, there are only 6 true positives. It's quite confusing.
  3. Why can there be float values in the confusion matrix?
Azure AI Language
An Azure service that provides natural language capabilities including sentiment analysis, entity extraction, and automated question answering.

1 answer

  1. Amira Bedhiafi 8,631 Reputation points
    2023-11-12T17:52:20.0066667+00:00

    Q1: A confusion matrix shows the performance of a classification model. Each row represents the instances of an actual class, while each column represents the instances of a predicted class. One common definition:

    It is a matrix of numbers that tell us where a model gets confused. It is a class-wise distribution of the predictive performance of a classification model—that is, the confusion matrix is an organized way of mapping the predictions to the original classes to which the data belong.

    The diagonal values in a confusion matrix generally represent true positives for each class. This means that the model correctly predicted the class. For instance, if a diagonal value for a specific class is 50, it indicates that the model correctly identified 50 instances of that class.

    [Image: confusion matrix for a binary-class dataset]
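
    As a quick illustration of reading the diagonal, here is a minimal sketch using scikit-learn; the intent names below are invented for the example and are not from your project:

    ```python
    # Minimal sketch using scikit-learn; the intent names are invented
    # for illustration, not taken from the asker's project.
    from sklearn.metrics import confusion_matrix

    labels = ["BookFlight", "CancelFlight", "GetWeather"]
    y_true = ["BookFlight", "BookFlight", "CancelFlight", "GetWeather", "GetWeather"]
    y_pred = ["BookFlight", "CancelFlight", "CancelFlight", "GetWeather", "BookFlight"]

    # Rows are actual classes, columns are predicted classes.
    cm = confusion_matrix(y_true, y_pred, labels=labels)

    # cm[i][i] is the number of true positives for labels[i].
    for i, label in enumerate(labels):
        print(f"True positives for {label}: {cm[i][i]}")
    ```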

    Q2: What you're observing, where the diagonal values are larger than the actual true positives, could be due to several factors. One possibility is that the view aggregates other kinds of correct predictions (such as true negatives for other classes) into that count; it might also simply reflect how the tool aggregates or displays the data. When you click into a value and see fewer true positives, you're looking at a more detailed or filtered view that excludes some of the predictions counted in the broader diagonal figure. The sketch below makes the distinction concrete.
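
    This is a hedged sketch (NumPy only; the counts are illustrative, not from Language Studio) of how per-class TP/FP/FN/TN are derived from a multiclass confusion matrix. Note that a class's true negatives include every other class's correct predictions, which is one way an aggregated count can exceed the plain cm[i][i] value:

    ```python
    # Hedged sketch (NumPy only): deriving per-class TP/FP/FN/TN from a
    # multiclass confusion matrix. The counts are illustrative, not taken
    # from Language Studio.
    import numpy as np

    # Rows = actual class, columns = predicted class.
    cm = np.array([[6, 2, 1],
                   [1, 8, 0],
                   [2, 1, 9]])

    for i in range(cm.shape[0]):
        tp = cm[i, i]                 # actual i, predicted i
        fn = cm[i, :].sum() - tp      # actual i, predicted something else
        fp = cm[:, i].sum() - tp      # predicted i, actually something else
        tn = cm.sum() - tp - fn - fp  # includes other classes' correct predictions
        print(f"class {i}: TP={tp} FP={fp} FN={fn} TN={tn}")
    ```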

    Q3: Typically, confusion matrices contain integer values, since they count instances. However, float values can appear in some cases. This might happen if the matrix is normalized (dividing each value by the total number of instances) to show proportions instead of absolute counts. Another possibility is that the model outputs probabilities or confidence levels instead of definite class predictions, which are then reflected as float values in the matrix.
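
    For example, here is a minimal sketch showing how normalization turns integer counts into floats; scikit-learn's normalize parameter is used as a stand-in, and Language Studio may normalize differently:

    ```python
    # Minimal sketch: normalizing a confusion matrix yields floats.
    # scikit-learn's normalize parameter is one example of how a tool
    # might do this; Language Studio may normalize differently.
    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 1, 1, 2, 2, 2]
    y_pred = [0, 0, 1, 1, 1, 2, 2, 0]

    print(confusion_matrix(y_true, y_pred))                    # integer counts
    print(confusion_matrix(y_true, y_pred, normalize="true"))  # row-normalized floats
    ```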

    You can also learn more here: https://learn.microsoft.com/en-us/training/modules/machine-learning-confusion-matrix/
