Analysis Model Effectiveness
You can test the effectiveness of an analysis model by using it to predict new data. The Predictor resource provides "scores" that measure how accurately a model predicts new data. You can use these scores to measure the effectiveness of analysis models as you adjust the values of the Sample Size and Input and Output Attribute Fraction parameters. You use the Measured Accuracy Sample Fraction parameter to set the fraction of cases reserved for testing.
The Predictor resource includes two scores. The first is the Recommendation Score. This score measures the quality of an analysis model when it is used to recommend products for cross-selling; specifically, it measures the quality of a ranked list of recommendations returned from the model. The second is the Data Fit Score. This score measures the quality of predictions made by an analysis model when filling in missing data, such as missing user properties. You can view these scores in Commerce Server Manager under the Models node. The scores are stored in your Commerce Server Data Warehouse in the PredictorModels table, in the DataFitScore and RecommendationScore columns.
The Recommendation Score works as follows. Each case in the new (test) data is fed to the analysis model one at a time, with one of its transactions (chosen at random) removed. The model then returns a list of recommendations (at most the number specified by the Measured Accuracy Maximum Predictions parameter), and the model is considered successful if the removed transaction appears on the list. The score is the percentage of cases for which the analysis model was successful. The Measured Accuracy Maximum Predictions parameter is set at model build time and should correspond to the number of recommendations you plan to show to the user.
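The leave-one-out procedure above can be sketched as follows. This is a hypothetical illustration, not the Predictor resource's actual API: `model` here stands in for any callable that takes a basket of product IDs and returns a ranked recommendation list.

```python
import random

def recommendation_score(model, test_cases, max_predictions):
    """Leave-one-out hit rate over the test cases, as a percentage.

    model           -- callable: basket (list of product IDs) -> ranked
                       list of recommended product IDs (hypothetical
                       stand-in for the Predictor resource)
    test_cases      -- list of baskets from the test data
    max_predictions -- corresponds to Measured Accuracy Maximum Predictions
    """
    hits = 0
    for basket in test_cases:
        held_out = random.choice(basket)                 # remove one transaction at random
        remaining = [p for p in basket if p != held_out]
        recommendations = model(remaining)[:max_predictions]
        if held_out in recommendations:                  # success if the removed item is listed
            hits += 1
    return 100.0 * hits / len(test_cases)
```

Because `max_predictions` truncates the ranked list, the score grows with Measured Accuracy Maximum Predictions, which is why that parameter should match the number of recommendations you actually display.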
The Data Fit Score is the geometric average, over the observations in the test data, of the probability that the analysis model assigns to an observation divided by the probability that a marginal model assigns to the same observation. The marginal analysis model is a "straw-man" model that assumes all attributes are independent of each other.
Both scores evaluate only the ability of the model to make predictions on the cases in the test data. Consequently, for either score to be a meaningful indicator of the actual performance of an analysis model, the test data must approximate the distribution of cases that are likely to be seen in practice. Furthermore, it is important that the test data not overlap with the data used to build the analysis model. If the scores vary significantly depending on the particular test data sampled for scoring (for example, if changing the Measured Accuracy Sample Fraction parameter slightly produces a very different score), your test data is probably inadequate. You may also see a high degree of variability if you have too few input, or "training," cases.
Negative Data Fit Scores for Segment models
The segmentation process groups together cases into segments such that cases in a given segment are more similar than cases in other segments. The segment description summarizes the cases in a segment. The description is probabilistic. For example, suppose that each case represents a user and the Commerce Server Data Warehouse contains the salary of the user and whether the user purchased products X, Y, and Z. The segment description for this example is determined by measuring the following quantities over the users (cases) in a given segment: the average salary (and standard deviation of salary), the probability that product X was purchased, the probability that product Y was purchased, and the probability that product Z was purchased. The Segment model consists of a segment description for each segment.
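The worked example above can be made concrete with a short sketch. The attribute names (`salary`, `bought_X`, and so on) are hypothetical and chosen to match the example; they are not real Data Warehouse column names.

```python
from statistics import mean, pstdev

def segment_description(cases):
    """Probabilistic summary of the cases assigned to one segment.

    Each case is a dict with a numeric 'salary' and boolean purchase
    flags 'bought_X', 'bought_Y', 'bought_Z' (hypothetical names
    matching the example in the text).
    """
    salaries = [c["salary"] for c in cases]
    n = len(cases)
    return {
        "avg_salary": mean(salaries),
        "salary_stddev": pstdev(salaries),           # population std. deviation
        "p_bought_X": sum(c["bought_X"] for c in cases) / n,
        "p_bought_Y": sum(c["bought_Y"] for c in cases) / n,
        "p_bought_Z": sum(c["bought_Z"] for c in cases) / n,
    }
```

The Segment model, in this picture, is simply the collection of one such description per segment.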
The Segment model and segment descriptions are computed using only input or "training" cases. Because of this, if there are "test" cases that contain information that was not present in the training cases, you may observe small values for the Recommendation Score or negative values for the Data Fit Score. Ideally, the training cases and test cases will contain similar information. If you observe small values for the Recommendation Score or negative values for the Data Fit Score, increasing the value of the Sample Size parameter or decreasing the value of the Measured Accuracy Sample Fraction parameter may alleviate the problem. Be aware that increasing the Sample Size parameter may increase the time needed to build the Segment model. Use caution when decreasing the Measured Accuracy Sample Fraction parameter, because doing so leaves fewer test cases; in general, it is better to have many test cases.
The Recommendation Score and Data Fit Score evaluate the Segment model only on its ability to make predictions. A small Recommendation Score or a negative Data Fit Score indicates only that the model may not make accurate predictions on the test cases. You may still find the segments produced by the Segment model useful for other purposes, and you can evaluate them using the Segment Viewer in Commerce Server Business Desk.
See Also
Viewing Analysis Model Configuration Tables