Analysis Models
An analysis model is a set of statistical relationships based on observations of past site users and/or their purchase history, click history, or other behavior. The model contains information about the types of users who visit your site, but it does not contain information about specific users. The detailed information used to create an analysis model is stored in the Commerce Server Data Warehouse.
All models are stored as binary data in the Data column of the PredictorModels table in the Data Warehouse database.
A site can run several different analysis models, and each model can be based on data from a different database. Sites can share analysis models because the models are not site-specific. Commerce Server 2000 does not limit the number of models you can use for each site or server.
Prediction Models and Segment Models
Cases
Dense Table
Sparse Table
Prediction Models and Segment Models
You can build two types of analysis models: Prediction models and Segment models. You view Prediction models with the Prediction Model Viewer in Commerce Server Manager and Segment models in the Segment Viewer module in Commerce Server Business Desk.
A Prediction model consists of a set of decision trees and a summary network called a dependency network. A dependency network summarizes the dependencies (predictive relationships) described by the decision trees. Prediction models provide more accurate predictions than do segmentation or memory-based algorithms. You use the Prediction Model Viewer to view both the dependency network and the individual decision trees.
A Segment model partitions users into segments based on similar user properties. Similar segments are grouped into collections of segments, creating a segment hierarchy. You use the Segment Viewer module in Business Desk to view Segment models.
Before you build an analysis model, you must ensure that the data you want to build the model with is available in the Data Warehouse. In addition, if you do not want to use the default Transactions model configuration, you will need to create your own configuration.
You construct analysis models from cases, which are built from the data in the Data Warehouse. Data must be imported into the Data Warehouse before you can use it to build cases. For information about importing data into the Data Warehouse, see Importing Data into the Data Warehouse.
Cases
A case is the set of all information known about a specific site user and consists of attribute/value pairs. This information may include user profile data, purchase history, click history, or a combination of these. A typical analysis model uses between 10,000 and 50,000 cases, with a resulting size of between 10 kilobytes (KB) and 100 KB. The number of cases in a given model configuration affects the accuracy of the analysis models that use that configuration. In general, predictions are more accurate when many cases are available for a model. However, after a certain point (often 50,000 cases or fewer), additional cases yield little increase in accuracy, while the time necessary to build an analysis model continues to increase with the number of cases in the configuration.
You build cases from the data stored in the Data Warehouse database. Only certain types of data tables in the Data Warehouse database will provide useful information for your Web business. For example, to predict which products specific users are likely to purchase, you will likely want to use data tables or views (virtual tables that represent the data in one or more tables in an alternative way) containing information about your users, such as user profiles, purchase history, and/or click history.
The two most common types of data used to build a model configuration are user profile data, which is stored in dense tables, and transaction history logs, which are stored in sparse tables. You can use the Transactions model configuration provided by default to analyze transactions in a sparse table, or you can create your own model configuration. For example, you can create a configuration that analyzes a single dense table with demographic information, or you can create a configuration that analyzes both a dense table and one or more sparse tables containing transactions.
Note
- If you use a dense table together with one or more sparse tables, only data for users listed in the dense table is processed. To use sparse data for users who have no corresponding information in the dense table, you must create records in the dense table for these users, with all unknown values set to NULL. For information about setting up your own configuration, see Predictor Schema.
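The effect described in the note can be illustrated with a small Python sketch. It is a toy model of the rule, not Commerce Server code: the user IDs, attribute names, and both helper functions are hypothetical, and None stands in for NULL.

```python
# Hypothetical dense data: one record per user (None stands in for NULL).
dense = {"u1": {"Age": 34, "Gender": "F"}}
dense_attributes = ["Age", "Gender"]

# Hypothetical sparse data: (user ID, product, quantity) rows.
sparse = [("u1", "SKU-100", 2), ("u2", "SKU-205", 1)]


def processed_users(dense_table, sparse_rows):
    """Toy model of the rule: sparse rows count only for users in the dense table."""
    return sorted({user for user, _, _ in sparse_rows if user in dense_table})


def add_placeholder_rows(dense_table, sparse_rows, attributes):
    """Add an all-unknown dense record for each user that appears only in sparse data."""
    padded = dict(dense_table)
    for user, _, _ in sparse_rows:
        padded.setdefault(user, {name: None for name in attributes})
    return padded


before = processed_users(dense, sparse)
padded = add_placeholder_rows(dense, sparse, dense_attributes)
after = processed_users(padded, sparse)
print(before, after)
```

In this sketch, user u2 is dropped until a placeholder record with all-unknown values is added to the dense data.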
Dense Table
In a dense table (or view), the attributes of a case are represented as column values in a single row. Demographic customer data (for example, Age, Gender, and Salary) is typically best represented in a dense table. In this type of table, there is exactly one row for each user and exactly one column for each attribute.
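As a minimal sketch of this layout, the following Python fragment models a dense table as a list of rows and turns each row into a case of attribute/value pairs. The column names, sample values, and the dense_row_to_case helper are assumptions made for illustration; they are not part of the Data Warehouse schema.

```python
# Hypothetical dense table: one row per user, one column per attribute.
dense_columns = ["UserID", "Age", "Gender", "Salary"]
dense_rows = [
    ("u1", 34, "F", 52000),
    ("u2", 27, "M", None),  # None stands in for an unknown (NULL) value
]


def dense_row_to_case(columns, row):
    """Turn one dense row into (case identifier, {attribute: value})."""
    record = dict(zip(columns, row))
    case_id = record.pop("UserID")
    return case_id, record


# Each row yields exactly one case.
cases = dict(dense_row_to_case(dense_columns, row) for row in dense_rows)
print(cases["u1"])
```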
A common source of data for a dense table is profile information submitted by users when they complete a registration form. For information about creating profile definitions to collect user data on your Web site, see Running the Profiles Resource.
The following figure shows the layout of a typical dense table.
Sparse Table
In a sparse table (or view), a case is represented as multiple rows, with each row representing an attribute/value pair. For example, product purchase data may have the schema (CustomerID, ProductID, Quantity). In this type of sparse table, there is one row for each product purchased by the user. In general, a sparse table contains one row for each transaction, such as one row for each item ordered or each page visited. This table may have several rows for each user who visits your site, because one user may have ordered several items or visited several pages. All of the rows associated with a single user make up one case. The case identifier is typically a user ID, such as a unique user name or login name. The property is the item you want to predict, and the value is the quantity of that property. In the example schema, CustomerID is the case identifier, ProductID is the property, and Quantity is the value.
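The grouping of sparse rows into cases can be sketched in Python as follows. The sample rows and the sparse_rows_to_cases helper are hypothetical, and summing quantities when a product appears in more than one row for the same user is an assumption made for the sketch, not documented product behavior.

```python
from collections import defaultdict

# Hypothetical sparse table with the schema (CustomerID, ProductID, Quantity).
sparse_rows = [
    ("u1", "SKU-100", 2),
    ("u1", "SKU-205", 1),
    ("u2", "SKU-100", 5),
]


def sparse_rows_to_cases(rows):
    """Group rows by case identifier; each row adds one property/value pair."""
    grouped = defaultdict(dict)
    for customer_id, product_id, quantity in rows:
        # Assumption: repeated purchases of the same product are summed.
        previous = grouped[customer_id].get(product_id, 0)
        grouped[customer_id][product_id] = previous + quantity
    return dict(grouped)


sparse_cases = sparse_rows_to_cases(sparse_rows)
print(sparse_cases)
```

Here the two rows for u1 collapse into a single case with two property/value pairs.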
Because the data needed for a sparse data table may be stored in two or more tables in your data source, you may need to use SQL Server Enterprise Manager to create a view that contains the columns required by your analysis model configuration. For example, to predict a purchase that is based on other products purchased, you need the user ID, product SKU, and quantity purchased columns. These columns may be located in two or three different linked tables (such as Customers, Orders, and Order Details, or Requisition and POLineItems) depending on the structure of your database.
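The following Python sketch shows the general idea of such a view. It uses the sqlite3 module with an in-memory SQLite database purely so the example is self-contained; it does not use SQL Server or Enterprise Manager. The table layouts, column names, sample rows, and the PurchaseHistory view name are all assumptions.

```python
import sqlite3

# In-memory SQLite database used only for illustration (not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, LoginName TEXT);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT);
CREATE TABLE OrderDetails (OrderID INTEGER, SKU TEXT, Quantity INTEGER);

INSERT INTO Customers VALUES ('c1', 'alice'), ('c2', 'bob');
INSERT INTO Orders VALUES (1, 'c1'), (2, 'c1'), (3, 'c2');
INSERT INTO OrderDetails VALUES
    (1, 'SKU-100', 2), (2, 'SKU-205', 1), (3, 'SKU-100', 5);

-- Hypothetical view exposing user ID, product SKU, and quantity purchased.
CREATE VIEW PurchaseHistory AS
SELECT c.CustomerID AS UserID, d.SKU AS SKU, d.Quantity AS Quantity
FROM Customers AS c
JOIN Orders AS o ON o.CustomerID = c.CustomerID
JOIN OrderDetails AS d ON d.OrderID = o.OrderID;
""")

# Each row of the view is one attribute/value pair of a sparse case.
view_rows = conn.execute(
    "SELECT UserID, SKU, Quantity FROM PurchaseHistory ORDER BY UserID, SKU"
).fetchall()
print(view_rows)
```

The view flattens three linked tables into the three columns a sparse configuration needs. An equivalent view in SQL Server would be written with that product's own tools and syntax.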
The following figure shows the layout of a typical sparse table.
For information about building dense and sparse tables and creating analysis model configurations, see Predictor Schema.