Data Mining Algorithms
A data mining algorithm is the mechanism that creates a mining model. To create a model, the algorithm first analyzes a set of data, looking for specific patterns and trends. The algorithm then uses the results of this analysis to define the parameters of the mining model.
The mining model that an algorithm creates can take various forms, including:
- A set of rules that describe how products are grouped together in a transaction.
- A decision tree that predicts whether a particular customer will buy a product.
- A mathematical model that forecasts sales.
- A set of clusters that describe how the cases in a dataset are related.
Microsoft SQL Server 2005 Analysis Services (SSAS) provides several algorithms for use in your data mining solutions. These algorithms are a subset of all the algorithms that can be used for data mining. You can also use third-party algorithms that comply with the OLE DB for Data Mining specification. For more information about third-party algorithms, see Plugin Algorithms.
Reviewing the Algorithms
Analysis Services includes the following algorithm types:
- Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. An example of a classification algorithm is the Microsoft Decision Trees Algorithm.
- Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. An example of a regression algorithm is the Microsoft Time Series Algorithm.
- Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. An example of a segmentation algorithm is the Microsoft Clustering Algorithm.
- Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis. An example of an association algorithm is the Microsoft Association Algorithm.
- Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow. An example of a sequence analysis algorithm is the Microsoft Sequence Clustering Algorithm.
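To make the idea of association rules more concrete, the following minimal sketch computes support and confidence for pairs of items in a handful of transactions. It is a generic illustration, not the implementation of the Microsoft Association Algorithm; the transactions, the `pair_rules` function, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


rules = pair_rules(transactions)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this sample, every transaction that contains butter also contains bread, so the rule from butter to bread has the highest confidence. A market basket analysis would use rules like this one to suggest additional products.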
Applying the Algorithms
Choosing the right algorithm for a specific business task can be a challenge. Although you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
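The following sketch illustrates how a tree-style split score can flag columns that do not affect the predicted outcome. It uses information gain, one common split criterion, as a stand-in; the scoring method that the Microsoft Decision Trees algorithm actually uses is not described here, and the rows, column names, and threshold are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, column, target):
    """Reduction in entropy of `target` after splitting the rows on `column`."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical cases: does a customer buy the product?
rows = [
    {"owns_home": "yes", "newsletter": "yes", "buys": "yes"},
    {"owns_home": "yes", "newsletter": "no", "buys": "yes"},
    {"owns_home": "no", "newsletter": "yes", "buys": "no"},
    {"owns_home": "no", "newsletter": "no", "buys": "no"},
]

gains = {c: information_gain(rows, c, "buys") for c in ("owns_home", "newsletter")}
# Columns with (near) zero gain carry no information about the target.
uninformative = [c for c, g in gains.items() if g < 1e-9]
print(gains, uninformative)
```

In this sample, `owns_home` fully determines the outcome, while `newsletter` has no effect on it and is a candidate for removal from the dataset.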
You also do not have to use algorithms independently. In a single data mining solution, you can use some algorithms to explore data, and then use other algorithms to predict a specific outcome based on that data. For example, you can use a clustering algorithm, which recognizes patterns, to break data into groups that are more or less homogeneous, and then use the results to create a better decision tree model. You can also use multiple algorithms within one solution to perform separate tasks, for example, a regression tree algorithm to obtain financial forecasting information and a rule-based algorithm to perform a market basket analysis.
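As a rough illustration of combining algorithms, the following sketch groups cases with a simple one-dimensional k-means pass and then attaches the resulting segment to each case, so that a later predictive model could use it as an input column. The k-means routine is a generic stand-in, not the Microsoft Clustering Algorithm, and the income values, initial centers, and column names are hypothetical.

```python
def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda k: abs(value - centers[k]))


def kmeans_1d(values, initial_centers, iterations=10):
    """Alternate between assigning values to centers and recomputing centers."""
    centers = list(initial_centers)
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in values]
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = sum(members) / len(members)
    return [nearest(v, centers) for v in values], centers


# Hypothetical yearly incomes, in thousands.
incomes = [21, 25, 24, 88, 92, 95]
labels, centers = kmeans_1d(incomes, initial_centers=[20, 100])

# The segment becomes an extra attribute for a later predictive model.
cases = [{"income": v, "segment": lab} for v, lab in zip(incomes, labels)]
print(cases)
```

The exploratory step produces the `segment` attribute, and the predictive step can then treat it like any other input column.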
Mining models can predict values, produce summaries of data, and find hidden correlations. To help you select algorithms for your data mining solution, the following table provides suggestions for which algorithms to use for specific tasks.
|Task|Microsoft algorithms to use|
|---|---|
|Predicting a discrete attribute. For example, to predict whether the recipient of a targeted mailing campaign will buy a product.|Microsoft Decision Trees Algorithm|
|Predicting a continuous attribute. For example, to forecast next year's sales.|Microsoft Time Series Algorithm|
|Predicting a sequence. For example, to perform a clickstream analysis of a company's Web site.|Microsoft Sequence Clustering Algorithm|
|Finding groups of common items in transactions. For example, to use market basket analysis to suggest additional products to a customer for purchase.|Microsoft Association Algorithm|
|Finding groups of similar items. For example, to segment demographic data into groups to better understand the relationships between attributes.|Microsoft Clustering Algorithm|
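For the clickstream task, a first exploratory step might be to count how often one page follows another across sessions. The following sketch does only that; it is not the Microsoft Sequence Clustering Algorithm, and the session data and page names are hypothetical.

```python
from collections import Counter

# Hypothetical Web sessions; each list is the ordered pages of one visit.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "search", "products", "cart"],
]


def transition_counts(sessions):
    """Count each (page, next_page) pair over all sessions."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


counts = transition_counts(sessions)
for (page, next_page), count in counts.most_common():
    print(f"{page} -> {next_page}: {count}")
```

Frequent transitions such as these are the kind of raw pattern that a sequence analysis summarizes.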
Because each model returns a different type of result, Analysis Services provides a separate viewer for each algorithm. When you browse a mining model in Analysis Services, the model is displayed on the Mining Model Viewer tab of Data Mining Designer, using the appropriate viewer for the model. For more information, see Viewing a Data Mining Model.
You can use functions to extend the results that a mining model returns. The following table lists the functions that are supported by all algorithms in Analysis Services.
Individual algorithms may support additional functions. None of the algorithms that Microsoft provides allow duplicate keys.
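Because duplicate keys are not allowed, it can be useful to check the source data before you train a model. The following sketch shows one way to do this outside Analysis Services; the `duplicate_keys` helper, the `CustomerKey` column, and the sample cases are hypothetical.

```python
from collections import Counter


def duplicate_keys(cases, key_column):
    """Return the key values that occur in more than one case."""
    counts = Counter(case[key_column] for case in cases)
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical case table with a repeated key.
cases = [
    {"CustomerKey": 1, "Age": 34},
    {"CustomerKey": 2, "Age": 51},
    {"CustomerKey": 2, "Age": 52},
    {"CustomerKey": 3, "Age": 28},
]

print(duplicate_keys(cases, "CustomerKey"))
```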
See Also
- Data Mining Concepts
- Mining Structures (Analysis Services)
- Microsoft Association Algorithm
- Microsoft Clustering Algorithm
- Microsoft Decision Trees Algorithm
- Microsoft Naive Bayes Algorithm
- Microsoft Neural Network Algorithm (SSAS)
- Microsoft Sequence Clustering Algorithm
- Microsoft Time Series Algorithm
- Microsoft Linear Regression Algorithm
- Microsoft Logistic Regression Algorithm
- Using the Data Mining Tools