Data Mining Algorithms (Analysis Services - Data Mining)
A data mining algorithm is the mechanism that creates a data mining model. To create a model, the algorithm first analyzes a set of data and looks for specific patterns and trends. It then uses the results of this analysis to define the parameters of the mining model. These parameters are applied across the entire dataset to extract actionable patterns and detailed statistics.
The mining model that an algorithm creates can take various forms, including:
A set of rules that describe how products are grouped together in a transaction.
A decision tree that predicts whether a particular customer will buy a product.
A mathematical model that forecasts sales.
A set of clusters that describe how the cases in a dataset are related.
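The two phases described above, analyzing the data to define model parameters and then applying those parameters, can be sketched in a few lines. The following example is a minimal illustration only, assuming a simple least-squares trend line as the "mathematical model that forecasts sales". The function names `fit_trend` and `forecast` and the sales figures are hypothetical, and this is not how any Analysis Services algorithm is implemented.

```python
from statistics import mean

def fit_trend(values):
    """Analyze the series and return the model parameters (slope, intercept)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def forecast(params, period):
    """Apply the stored parameters to a period index to produce a prediction."""
    slope, intercept = params
    return intercept + slope * period

# Made-up yearly sales figures, for illustration only.
sales = [100.0, 110.0, 120.0, 130.0]
params = fit_trend(sales)                 # analysis phase: define the parameters
next_year = forecast(params, len(sales))  # application phase: use the parameters
print(params, next_year)
```

The parameters are kept separate from the data that produced them, so they can be applied to new inputs without repeating the analysis.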
Microsoft SQL Server Analysis Services provides several algorithms for use in your data mining solutions. These algorithms are a subset of all the algorithms that can be used for data mining. You can also use third-party algorithms that comply with the OLE DB for Data Mining specification. For more information about third-party algorithms, see Plugin Algorithms.
Types of Data Mining Algorithms
Analysis Services includes the following algorithm types:
Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset. An example of a classification algorithm is the Microsoft Decision Trees Algorithm.
Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset. An example of a regression algorithm is the Microsoft Time Series Algorithm.
Segmentation algorithms divide data into groups, or clusters, of items that have similar properties. An example of a segmentation algorithm is the Microsoft Clustering Algorithm.
Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis. An example of an association algorithm is the Microsoft Association Algorithm.
Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow. An example of a sequence analysis algorithm is the Microsoft Sequence Clustering Algorithm.
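To make the idea of association rules more concrete, the following sketch counts how often pairs of items occur together in a set of transactions and derives rules from those counts. It is a simplified illustration, not the Microsoft Association Algorithm. The function name `association_rules`, the thresholds, and the sample baskets are assumptions made for this example, and only rules between two single items are considered.

```python
from itertools import combinations

def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the baskets that contain lhs, how many also contain rhs.
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Made-up market basket data, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rules = association_rules(transactions, min_support=0.5, min_confidence=0.6)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

In this sample, butter and milk appear together in only one of four baskets, so no rule is produced for that pair.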
Applying the Algorithms
Choosing the best algorithm to use for a specific business task can be a challenge. While you can use different algorithms to perform the same business task, each algorithm produces a different result, and some algorithms can produce more than one type of result. For example, you can use the Microsoft Decision Trees algorithm not only for prediction, but also as a way to reduce the number of columns in a dataset, because the decision tree can identify columns that do not affect the final mining model.
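The following sketch illustrates how a tree-style analysis can flag columns that do not help predict the outcome. It assumes information gain for a single split as the relevance measure, which is a common textbook criterion and not necessarily the scoring method that the Microsoft Decision Trees algorithm uses. The column names and rows are made up for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, column, target):
    """Reduction in target entropy obtained by splitting the rows on one column."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder

# Made-up customer data, for illustration only.
rows = [
    {"owns_home": "yes", "region": "north", "buys": "yes"},
    {"owns_home": "yes", "region": "south", "buys": "yes"},
    {"owns_home": "no", "region": "north", "buys": "no"},
    {"owns_home": "no", "region": "south", "buys": "no"},
]
gains = {c: information_gain(rows, c, "buys") for c in ("owns_home", "region")}
# Columns with (near) zero gain do not help this split and are candidates for removal.
uninformative = [c for c, g in gains.items() if g < 1e-9]
print(gains, uninformative)
```

A column with zero gain at the first split can still be useful deeper in a tree, so a check like this is a starting point for reducing columns rather than a final decision.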
You also do not have to use algorithms independently. In a single data mining solution, you can use some algorithms to explore data, and then use other algorithms to predict a specific outcome based on that data. For example, you can use a clustering algorithm, which recognizes patterns, to break data into groups that are more or less homogeneous, and then use the results to create a better decision tree model. You can also use multiple algorithms within one solution to perform separate tasks. For example, you can use a regression tree algorithm to obtain financial forecasting information, and a rule-based algorithm to perform a market basket analysis.
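The two-step approach described above, exploring the data with clustering and then using the result in a predictive step, might look like the following sketch. It assumes a very small one-dimensional k-means routine for the clustering step and a per-segment majority vote as a stand-in for the decision tree step. Neither is the Microsoft implementation, and the names `kmeans_1d` and `assign` and the customer data are hypothetical.

```python
from collections import Counter

def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j])) for v in values]

def kmeans_1d(values, k=2, iterations=10):
    """Tiny 1-D k-means (k >= 2); initial centers are spread evenly over the value range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        labels = assign(values, centers)
        for j in range(k):
            members = [v for v, a in zip(values, labels) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign(values, centers), centers

# Made-up customer data, for illustration only.
customers = [
    {"income": 20, "buys": "no"},
    {"income": 25, "buys": "no"},
    {"income": 90, "buys": "yes"},
    {"income": 95, "buys": "yes"},
]

# Step 1: explore the data by grouping customers into segments.
labels, centers = kmeans_1d([c["income"] for c in customers])
for customer, label in zip(customers, labels):
    customer["segment"] = label  # new input column for the predictive step

# Step 2: predict the outcome per segment (majority vote as a stand-in model).
majority = {}
for seg in set(labels):
    outcomes = Counter(c["buys"] for c in customers if c["segment"] == seg)
    majority[seg] = outcomes.most_common(1)[0][0]
print(centers, majority)
```

The segment label produced by the first step becomes an ordinary input column for the second step, which is what lets the two algorithms be chained.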
Mining models can predict values, produce summaries of data, and find hidden correlations. To help you select algorithms for your data mining solution, the following table provides suggestions for which algorithms to use for specific tasks.
| Task | Microsoft algorithms to use |
|---|---|
| Predicting a discrete attribute. For example, predict whether the recipient of a targeted mailing campaign will buy a product. | Microsoft Decision Trees Algorithm, Microsoft Naive Bayes Algorithm |
| Predicting a continuous attribute. For example, forecast next year's sales. | Microsoft Time Series Algorithm |
| Predicting a sequence. For example, perform a clickstream analysis of a company's Web site. | Microsoft Sequence Clustering Algorithm |
| Finding groups of common items in transactions. For example, use market basket analysis to suggest additional products to a customer for purchase. | Microsoft Association Algorithm |
| Finding groups of similar items. For example, segment demographic data into groups to better understand the relationships between attributes. | Microsoft Clustering Algorithm |
Because each model returns a different type of result, Analysis Services provides a separate viewer for each algorithm. When you browse a mining model in Analysis Services, the model is displayed on the Mining Model Viewer tab of Data Mining Designer, which uses the appropriate viewer for the model. For more information, see Viewing a Data Mining Model.
Algorithm Details
The following types of information are available for each algorithm:
Basic algorithm description: Provides a basic explanation of what the algorithm does and how it works, together with a business scenario where the algorithm might be useful.
Technical reference: Lists the parameters that you can set to control the behavior of the algorithm and customize the results in the model. Provides additional technical detail about the implementation of the algorithm, performance tips, and data requirements.
Querying a model: Provides examples of queries that you can use with each model type. You can query a model to learn more about the patterns in the model, or to make predictions based on those patterns.
Mining model content: Describes how information is stored in a common structure for all model types, and explains how to interpret the information. After you have built a model, you can explore the model by using the viewers provided in BI Development Studio, or you can write queries to return information directly from the model content by using Data Mining Extensions (DMX).