# Microsoft Neural Network Algorithm (SSAS)

In Microsoft SQL Server 2005 Analysis Services (SSAS), the Microsoft Neural Network algorithm creates classification and regression mining models by constructing a Multilayer Perceptron network of neurons. Similar to the Microsoft Decision Trees algorithm, the Microsoft Neural Network algorithm calculates probabilities for each possible state of the input attribute when given each state of the predictable attribute. You can later use these probabilities to predict an outcome of the predicted attribute, based on the input attributes.

## Example

The Microsoft Neural Network algorithm is useful for analyzing complex input data, such as from a manufacturing or commercial process, or business problems for which a significant quantity of training data is available but for which rules cannot be easily derived by using other algorithms.

Suggested scenarios for using the Microsoft Neural Network algorithm include the following:

• Marketing and promotion analysis, such as measuring the success of a direct mail promotion or a radio advertising campaign.
• Predicting stock movement, currency fluctuation, or other highly fluid financial information from historical data.
• Analyzing manufacturing and industrial processes.

## How the Algorithm Works

The Microsoft Neural Network algorithm uses a Multilayer Perceptron network, also called a Back-Propagated Delta Rule network, composed of up to three layers of neurons, or perceptrons. These layers are an input layer, an optional hidden layer, and an output layer. In a Multilayer Perceptron network, each neuron receives one or more inputs and produces one or more identical outputs. Each output is a simple non-linear function of the sum of the inputs to the neuron. Inputs only pass forward from nodes in the input layer to nodes in the hidden layer, and then finally they pass to the output layer; there are no connections between neurons within a layer. (If no hidden layer is included, inputs pass forward directly from nodes in the input layer to nodes in the output layer.) A detailed discussion of Multilayer Perceptron neural networks is outside the scope of this documentation.

A mining model that is constructed with the Microsoft Neural Network algorithm can contain multiple networks, depending on the number of columns that are used for both input and prediction, or that are used only for prediction. The number of networks that a single mining model contains depends on the number of states that are contained by the input columns and predictable columns that the mining model uses.

There are three types of neurons in a neural network that is created with the Microsoft Neural Network algorithm:

• Input neurons
Input neurons provide input attribute values for the data mining model. For discrete input attributes, an input neuron typically represents a single state from the input attribute, including missing values. For example, a binary input attribute produces one input node that describes a missing or existing state, indicating whether a value exists for that attribute. A Boolean column that is used as an input attribute generates three input neurons: one neuron for a true value, one neuron for a false value, and one neuron for a missing or existing state. A discrete input attribute that has more than two states generates one input neuron for each state, and one input neuron for a missing or existing state. A continuous input attribute generates two input neurons: one neuron for a missing or existing state, and one neuron for the value of the continuous attribute itself. Input neurons provide inputs to one or more hidden neurons.
• Hidden neurons
Hidden neurons receive inputs from input neurons and provide outputs to output neurons.
• Output neurons
Output neurons represent predictable attribute values for the data mining model. For discrete predictable attributes, an output neuron typically represents a single predicted state for a predictable attribute, including missing values. For example, a binary predictable attribute produces one output node that describes a missing or existing state, to indicate whether a value exists for that attribute. A Boolean column that is used as a predictable attribute generates three output neurons: one neuron for a true value, one neuron for a false value, and one neuron for a missing or existing state. A discrete predictable attribute that has more than two states generates one output neuron for each state, and one output neuron for a missing or existing state. Continuous predictable columns generate two output neurons: one neuron for a missing or existing state, and one neuron for the value of the continuous column itself. If more than 500 output neurons are generated by reviewing the set of predictable columns, Analysis Services generates a new network in the mining model to represent the additional output neurons.
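The counting rules for input and output neurons can be summarized in a few lines. The following is a minimal sketch, not Analysis Services code: the function names and the `(content_type, n_states)` tuples are hypothetical, a Boolean column is treated as a discrete attribute with two states, and the 500-output-neuron split is not modeled.

```python
def neuron_count(content_type, n_states=0):
    """Neurons generated for one attribute, following the rules described above.

    Hypothetical helper: 'content_type' is either "continuous" or "discrete".
    """
    if content_type == "continuous":
        # One neuron for the value itself, one for the missing/existing state.
        return 2
    if content_type == "discrete":
        # One neuron per state, plus one for the missing/existing state.
        # A Boolean column is the n_states == 2 case and yields 3 neurons.
        return n_states + 1
    raise ValueError(f"unsupported content type: {content_type!r}")


def total_neurons(attributes):
    """Sum the neurons for a list of (content_type, n_states) pairs."""
    return sum(neuron_count(kind, n) for kind, n in attributes)
```

For example, one continuous attribute, one Boolean attribute, and one five-state discrete attribute together produce 2 + 3 + 6 neurons.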

A neuron receives several inputs: with input neurons, a neuron receives inputs from the original data; with hidden neurons and output neurons, a neuron receives inputs from the output of other neurons in the neural network. Inputs establish relationships between neurons, and the relationships serve as a path of analysis for a specific set of cases.

Each input has a value assigned to it, called the weight, which describes the relevance or importance of that input to the hidden neuron or the output neuron. The greater the weight that is assigned to an input, the more relevant or important the value of that input is to the neuron that receives it when the algorithm determines whether that input successfully classifies a specific case. Weights can also be negative, which implies that the input can inhibit, rather than activate, a specific neuron. The value of the input is multiplied by the weight to emphasize the input for a specific neuron. (For negative weights, the value of the input is multiplied by the weight to deemphasize it.)

Correspondingly, each neuron has a simple non-linear function assigned to it, called the activation function, which describes the relevance or importance of a particular neuron to the layer of a neural network. Hidden neurons use a hyperbolic tangent (tanh) function for their activation function, whereas output neurons use a sigmoid function for their activation function. Both functions are nonlinear, continuous functions that allow the neural network to model nonlinear relationships between input and output neurons.
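To make the roles of weights and activation functions concrete, the following sketch computes one forward pass through a small network: each neuron sums its weighted inputs and applies its activation function, with a hyperbolic tangent in the hidden layer and a sigmoid in the output layer. The function names and the list-of-lists weight layout are assumptions chosen for illustration, and bias terms are omitted for brevity.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, used here for output neurons."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, activation):
    """Weighted sum of the inputs, passed through the activation function."""
    total = sum(value * weight for value, weight in zip(inputs, weights))
    return activation(total)


def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer (tanh) -> output layer (sigmoid).

    hidden_weights[j] holds the weights of hidden neuron j, one per input;
    output_weights[k] holds the weights of output neuron k, one per hidden neuron.
    """
    hidden = [neuron_output(inputs, w, math.tanh) for w in hidden_weights]
    return [neuron_output(hidden, w, sigmoid) for w in output_weights]
```

With all weights at zero every output is 0.5; a negative weight on an active hidden neuron pushes the output below 0.5, which is the inhibiting effect described above.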

### Training Neural Networks

Several steps are involved in training a data mining model that uses the Microsoft Neural Network algorithm. These steps are heavily influenced by the values that you specify for the parameters that are available to the algorithm.

The algorithm first evaluates and extracts training data from the data source. A percentage of the training data, called the holdout data, is reserved for use in measuring the accuracy of the structure of the resulting model. During the training process, the model is evaluated against the holdout data after each iteration over the training data. When the accuracy of the model no longer increases, the training process is stopped. The values of the SAMPLE_SIZE and HOLDOUT_PERCENTAGE parameters are used to determine the number of cases to sample from the training data and the number of cases to be put aside for the holdout data. The value of the HOLDOUT_SEED parameter is used to randomly determine the individual cases to be put aside for the holdout data.
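The stopping behavior can be sketched as a loop that checks the holdout error after every pass over the training data. This is only an illustration: the two callbacks are hypothetical stand-ins for the training step and the holdout evaluation, and stopping at the first iteration that fails to improve is an assumption, because the exact criterion is not specified here.

```python
def train_with_holdout(train_one_iteration, holdout_error, max_iterations=1000):
    """Train until the holdout error stops improving.

    train_one_iteration: callable that performs one pass over the training data.
    holdout_error: callable that returns the current error on the holdout data.
    Returns (iterations_run, best_holdout_error).
    """
    best_error = float("inf")
    iterations = 0
    for _ in range(max_iterations):
        train_one_iteration()
        iterations += 1
        error = holdout_error()
        if error >= best_error:
            # Accuracy on the holdout data no longer increases: stop training.
            break
        best_error = error
    return iterations, best_error
```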

The algorithm next determines the number and complexity of the networks that the mining model supports. If the mining model contains one or more attributes that are used only for prediction, the algorithm creates a single network that represents all such attributes. If the mining model contains one or more attributes that are used for both input and prediction, the algorithm provider constructs a network for each such attribute. If the number of input or predictable attributes is greater than the value of the MAXIMUM_INPUT_ATTRIBUTES parameter or the MAXIMUM_OUTPUT_ATTRIBUTES parameter, respectively, a feature selection algorithm is used to reduce the complexity of the networks that are included in the mining model. Feature selection reduces the number of input or predictable attributes to those that are most statistically relevant to the model.
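The rule for how many networks a model contains can be written as a short function. This is a sketch under stated assumptions: the usage labels `"input"`, `"predict"` (input and prediction), and `"predict_only"` are borrowed from DMX column usage for readability, the function name is hypothetical, and neither feature selection nor the additional networks created when output neurons exceed 500 is modeled.

```python
def count_networks(usage_by_attribute):
    """Count networks for a model, given a mapping of attribute name -> usage.

    Usage is one of "input", "predict" (input and prediction),
    or "predict_only" (prediction only).
    """
    usages = list(usage_by_attribute.values())
    # All prediction-only attributes share a single network.
    shared = 1 if "predict_only" in usages else 0
    # Each attribute used for both input and prediction gets its own network.
    per_attribute = usages.count("predict")
    return shared + per_attribute
```

For a model with one input column, one column used for both input and prediction, and two prediction-only columns, the sketch returns 2.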

For input and predictable attributes that have discrete values, each input or output neuron respectively represents a single state. For input and predictable attributes that have continuous values, each input or output neuron respectively represents the range and distribution of values for the attribute. The maximum number of states that is supported in either case depends on the value of the MAXIMUM_STATES algorithm parameter. If the number of states for a specific attribute exceeds the value of the MAXIMUM_STATES algorithm parameter, the most popular or relevant states for that attribute are chosen, up to the maximum, and the remaining states are grouped as missing values for the purposes of analysis.
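The effect of MAXIMUM_STATES on a discrete attribute can be illustrated as follows. This is a minimal sketch that ranks states by frequency only (the "most popular" reading); how relevance is weighed is not described here, and the function name and the use of `None` for missing values are assumptions.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing value


def limit_states(values, maximum_states=100):
    """Keep the most frequent states and treat every other state as missing."""
    counts = Counter(v for v in values if v is not MISSING)
    kept = {state for state, _ in counts.most_common(maximum_states)}
    return [v if v in kept else MISSING for v in values]
```

With `maximum_states=2`, the column `["a", "a", "b", "b", "b", "c"]` keeps `a` and `b` and turns the single `c` into a missing value.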

The algorithm then uses the value of the HIDDEN_NODE_RATIO parameter when determining the initial number of neurons to create for the hidden layer. You can set HIDDEN_NODE_RATIO to 0 to prevent the creation of a hidden layer in the networks that the algorithm generates for the mining model, which causes each neural network to be treated as a logistic regression.
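The parameter reference later in this topic gives the initial hidden-layer size as HIDDEN_NODE_RATIO * SQRT(Total input neurons * Total output neurons). The sketch below evaluates that formula directly; the function name is hypothetical, and because the rounding applied to the result is not stated, the raw value is returned.

```python
import math


def initial_hidden_neurons(hidden_node_ratio, n_input_neurons, n_output_neurons):
    """HIDDEN_NODE_RATIO * SQRT(total input neurons * total output neurons)."""
    return hidden_node_ratio * math.sqrt(n_input_neurons * n_output_neurons)
```

With the default ratio of 4.0, a network with 9 input neurons and 4 output neurons starts with 4.0 * SQRT(36) = 24 hidden neurons. A ratio of 0 yields 0, which corresponds to a network without a hidden layer.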

The algorithm provider iteratively evaluates the weight for all inputs across the network at the same time, by taking the set of training data that was reserved earlier and comparing the actual known value for each case in the holdout data with the network's prediction, in a process known as batch learning. After the algorithm has evaluated the entire set of training data, the algorithm reviews the predicted and actual value for each neuron. The algorithm calculates the degree of error, if any, and adjusts the weights that are associated with the inputs for that neuron, working backward from output neurons to input neurons in a process known as backpropagation. The algorithm then repeats the process over the entire set of training data. Because the algorithm can support many weights and output neurons, the conjugate gradient algorithm is used to guide the training process for assigning and evaluating weights for inputs. A discussion of the conjugate gradient algorithm is outside the scope of this documentation.
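The batch learning and backpropagation steps can be sketched for a network with one hidden layer and a single output neuron. Note the substitutions: this example uses plain gradient descent with a fixed learning rate in place of the conjugate gradient algorithm, a cross-entropy error measure, and a constant input of 1.0 as a stand-in for a bias. All names, defaults, and the data layout are assumptions for illustration, not the Analysis Services implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output) for one case."""
    hidden = [math.tanh(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    return hidden, sigmoid(sum(h * w for h, w in zip(hidden, w_out)))


def mean_loss(cases, w_hidden, w_out):
    """Average cross-entropy error over all cases."""
    total = 0.0
    for x, y in cases:
        _, p = forward(x, w_hidden, w_out)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(cases)


def train_batch(cases, n_hidden=3, epochs=200, rate=0.1, seed=0):
    """Batch learning: accumulate errors over every case, then adjust weights."""
    rng = random.Random(seed)
    n_in = len(cases[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    for _ in range(epochs):
        grad_hidden = [[0.0] * n_in for _ in range(n_hidden)]
        grad_out = [0.0] * n_hidden
        for x, y in cases:
            hidden, p = forward(x, w_hidden, w_out)
            delta_out = p - y  # error at the output neuron
            for j, h in enumerate(hidden):
                grad_out[j] += delta_out * h
                # Propagate the error backward through the tanh hidden neuron.
                delta_hidden = delta_out * w_out[j] * (1.0 - h * h)
                for i, xi in enumerate(x):
                    grad_hidden[j][i] += delta_hidden * xi
        # Weights change only after the entire training set has been evaluated.
        for j in range(n_hidden):
            w_out[j] -= rate * grad_out[j] / len(cases)
            for i in range(n_in):
                w_hidden[j][i] -= rate * grad_hidden[j][i] / len(cases)
    return w_hidden, w_out
```

A possible usage, training on the logical OR of two inputs (the third input is the constant 1.0):

```python
cases = [([0.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1),
         ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 1.0], 1)]
w_hidden, w_out = train_batch(cases)
print(mean_loss(cases, w_hidden, w_out))
```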

## Using the Algorithm

A neural network model must contain a key column, one or more input columns, and one or more predictable columns.

The Microsoft Neural Network algorithm supports specific input column content types, predictable column content types, and modeling flags, which are listed in the following table.

| Item | Supported values |
| --- | --- |
| Input column content types | Continuous, Cyclical, Discrete, Discretized, Key, Table, and Ordered |
| Predictable column content types | Continuous, Cyclical, Discrete, Discretized, and Ordered |
| Modeling flags | MODEL_EXISTENCE_ONLY and NOT NULL |
| Distribution flags | Normal, Uniform, and Log Normal |

All Microsoft algorithms support a common set of functions. However, the Microsoft Neural Network algorithm supports additional functions, listed in the following table.

For a list of the functions that are common to all Microsoft algorithms, see Data Mining Algorithms. For more information about how to use these functions, see Data Mining Extensions (DMX) Function Reference.

Models that are created by using the Microsoft Neural Network algorithm do not support drillthrough or data mining dimensions, because the structure of nodes in the mining model does not necessarily correspond directly to the underlying data.

The Microsoft Neural Network algorithm supports several parameters that affect the performance and accuracy of the resulting mining model. The following table describes each parameter.

Parameter Description

HIDDEN_NODE_RATIO

Specifies the ratio of hidden neurons to input and output neurons. The following formula determines the initial number of neurons in the hidden layer:

HIDDEN_NODE_RATIO * SQRT(Total input neurons * Total output neurons)

The default value is 4.0.

HOLDOUT_PERCENTAGE

Specifies the percentage of cases within the training data used to calculate the holdout error, which is used as part of the stopping criteria while training the mining model.

The default value is 30.

HOLDOUT_SEED

Specifies a number that is used to seed the pseudo-random generator when the algorithm randomly determines the holdout data. If this parameter is set to 0, the algorithm generates the seed based on the name of the mining model, to guarantee that the model content remains the same during reprocessing.

The default value is 0.

MAXIMUM_INPUT_ATTRIBUTES

Determines the maximum number of input attributes that can be supplied to the algorithm before feature selection is employed. Setting this value to 0 disables feature selection for input attributes.

The default value is 255.

MAXIMUM_OUTPUT_ATTRIBUTES

Determines the maximum number of output attributes that can be supplied to the algorithm before feature selection is employed. Setting this value to 0 disables feature selection for output attributes.

The default value is 255.

MAXIMUM_STATES

Specifies the maximum number of discrete states per attribute that is supported by the algorithm. If the number of states for a specific attribute is greater than the number that is specified for this parameter, the algorithm uses the most popular states for that attribute and treats the remaining states as missing.

The default value is 100.

SAMPLE_SIZE

Specifies the number of cases to be used to train the model. The algorithm uses either this number or the number of cases that remain after the holdout data is set aside, as determined by the HOLDOUT_PERCENTAGE parameter, whichever value is smaller.

In other words, if HOLDOUT_PERCENTAGE is set to 30, the algorithm will use either the value of this parameter, or a value equal to 70 percent of the total number of cases, whichever is smaller.

The default value is 10000.
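The interaction of SAMPLE_SIZE, HOLDOUT_PERCENTAGE, and HOLDOUT_SEED can be sketched as a single split function. This is an illustration under assumptions: the function name is hypothetical, the holdout cases are drawn before the training sample is capped, and the seed is passed directly to Python's generator, so the special handling of a seed of 0 (deriving the seed from the mining model name) is not reproduced.

```python
import random


def split_cases(cases, sample_size=10000, holdout_percentage=30, holdout_seed=0):
    """Return (training_cases, holdout_cases) for a collection of cases."""
    rng = random.Random(holdout_seed)  # same seed -> same holdout selection
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_holdout = len(shuffled) * holdout_percentage // 100
    holdout = shuffled[:n_holdout]
    remaining = shuffled[n_holdout:]
    # Train on SAMPLE_SIZE cases or on all remaining cases, whichever is smaller.
    n_train = min(sample_size, len(remaining))
    return remaining[:n_train], holdout
```

With 1,000 cases and the default values, the sketch reserves 300 cases for holdout and trains on the remaining 700; lowering `sample_size` to 500 caps the training set at 500 cases.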