Mining Model Content for Logistic Regression Models

10/31/2023

Applies to: SQL Server 2019 and earlier Analysis Services. Does not apply to: Azure Analysis Services, Fabric/Power BI Premium.

Important

Data mining was deprecated in SQL Server 2017 Analysis Services and is now discontinued in SQL Server 2022 Analysis Services. Documentation is not updated for deprecated and discontinued features. To learn more, see Analysis Services backward compatibility.

This topic describes mining model content that is specific to models that use the Microsoft Logistic Regression algorithm. For an explanation of how to interpret statistics and structure shared by all model types, and general definitions of terms related to mining model content, see Mining Model Content (Analysis Services - Data Mining).

Understanding the Structure of a Logistic Regression Model

A logistic regression model is created by using the Microsoft Neural Network algorithm with parameters that constrain the model to eliminate the hidden node. Therefore, the overall structure of a logistic regression model is almost identical to that of a neural network: each model has a single parent node that represents the model and its metadata, and a special marginal statistics node (NODE_TYPE = 24) that provides descriptive statistics about the inputs used in the model.

Additionally, the model contains a subnetwork (NODE_TYPE = 17) for each predictable attribute. Just like in a neural network model, each subnetwork always contains two branches: one for the input layer, and another branch that contains the hidden layer (NODE_TYPE = 19) and the output layer (NODE_TYPE = 20) for the network. The same subnetwork may be used for multiple attributes if they are specified as predict-only. Predictable attributes that are also inputs may not appear in the same subnetwork.

However, in a logistic regression model, the node that represents the hidden layer is empty, and has no children. Therefore the model contains nodes that represent individual outputs (NODE_TYPE = 23) and individual inputs (NODE_TYPE = 21) but no individual hidden nodes.
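
The layout described above can be sketched as a small tree. The following Python is an illustration only: the `Node` class, the helper function, and the sample attribute names (`Age`, `Region`, `Buyer`) are hypothetical and are not part of any Analysis Services API. Only the NODE_TYPE values come from this topic, and placing the input layer directly under the model root is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import List

# NODE_TYPE values taken from this topic.
MODEL, SUBNETWORK, INPUT_LAYER, HIDDEN_LAYER = 1, 17, 18, 19
OUTPUT_LAYER, INPUT_NODE, OUTPUT_NODE, MARGINAL_STATS = 20, 21, 23, 24


@dataclass
class Node:
    """Hypothetical in-memory stand-in for one row of model content."""
    node_type: int
    caption: str = ""
    children: List["Node"] = field(default_factory=list)


def hidden_layers_are_empty(node: Node) -> bool:
    """Return True if every hidden-layer node in the tree has no children."""
    if node.node_type == HIDDEN_LAYER and node.children:
        return False
    return all(hidden_layers_are_empty(child) for child in node.children)


# Made-up model: two inputs and one predictable attribute.
model = Node(MODEL, children=[
    Node(MARGINAL_STATS),
    Node(INPUT_LAYER, children=[
        Node(INPUT_NODE, "Age"),
        Node(INPUT_NODE, "Region"),
    ]),
    Node(SUBNETWORK, children=[
        Node(HIDDEN_LAYER),  # empty in a logistic regression model
        Node(OUTPUT_LAYER, children=[Node(OUTPUT_NODE, "Buyer")]),
    ]),
])

print(hidden_layers_are_empty(model))
```

A check like this could be used to tell a logistic regression model from a general neural network model when both are read through the same content schema.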

[Figure: structure of content for a logistic regression model]

By default, a logistic regression model is displayed in the Microsoft Neural Network Viewer. With this custom viewer, you can filter on input attributes and their values, and graphically see how they affect the outputs. The tooltips in the viewer show you the probability and lift associated with each pair of input and output values. For more information, see Browse a Model Using the Microsoft Neural Network Viewer.

To explore the structure of the inputs and subnetworks, and to see detailed statistics, you can use the Microsoft Generic Content Tree viewer. You can click on any node to expand it and see the child nodes, or view the weights and other statistics contained in the node.

Model Content for a Logistic Regression Model

This section provides detail and examples only for those columns in the mining model content that have particular relevance for logistic regression. The model content is almost identical to that of a neural network model, but descriptions that apply to neural network models may be repeated in this table for convenience.

For information about general-purpose columns in the schema rowset, such as MODEL_CATALOG and MODEL_NAME, that are not described here, or for explanations of mining model terminology, see Mining Model Content (Analysis Services - Data Mining).

MODEL_CATALOG
Name of the database where the model is stored.

MODEL_NAME
Name of the model.

ATTRIBUTE_NAME
The name of the attribute that corresponds to this node.

Node	Content
Model root	Blank
Marginal statistics	Blank
Input layer	Blank
Input node	Input attribute name
Hidden layer	Blank
Output layer	Blank
Output node	Output attribute name

NODE_NAME
The name of the node. Currently, this column contains the same value as NODE_UNIQUE_NAME, though this may change in future releases.

NODE_UNIQUE_NAME
The unique name of the node.

For more information about how the names and IDs provide structural information about the model, see the section, Using Node Names and IDs.

NODE_TYPE
A logistic regression model outputs the following node types:

Node Type ID	Description
1	Model.
17	Organizer node for the subnetwork.
18	Organizer node for the input layer.
19	Organizer node for the hidden layer. The hidden layer is empty.
20	Organizer node for the output layer.
21	Input attribute node.
23	Output attribute node.
24	Marginal statistics node.

NODE_CAPTION
A label or a caption associated with the node. In logistic regression models, always blank.

CHILDREN_CARDINALITY
An estimate of the number of children that the node has.

Node	Content
Model root	Indicates the count of child nodes, which includes at least 1 network, 1 required marginal node, and 1 required input layer. For example, if the value is 5, there are 3 subnetworks.
Marginal statistics	Always 0.
Input layer	Indicates the number of input attribute-value pairs that were used by the model.
Input node	Always 0.
Hidden layer	In a logistic regression model, always 0.
Output layer	Indicates the number of output values.
Output node	Always 0.
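
The arithmetic behind the model root example can be written out as a short sketch. The function name is hypothetical. It assumes, following the table above, that the root's children are exactly one marginal statistics node, one input layer, and one or more subnetworks.

```python
def count_subnetworks(root_children_cardinality: int) -> int:
    """Derive the subnetwork count from the model root's CHILDREN_CARDINALITY.

    Assumes two fixed children (marginal statistics node and input layer)
    plus at least one subnetwork, as described in the table above.
    """
    fixed_children = 2
    if root_children_cardinality < fixed_children + 1:
        raise ValueError("a model root needs at least 3 children")
    return root_children_cardinality - fixed_children


print(count_subnetworks(5))  # 3, matching the example in the table
```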

PARENT_UNIQUE_NAME
The unique name of the node's parent. NULL is returned for any nodes at the root level.

For more information about how the names and IDs provide structural information about the model, see the section, Using Node Names and IDs.

NODE_DESCRIPTION
A user-friendly description of the node.

Node	Content
Model root	Blank
Marginal statistics	Blank
Input layer	Blank
Input node	Input attribute name
Hidden layer	Blank
Output layer	Blank
Output node	If the output attribute is continuous, contains the name of the output attribute. If the output attribute is discrete or discretized, contains the name of the attribute and the value.

NODE_RULE
An XML description of the rule that is embedded in the node.

Node	Content
Model root	Blank
Marginal statistics	Blank
Input layer	Blank
Input node	An XML fragment containing the same information as the NODE_DESCRIPTION column.
Hidden layer	Blank
Output layer	Blank
Output node	An XML fragment containing the same information as the NODE_DESCRIPTION column.

MARGINAL_RULE
For logistic regression models, always blank.

NODE_PROBABILITY
The probability associated with this node. For logistic regression models, always 0.

MARGINAL_PROBABILITY
The probability of reaching the node from the parent node. For logistic regression models, always 0.

NODE_DISTRIBUTION
A nested table that contains statistical information for the node. For detailed information about the contents of this table for each node type, see the section, Understanding the NODE_DISTRIBUTION Table, in Mining Model Content for Neural Network Models (Analysis Services - Data Mining).

NODE_SUPPORT
For logistic regression models, always 0.

Note

Support and probability values are always 0 because the output of this model type is not probabilistic. Only the weights are meaningful to the algorithm; therefore, it does not compute probability, support, or variance.

To get information about the support in the training cases for specific values, see the marginal statistics node.

MSOLAP_MODEL_COLUMN

Node	Content
Model root	Blank
Marginal statistics	Blank
Input layer	Blank
Input node	Input attribute name.
Hidden layer	Blank
Output layer	Blank
Output node	Output attribute name.

MSOLAP_NODE_SCORE
In logistic regression models, always 0.

MSOLAP_NODE_SHORT_CAPTION
In logistic regression models, always blank.

Using Node Names and IDs

The naming of the nodes in a logistic regression model provides additional information about the relationships between nodes in the model. The following table shows the conventions for the IDs that are assigned to nodes in each layer.

Node Type	Convention for node ID
Model root (1)	00000000000000000
Marginal statistics node (24)	10000000000000000
Input layer (18)	30000000000000000
Input node (21)	Starts at 60000000000000000
Subnetwork (17)	20000000000000000
Hidden layer (19)	40000000000000000
Output layer (20)	50000000000000000
Output node (23)	Starts at 80000000000000000
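
As a rough illustration of the convention in this table, a node ID can be classified by its leading digit. This is a sketch under assumptions inferred from the table: IDs are treated as 17-digit strings, the first digit is taken to identify the layer, and the remaining digits are taken to be a sequence number. The names `LAYER_BY_LEADING_DIGIT` and `describe_node_id` are hypothetical.

```python
# Leading digit -> node kind (NODE_TYPE), as listed in the table above.
LAYER_BY_LEADING_DIGIT = {
    "0": "Model root (1)",
    "1": "Marginal statistics node (24)",
    "2": "Subnetwork (17)",
    "3": "Input layer (18)",
    "4": "Hidden layer (19)",
    "5": "Output layer (20)",
    "6": "Input node (21)",
    "8": "Output node (23)",
}


def describe_node_id(node_id: str) -> str:
    """Classify a node ID by its first digit (assumed convention)."""
    if len(node_id) != 17 or not node_id.isdigit():
        raise ValueError("expected a 17-digit node ID")
    return LAYER_BY_LEADING_DIGIT.get(node_id[0], "Unknown")


print(describe_node_id("60000000000000002"))
print(describe_node_id("80000000000000000"))
```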

You can use these IDs to determine how output attributes are related to specific input layer attributes, by viewing the NODE_DISTRIBUTION table of the output node. Each row in that table contains an ID that points back to a specific input attribute node. The NODE_DISTRIBUTION table also contains the coefficient for that input-output pair.
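
The lookup described above amounts to a join between the rows of an output node's NODE_DISTRIBUTION table and the input nodes. The sketch below uses made-up data: the attribute names, coefficient values, and the reduction of each distribution row to an (input node ID, coefficient) pair are all assumptions for illustration, and the real nested table has more columns than shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical input nodes: node ID -> input attribute name.
input_nodes: Dict[str, str] = {
    "60000000000000000": "Age",
    "60000000000000001": "Region",
}

# Hypothetical rows from one output node's NODE_DISTRIBUTION table,
# reduced to (input node ID, coefficient).
distribution_rows: List[Tuple[str, float]] = [
    ("60000000000000000", 0.42),
    ("60000000000000001", -1.3),
]


def coefficients_by_input(
    rows: List[Tuple[str, float]], inputs: Dict[str, str]
) -> List[Tuple[str, str, float]]:
    """Pair each coefficient with the input node that its row points back to."""
    resolved = []
    for node_id, coefficient in rows:
        name = inputs.get(node_id)
        if name is None:
            continue  # skip rows that do not reference a known input node
        resolved.append((node_id, name, coefficient))
    return resolved


for node_id, name, coefficient in coefficients_by_input(distribution_rows, input_nodes):
    print(node_id, name, coefficient)
```

The result keeps the node ID alongside the attribute name because several input nodes can share one attribute name when the model uses attribute-value pairs.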
