Dela via


Mining Model Content for Association Models (Analysis Services - Data Mining)

This topic describes mining model content that is specific to models that use the Microsoft Association Rules algorithm. For an explanation of general and statistical terminology related to mining model content that applies to all model types, see Mining Model Content (Analysis Services - Data Mining).

Understanding the Structure of an Association Model

An association model has a simple structure. Each model has a single parent node that represents the model and its metadata, and each parent node has a flat list of itemsets and rules. The itemsets and rules are not organized in trees, they are ordered with itemsets first and rules next as shown in the following diagram.

structure of model content for association models

Each itemset is contained in its own node (NODE_TYPE = 7). The node includes the definition of the itemset, the number of cases that contain this itemset, and other information.

Each rule is also contained in its own node (NODE_TYPE = 8). A rule describes a general pattern for how items are associated. A rule is like an IF-THEN statement. The left-hand side of the rule shows an existing condition or set of conditions. The right-hand side of the rule shows the item in your data set that is usually associated with the conditions on the left side.

Note   If you want to extract either the rules or the itemsets, you can use a query to return only the node types that you want. For more information, see Querying an Association Model (Analysis Services - Data Mining).

Model Content for an Association Model

This section provides detail and examples only for those columns in the mining model content that are relevant for association models.

For information about the general-purpose columns in the schema rowset, such as MODEL_CATALOG and MODEL_NAME, see Mining Model Content (Analysis Services - Data Mining).

  • MODEL_CATALOG
    Name of the database where the model is stored.

  • MODEL_NAME
    Name of the model.

  • ATTRIBUTE_NAME
    The names of the attributes that correspond to this node.

  • NODE_NAME
    The name of the node. For an association model, this column contains the same value as NODE_UNIQUE_NAME.

  • NODE_UNIQUE_NAME
    The unique name of the node.

  • NODE_TYPE
    A association model outputs only the following node types:

    Node Type ID

    Type

    1 (Model)

    Root or parent node.

    7 (Itemset)

    An itemset, or collection of attribute-value pairs. Examples:

    Product 1 = Existing, Product 2 = Existing

    or

    Gender = Male.

    8 (Rule)

    A rule defining how items relate to each other.

    Example:

    Product 1 = Existing, Product 2 = Existing -> Product 3 = Existing.

  • NODE_CAPTION
    A label or a caption associated with the node.

    Itemset node   A comma-separated list of items.

    Rule node   Contains the left and right-hand sides of the rule.

  • CHILDREN_CARDINALITY
    Indicates the number of children of the current node.

    Parent node   Indicates the total number of itemsets plus rules.

    Note

    To get a breakdown of the count for itemsets and rules, see the NODE_DESCRIPTION for the root node of the model.

    Itemset or rule node   Always 0.

  • PARENT_UNIQUE_NAME
    The unique name of the node's parent.

    Parent node   Always NULL.

    Itemset or rule node   Always 0.

  • NODE_DESCRIPTION
    A user-friendly description of the contents of the node.

    Parent node   Includes a comma-separated list of the following information about the model:

    Item

    Description

    ITEMSET_COUNT

    Count of all itemsets in model.

    RULE_COUNT

    Count of all rules in model.

    MIN_SUPPORT

    The minimum support found for any single itemset.

    Note   This value might differ from the value that you set for the MINIMUM _SUPPORT parameter.

    MAX_SUPPORT

    The maximum support found for any single itemset.

    Note   This value might differ from the value that you set for the MAXIMUM_SUPPORT parameter.

    MIN_ITEMSET_SIZE

    The size of the smallest itemset, represented as a count of items.

    A value of 0 indicates that the Missing state was treated as an independent item.

    Note   The default value of the MINIMUM_ITEMSET_SIZE parameter is 1.

    MAX_ITEMSET_SIZE

    Indicates the size of the largest itemset that was found.

    Note   This value is constrained by the value that you set for the MAX_ITEMSET_SIZE parameter when you created the model. This value can never exceed that value; however, it can be less. The default value is 3.

    MIN_PROBABILITY

    The minimum probability detected for any single itemset or rule in the model.

    Example: 0.400390625

    Note   For itemsets, this value is always greater than the value that you set for the MINIMUM_PROBABILITY parameter when you created the model.

    MAX_PROBABILITY

    The maximum probability detected for any single itemset or rule in the model.

    Example: 1

    Note   There is no parameter to constrain maximum probability of itemsets. If you want to eliminate items that are too frequent, use the MAXIMUM_SUPPORT parameter instead.

    MIN_LIFT

    The minimum amount of lift that is provided by the model for any itemset.

    Example: 0.14309369632511

    NoteNote
    Knowing the minimum lift can help you determine whether the lift for any one itemset is significant.

    MAX_LIFT

    The maximum amount of lift that is provided by the model for any itemset.

    Example: 1.95758227647523 Note   Knowing the maximum lift can help you determine whether the lift for any one itemset is significant.

    Itemset node   Itemset nodes contain a list of the items, displayed as a comma-separated text string.

    Example:

    Touring Tire = Existing, Water Bottle = Existing

    This means touring tires and water bottles were purchased together.

    Rule node   Rule nodes contains a left-hand and right-hand side of the rule, separated by an arrow.

    Example: Touring Tire = Existing, Water Bottle = Existing -> Cycling cap = Existing

    This means that if someone bought a touring tire and a water bottle, they were also likely to buy a cycling cap.

  • NODE_RULE
    An XML fragment that describes the rule or itemset that is embedded in the node.

    Parent node   Blank.

    Itemset node   Blank.

    Rule node   The XML fragment includes additional useful information about the rule, such as support, confidence, and the number of items, and the ID of the node that represents the left-hand side of the rule.

  • MARGINAL_RULE
    Blank.

  • NODE_PROBABILITY
    A probability or confidence score associated with the itemset or rule.

    Parent node   Always 0.

    Itemset node   Probability of the itemset.

    Rule node   Confidence value for the rule.

  • MARGINAL_PROBABILITY
    Same as NODE_PROBABILITY.

  • NODE_DISTRIBUTION
    The table contains very different information, depending on whether the node is an itemset or a rule.

    Parent node   Blank.

    Itemset node   Lists each item in the itemset together with a probability and support value. For example, if the itemset contains two products, the name of each product is listed, together with the count of cases that include each product.

    Rule node   Contains two rows. The first row shows the attribute from the right-hand side of the rule, which is the predicted item, together with a confidence score.

    The second row is unique to association models; it contains a pointer to the itemset on the right-hand side of the rule. The pointer is represented in the ATTRIBUTE_VALUE column as the ID of the itemset that contains only the right-hand item.

    For example, if the rule is If {A,B} Then {C}, the table contains the name of the item {C}, and the ID of the node that contains the itemset for item C.

    This pointer is useful because you can determine from the itemset node how many cases in all include the right-hand side product. The cases that are subject to the rule If {A,B} Then {C} are a subset of the cases listed in the itemset for {C}.

  • NODE_SUPPORT
    The number of cases that support this node.

    Parent node   Number of cases in the model.

    Itemset node   Number of cases that contains all items in the itemset.

    Rule node   The number of cases that contain all items included in the rule.

  • MSOLAP_MODEL_COLUMN
    Contains different information depending on whether the node is an itemset or rule.

    Parent node   Blank.

    Itemset node   Blank.

    Rule node   The ID of the itemset that contains the items in the left-hand side of the rule. For example, if the rule is If {A,B} Then {C}, this column contains the ID of the itemset that contains only {A,B}.

  • MSOLAP_NODE_SCORE
    Parent node   Blank.

    Itemset node   Importance score for the itemset.

    Rule node   Importance score for the rule.

    Note

    Importance is calculated differently for itemsets and rules. For more information, see Microsoft Association Algorithm Technical Reference.

  • MSOLAP_NODE_SHORT_CAPTION
    Blank.