GamBinaryClassifier Class
Generalized Additive Models
- Inheritance
  - nimbusml.internal.core.ensemble._gambinaryclassifier.GamBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
GamBinaryClassifier(number_of_iterations=9500, minimum_example_count_per_leaf=10, learning_rate=0.002, normalize='Auto', caching='Auto', unbalanced_sets=False, entropy_coefficient=0.0, gain_conf_level=0, number_of_threads=None, disk_transpose=None, maximum_bin_count_per_feature=255, maximum_tree_output=inf, get_derivatives_sample_rate=1, random_state=123, feature_flocks=True, enable_pruning=True, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
label | see Columns. |
weight | see Columns. |
number_of_iterations | Total number of iterations over all features. |
minimum_example_count_per_leaf | Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided. |
learning_rate | Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution. |
normalize | Specifies the type of automatic normalization used: "Auto", "No", "Yes", or "Warn". Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used, which preserves sparsity by mapping zero to zero. |
caching | Whether the trainer should cache input training data. |
unbalanced_sets | Whether to use derivatives optimized for unbalanced sets. |
entropy_coefficient | The entropy (regularization) coefficient, between 0 and 1. |
gain_conf_level | Tree fitting gain confidence requirement (should be in the range [0,1)). |
number_of_threads | The number of threads to use. |
disk_transpose | Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose. |
maximum_bin_count_per_feature | Maximum number of distinct values (bins) per feature. |
maximum_tree_output | Upper bound on the absolute value of a single output. |
get_derivatives_sample_rate | Sample each query 1 in k times in the GetDerivatives function. |
random_state | The seed of the random number generator. |
feature_flocks | Whether to collectivize features during dataset preparation to speed up training. |
enable_pruning | Enable post-training pruning to avoid overfitting. (A validation set is required.) |
params | Additional arguments sent to the compute engine. |
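For illustration, a typical construction overriding several of the parameters above might look like this (the parameter names come from the constructor signature; the values are illustrative, not tuned recommendations):

# Illustrative settings only, not tuned recommendations.
from nimbusml.ensemble import GamBinaryClassifier

gam = GamBinaryClassifier(
    number_of_iterations=5000,           # fewer boosting rounds than the 9500 default
    learning_rate=0.001,                 # smaller gradient steps: slower but steadier convergence
    minimum_example_count_per_leaf=20,   # require more examples per leaf to reduce variance
    maximum_bin_count_per_feature=128,   # coarser discretization of each feature
    unbalanced_sets=True,                # use derivatives optimized for skewed label ratios
    random_state=123,                    # fix the seed for reproducibility
)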
Examples
###############################################################################
# GamBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import GamBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
GamBinaryClassifier(feature=['age', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score
# 0 0 -0.050461
# 1 0 -0.049737
# 2 0 -0.049737
# 3 0 -0.050461
# 4 0 -0.050552
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.502957 0.665323 0 0 ...
Remarks
Generalized additive models (referred to throughout as GAM) are a class of models expressible as an independent sum of individual functions. nimbusml's GAM learner comes in both binary classification (using logit-boosting) and regression (using least squares) flavors.
In contrast to many formal definitions of GAM, this implementation found it convenient to represent learning over stepwise functions, which betrays the intention that GAM's components be smooth functions. In particular: the learner first discretizes features, and the "step" functions learned will step between the discretization boundaries.
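To make the stepwise representation concrete, here is a minimal sketch in plain NumPy of evaluating one learned shape function; the bin boundaries and step values are hypothetical, and this is not nimbusml's internal representation:

import numpy as np

# Hypothetical learned pieces for a single feature.
bin_boundaries = np.array([0.25, 0.5, 0.75])   # discretization boundaries
bin_values = np.array([-0.3, -0.1, 0.2, 0.4])  # one step value per bin

def shape_function(x):
    # Find each value's bin; the bin's step value is the feature's
    # additive contribution to the model output.
    return bin_values[np.searchsorted(bin_boundaries, x)]

print(shape_function(np.array([0.1, 0.3, 0.6, 0.9])))  # [-0.3 -0.1  0.2  0.4]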
This implementation is based on the paper "Intelligible Models for Classification and Regression" (see Reference below), but diverges from it in several important respects: most significantly, in each round of boosting, rather than processing one feature at a time, it makes a round over all features simultaneously. In each round, it will choose only one split point of each feature to change.
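The round structure described above can be sketched as follows. This is a simplification under stated assumptions (squared-error residuals instead of the logit-boosting objective, and a naive gain criterion), intended only to show that each round visits every feature and changes one split point per feature:

import numpy as np

def boosting_round(shape_values, bin_ids, y, prediction, learning_rate=0.002):
    # shape_values: list of per-feature step-value arrays (one entry per bin)
    # bin_ids: (n_samples, n_features) precomputed bin index of each example
    # y, prediction: float arrays; prediction is updated in place
    for f, values in enumerate(shape_values):
        residual = y - prediction            # assumption: squared-error residuals
        ids = bin_ids[:, f]
        best = None
        for s in range(1, len(values)):      # candidate split points for this feature
            right = ids >= s                 # examples falling above the split
            if right.any():
                delta = residual[right].mean()
                gain = abs(delta) * right.sum()   # naive gain; nimbusml's criterion differs
                if best is None or gain > best[0]:
                    best = (gain, s, delta)
        if best is not None:
            _, s, delta = best
            values[s:] += learning_rate * delta             # shift the step function above the split
            prediction[ids >= s] += learning_rate * delta   # keep predictions in sync
    return prediction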
In its current form, the GAM learner has the following advantages and disadvantages: on the one hand, it offers ready interpretability combined with expressive power; on the other, it is currently slow. We would recommend it where the key criterion is interpretability.
Let's talk a bit more about interpretability. The next most interpretable model, we might say, is a linear model. But suppose a feature has a coefficient of 3.9293. What do you know? You know that, generally, perhaps, larger values for that feature are "better." But is 4 better than 3? Is 5 better than 4? To what degree? Are there "shapes" in the distribution hidden by the reduction of a complex quantity to a single value? These are questions a linear model fundamentally cannot answer, but a GAM model might.
Reference
Generalized additive models

Intelligible Models for Classification and Regression (Lou, Caruana, and Gehrke, KDD 2012)
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)
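For illustration, the methods above can be called directly on a fitted estimator. A minimal sketch with toy data (the column names and values are hypothetical):

import pandas as pd
from nimbusml.ensemble import GamBinaryClassifier

X = pd.DataFrame({'x1': [0.1, 0.9, 0.4, 0.7], 'x2': [1.0, 0.2, 0.8, 0.3]})
y = pd.Series([0, 1, 0, 1], name='case')

model = GamBinaryClassifier(number_of_iterations=100).fit(X, y)
print(model.decision_function(X))  # raw scores (cf. the Score column above)
print(model.predict_proba(X))      # per-class probabilities, shape (n_samples, 2)
print(model.get_params())          # the constructor arguments as a dict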