GamBinaryClassifier Class
Generalized Additive Models
- Inheritance
  - nimbusml.internal.core.ensemble._gambinaryclassifier.GamBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
GamBinaryClassifier(number_of_iterations=9500, minimum_example_count_per_leaf=10, learning_rate=0.002, normalize='Auto', caching='Auto', unbalanced_sets=False, entropy_coefficient=0.0, gain_conf_level=0, number_of_threads=None, disk_transpose=None, maximum_bin_count_per_feature=255, maximum_tree_output=inf, get_derivatives_sample_rate=1, random_state=123, feature_flocks=True, enable_pruning=True, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
label | see Columns. |
weight | see Columns. |
number_of_iterations | Total number of iterations over all features. |
minimum_example_count_per_leaf | Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided. |
learning_rate | Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution. |
normalize | Specifies the type of automatic normalization used: "Auto", "No", "Yes", or "Warn". Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used, which preserves sparsity by mapping zero to zero. |
caching | Whether the trainer should cache input training data. |
unbalanced_sets | Whether to use derivatives optimized for unbalanced sets. |
entropy_coefficient | The entropy (regularization) coefficient, between 0 and 1. |
gain_conf_level | Tree fitting gain confidence requirement (should be in the range [0,1)). |
number_of_threads | The number of threads to use. |
disk_transpose | Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose. |
maximum_bin_count_per_feature | Maximum number of distinct values (bins) per feature. |
maximum_tree_output | Upper bound on the absolute value of a single output. |
get_derivatives_sample_rate | Sample each query 1 in k times in the GetDerivatives function. |
random_state | The seed of the random number generator. |
feature_flocks | Whether to collectivize features during dataset preparation to speed up training. |
enable_pruning | Enable post-training pruning to avoid overfitting. (A validation set is required.) |
params | Additional arguments sent to the compute engine. |
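For illustration, a typical construction overriding several of the parameters above might look like this (the parameter names come from the constructor signature; the values are illustrative, not tuned recommendations):

# Illustrative settings only, not tuned recommendations.
from nimbusml.ensemble import GamBinaryClassifier

gam = GamBinaryClassifier(
    number_of_iterations=5000,           # fewer boosting rounds than the 9500 default
    learning_rate=0.001,                 # smaller gradient steps: slower but steadier convergence
    minimum_example_count_per_leaf=20,   # require more examples per leaf to reduce variance
    maximum_bin_count_per_feature=128,   # coarser discretization of each feature
    unbalanced_sets=True,                # use derivatives optimized for skewed label ratios
    random_state=123,                    # fix the seed for reproducibility
)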
Examples
###############################################################################
# GamBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import GamBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
GamBinaryClassifier(feature=['age', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Score
# 0 0 -0.050461
# 1 0 -0.049737
# 2 0 -0.049737
# 3 0 -0.050461
# 4 0 -0.050552
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.502957 0.665323 0 0 ...
Remarks
Generalized additive models (referred to throughout as GAM) are a class of models expressible as an independent sum of individual functions. nimbusml's GAM learner comes in both binary classification (using logit-boosting) and regression (using least squares) flavors.
In contrast to many formal definitions of GAM, this implementation found it convenient to represent learning over stepwise functions, which betrays the intention that GAM's components be smooth functions. In particular: the learner first discretizes features, and the "step" functions learned will step between the discretization boundaries.
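To make the stepwise representation concrete, here is a minimal sketch in plain NumPy of evaluating one learned shape function; the bin boundaries and step values are hypothetical, and this is not nimbusml's internal representation:

import numpy as np

# Hypothetical learned pieces for a single feature.
bin_boundaries = np.array([0.25, 0.5, 0.75])   # discretization boundaries
bin_values = np.array([-0.3, -0.1, 0.2, 0.4])  # one step value per bin

def shape_function(x):
    # Find each value's bin; the bin's step value is the feature's
    # additive contribution to the model output.
    return bin_values[np.searchsorted(bin_boundaries, x)]

print(shape_function(np.array([0.1, 0.3, 0.6, 0.9])))  # [-0.3 -0.1  0.2  0.4]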
This implementation is based on the paper "Intelligible Models for Classification and Regression" (see Reference below), but diverges from it in several important respects: most significantly, in each round of boosting, rather than processing one feature at a time, it makes a round over all features simultaneously. In each round, it will choose only one split point of each feature to change.
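The round structure described above can be sketched as follows. This is a simplification under stated assumptions (squared-error residuals instead of the logit-boosting objective, and a naive gain criterion), intended only to show that each round visits every feature and changes one split point per feature:

import numpy as np

def boosting_round(shape_values, bin_ids, y, prediction, learning_rate=0.002):
    # shape_values: list of per-feature step-value arrays (one entry per bin)
    # bin_ids: (n_samples, n_features) precomputed bin index of each example
    # y, prediction: float arrays; prediction is updated in place
    for f, values in enumerate(shape_values):
        residual = y - prediction            # assumption: squared-error residuals
        ids = bin_ids[:, f]
        best = None
        for s in range(1, len(values)):      # candidate split points for this feature
            right = ids >= s                 # examples falling above the split
            if right.any():
                delta = residual[right].mean()
                gain = abs(delta) * right.sum()   # naive gain; nimbusml's criterion differs
                if best is None or gain > best[0]:
                    best = (gain, s, delta)
        if best is not None:
            _, s, delta = best
            values[s:] += learning_rate * delta             # shift the step function above the split
            prediction[ids >= s] += learning_rate * delta   # keep predictions in sync
    return prediction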
In its current form, the GAM learner has the following advantages and disadvantages: on the one hand, it offers ready interpretability combined with expressive power; on the other, it is currently slow. We would recommend it where the key criterion is interpretability.
Let's talk a bit more about interpretability. The next most interpretable model, we might say, is a linear model. But suppose a feature has a coefficient of 3.9293. What do you know? You know that, generally, perhaps, larger values for that feature are "better." But is 4 better than 3? Is 5 better than 4? To what degree? Are there "shapes" in the distribution hidden by the reduction of a complex quantity to a single value? These are questions a linear model fundamentally cannot answer, but a GAM model might.
Reference
Generalized additive models

Intelligible Models for Classification and Regression (Lou, Caruana, and Gehrke, KDD 2012)
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)
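For illustration, the methods above can be called directly on a fitted estimator. A minimal sketch with toy data (the column names and values are hypothetical):

import pandas as pd
from nimbusml.ensemble import GamBinaryClassifier

X = pd.DataFrame({'x1': [0.1, 0.9, 0.4, 0.7], 'x2': [1.0, 0.2, 0.8, 0.3]})
y = pd.Series([0, 1, 0, 1], name='case')

model = GamBinaryClassifier(number_of_iterations=100).fit(X, y)
print(model.decision_function(X))  # raw scores (cf. the Score column above)
print(model.predict_proba(X))      # per-class probabilities, shape (n_samples, 2)
print(model.get_params())          # the constructor arguments as a dict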