Goss Class
Gradient-based One-Side Sampling.
- Inheritance
  - nimbusml.internal.core.ensemble.booster._goss.Goss
  - Goss
Constructor
Goss(top_rate=0.2, other_rate=0.1, minimum_split_gain=0.0, maximum_tree_depth=0, minimum_child_weight=0.1, subsample_frequency=0, subsample_fraction=1.0, feature_fraction=1.0, l2_regularization=0.01, l1_regularization=0.0, **params)
Parameters
- top_rate
Retain ratio for large gradient instances.
- other_rate
Retain ratio for small gradient instances.
- minimum_split_gain
Minimum loss reduction required to make a further partition on a leaf node of the tree. The larger the value, the more conservative the algorithm will be.
- maximum_tree_depth
Maximum depth of a tree. 0 means no limit; the tree still grows best-first.
- minimum_child_weight
Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with a sum of instance weights less than minimum_child_weight, the building process gives up further partitioning. In linear regression mode, this simply corresponds to the minimum number of instances needed in each node. The larger the value, the more conservative the algorithm will be.
- subsample_frequency
Subsample frequency for bagging. 0 means no subsampling. If set to N, subsampling occurs every N iterations. This must be set together with subsample_fraction, which specifies the amount to subsample.
- subsample_fraction
Subsample ratio of the training instances. Setting it to 0.5 means that LightGBM randomly selects half of the data instances to grow trees, which can help prevent overfitting. Range: (0,1].
- feature_fraction
Subsample ratio of columns when constructing each tree. Range: (0,1].
- l2_regularization
L2 regularization term on weights; increasing this value makes the model more conservative.
- l1_regularization
L1 regularization term on weights; increasing this value makes the model more conservative.
- params
Additional arguments sent to compute engine.
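The two GOSS-specific ratios above interact: together, top_rate and other_rate bound the fraction of instances retained per boosting iteration, and (in the GOSS scheme) sampled small-gradient instances are up-weighted by (1 - top_rate) / other_rate so that gain estimates remain unbiased. The helper below is a standalone sketch of that arithmetic, not nimbusml code; the function name is hypothetical.

```python
# Hypothetical helper (not part of nimbusml) illustrating how top_rate and
# other_rate interact under GOSS.
def goss_ratios(top_rate=0.2, other_rate=0.1):
    # The combined retain ratio must stay in (0, 1].
    assert 0.0 < top_rate + other_rate <= 1.0, "retain ratios must sum to (0, 1]"
    kept_fraction = top_rate + other_rate                 # data seen per iteration
    small_grad_weight = (1.0 - top_rate) / other_rate     # up-weight for sampled rows
    return kept_fraction, small_grad_weight

kept, weight = goss_ratios()
print(kept)    # defaults keep ~30% of the instances each iteration
print(weight)  # sampled small-gradient instances are weighted by (1-0.2)/0.1
```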
Examples
###############################################################################
# LightGbmBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmBinaryClassifier
from nimbusml.ensemble.booster import Goss
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    LightGbmBinaryClassifier(feature=['induced', 'edu'], label='case',
                             booster=Goss(top_rate=0.9))
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data, 'case').test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 1 0.612220 0.913309
# 1 1 0.612220 0.913309
# 2 0 0.334486 -1.375929
# 3 0 0.334486 -1.375929
# 4 0 0.421264 -0.635176
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.626433 0.677419 0.588235 0.120482 ...
Remarks
Gradient-based One-Side Sampling (GOSS) keeps all instances with large gradients and randomly samples from the instances with small gradients, an adaptive scheme known as gradient-based sampling. For datasets with a large sample size, GOSS offers considerable advantages in both statistical and computational efficiency.
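The sampling step described above can be sketched in a few lines of plain Python. This is an illustration of the GOSS idea as published for LightGBM, not nimbusml's internal implementation; the function name and the example gradients are made up for the demonstration.

```python
import random

def goss_sample(gradients, top_rate=0.2, other_rate=0.1, seed=0):
    """Sketch of Gradient-based One-Side Sampling (illustrative only).

    Keeps the top_rate fraction of instances with the largest absolute
    gradients, randomly samples an other_rate fraction of the rest, and
    up-weights the sampled small-gradient instances by
    (1 - top_rate) / other_rate so gain estimates stay unbiased.
    """
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]                                   # always kept
    rng = random.Random(seed)
    sampled = rng.sample(order[n_top:], n_other)          # random small-gradient rows
    weights = {i: 1.0 for i in top}
    weights.update({i: (1.0 - top_rate) / other_rate for i in sampled})
    return weights  # instance index -> weight used when computing split gains

grads = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.08, 0.6, 0.1]
w = goss_sample(grads)
print(len(w))  # 3 of 10 instances retained: 2 large-gradient + 1 sampled
```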
Methods
- get_params
  Get the parameters for this operator.
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
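To show the shape of what get_params returns, here is a minimal stand-in class (not nimbusml) following the scikit-learn-style contract the method implements: constructor arguments come back as a dict. The GossLike class is hypothetical.

```python
# Hypothetical stand-in (not nimbusml) sketching the get_params contract.
class GossLike:
    def __init__(self, top_rate=0.2, other_rate=0.1):
        self.top_rate = top_rate
        self.other_rate = other_rate

    def get_params(self, deep=False):
        # deep=True would also expand parameters of nested operators;
        # this sketch has none, so the flag is accepted but unused.
        return {'top_rate': self.top_rate, 'other_rate': self.other_rate}

print(GossLike(top_rate=0.3).get_params())
# {'top_rate': 0.3, 'other_rate': 0.1}
```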