FastTreesBinaryClassifier Class
Machine Learning Fast Tree
- Inheritance
  - nimbusml.internal.core.ensemble._fasttreesbinaryclassifier.FastTreesBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
FastTreesBinaryClassifier(number_of_trees=100, number_of_leaves=20, minimum_example_count_per_leaf=10, learning_rate=0.2, normalize='Auto', caching='Auto', unbalanced_sets=False, best_step_trees=False, use_line_search=False, maximum_number_of_line_search_steps=0, minimum_step_size=0.0, optimizer='GradientDescent', early_stopping_rule=None, early_stopping_metrics=1, enable_pruning=False, use_tolerant_pruning=False, pruning_threshold=0.004, pruning_window_size=5, shrinkage=1.0, dropout_rate=0.0, get_derivatives_sample_rate=1, write_last_ensemble=False, maximum_tree_output=100.0, random_start=False, filter_zero_lambdas=False, baseline_scores_formula=None, baseline_alpha_risk=None, position_discount_freeform=None, parallel_trainer=None, number_of_threads=None, random_state=123, feature_selection_seed=123, entropy_coefficient=0.0, histogram_pool_size=-1, disk_transpose=None, feature_flocks=True, categorical_split=False, maximum_categorical_group_count_per_node=64, maximum_categorical_split_point_count=64, minimum_example_fraction_for_categorical_split=0.001, minimum_examples_for_categorical_split=100, bias=0.0, bundling='None', maximum_bin_count_per_feature=255, sparsify_threshold=0.7, first_use_penalty=0.0, feature_reuse_penalty=0.0, gain_conf_level=0.0, softmax_temperature=0.0, execution_time=False, feature_fraction=1.0, bagging_size=0, bagging_example_fraction=0.7, feature_fraction_per_split=1.0, smoothing=0.0, allow_empty_trees=True, feature_compression_level=1, compress_ensemble=False, test_frequency=2147483647, feature=None, group_id=None, label=None, weight=None, **params)
Parameters
- feature
see Columns.
- group_id
see Columns.
- label
see Columns.
- weight
see Columns.
- number_of_trees
Specifies the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but the training time increases.
- number_of_leaves
The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and get better precision, but risk overfitting and requiring longer training times.
- minimum_example_count_per_leaf
Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided.
- learning_rate
Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
- normalize
If 'Auto', the choice to normalize depends on the preference declared by the algorithm. This is the default choice. If 'No', no normalization is performed. If 'Yes', normalization is always performed. If 'Warn', a warning message is displayed when the algorithm needs normalization, but normalization is not performed. If normalization is performed, a MaxMin normalizer is used. This normalizer preserves sparsity by mapping zero to zero.
- caching
Whether the trainer should cache the input training data.
- unbalanced_sets
Option for using derivatives optimized for unbalanced sets.
- best_step_trees
Option for using best regression step trees.
- use_line_search
Whether to use line search for the step size.
- maximum_number_of_line_search_steps
Number of post-bracket line search steps.
- minimum_step_size
Minimum line search step size.
- optimizer
The optimization algorithm to use. Default is 'GradientDescent' (stochastic gradient descent).
- early_stopping_rule
Early stopping rule. (Validation set (/valid) is required.).
- early_stopping_metrics
Early stopping metrics. (For regression: 1 = L1, 2 = L2; for ranking: 1 = NDCG@1, 3 = NDCG@3.).
- enable_pruning
Enable post-training pruning to avoid overfitting (a validation set is required).
- use_tolerant_pruning
Use window and tolerance for pruning.
- pruning_threshold
The tolerance threshold for pruning.
- pruning_window_size
The moving window size for pruning.
- shrinkage
Shrinkage.
- dropout_rate
Dropout rate for tree regularization.
- get_derivatives_sample_rate
Sample each query 1 in k times in the GetDerivatives function.
- write_last_ensemble
Write the last ensemble instead of the one determined by early stopping.
- maximum_tree_output
Upper bound on absolute value of single tree output.
- random_start
Training starts from random ordering (determined by /r1).
- filter_zero_lambdas
Filter zero lambdas during training.
- baseline_scores_formula
Freeform defining the scores that should be used as the baseline ranker.
- baseline_alpha_risk
Baseline alpha for tradeoffs of risk (0 is normal training).
- position_discount_freeform
The discount freeform which specifies the per position discounts of examples in a query (uses a single variable P for position where P=0 is first position).
- parallel_trainer
Allows choosing a parallel FastTree learning algorithm.
- number_of_threads
The number of threads to use.
- random_state
The seed of the random number generator.
- feature_selection_seed
The seed of the active feature selection.
- entropy_coefficient
The entropy (regularization) coefficient between 0 and 1.
- histogram_pool_size
The number of histograms in the pool (between 2 and numLeaves).
- disk_transpose
Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose.
- feature_flocks
Whether to collectivize features during dataset preparation to speed up training.
- categorical_split
Whether to do split based on multiple categorical feature values.
- maximum_categorical_group_count_per_node
Maximum categorical split groups to consider when splitting on a categorical feature. Split groups are a collection of split points. This is used to reduce overfitting when there are many categorical features.
- maximum_categorical_split_point_count
Maximum categorical split points to consider when splitting on a categorical feature.
- minimum_example_fraction_for_categorical_split
Minimum categorical example percentage in a bin to consider for a split.
- minimum_examples_for_categorical_split
Minimum categorical example count in a bin to consider for a split.
- bias
Bias for calculating gradient for each feature bin for a categorical feature.
- bundling
Bundle low-population bins. Bundle.None (0): no bundling; Bundle.AggregateLowPopulation (1): bundle low-population bins together; Bundle.Adjacent (2): bundle neighboring low-population bins.
- maximum_bin_count_per_feature
Maximum number of distinct values (bins) per feature.
- sparsify_threshold
Sparsity level needed to use sparse feature representation.
- first_use_penalty
The feature first use penalty coefficient. This is a form of regularization that incurs a penalty for using a new feature when creating the tree. Increase this value to create trees that don't use many features.
- feature_reuse_penalty
The feature re-use penalty (regularization) coefficient.
- gain_conf_level
Tree fitting gain confidence requirement (should be in the range [0,1) ).
- softmax_temperature
The temperature of the randomized softmax distribution for choosing the feature.
- execution_time
Print execution time breakdown to stdout.
- feature_fraction
The fraction of features (chosen randomly) to use on each iteration.
- bagging_size
Number of trees in each bag (0 for disabling bagging).
- bagging_example_fraction
Percentage of training examples used in each bag.
- feature_fraction_per_split
The fraction of features (chosen randomly) to use on each split.
- smoothing
Smoothing parameter for tree regularization.
- allow_empty_trees
When a root split is impossible, allow training to proceed.
- feature_compression_level
The level of feature compression to use.
- compress_ensemble
Compress the tree Ensemble.
- test_frequency
Calculate metric values for train/valid/test every k rounds.
- params
Additional arguments sent to compute engine.
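As a minimal sketch of how the most commonly tuned parameters above fit together, here is an illustrative construction (the values are arbitrary, not recommendations; every other parameter keeps the default shown in the constructor signature):

from nimbusml.ensemble import FastTreesBinaryClassifier

clf = FastTreesBinaryClassifier(
    number_of_trees=200,                 # more trees: better coverage, longer training
    number_of_leaves=31,                 # larger trees: more precision, more overfitting risk
    learning_rate=0.1,                   # smaller steps: slower but steadier convergence
    minimum_example_count_per_leaf=20,   # require more data per leaf to regularize
    normalize='Yes',                     # always apply the MaxMin normalizer
)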
Examples
###############################################################################
# FastTreesBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesBinaryClassifier
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
FastTreesBinaryClassifier(feature=['age', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.330738 -1.762120
# 1 0 0.337897 -1.681700
# 2 0 0.334428 -1.720559
# 3 0 0.331255 -1.756292
# 4 0 0.333299 -1.733252
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.502957 0.665323 0 0 ...
Remarks
FastTreesBinaryClassifier is an implementation of FastRank. FastRank is an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds each regression tree in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next. So this prediction model is actually an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion and then selects the optimal tree using an arbitrary differentiable loss function.
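The following is a minimal sketch of that stepwise correction, not nimbusml's internals: with squared loss, the negative gradient is simply the residual, so each new tree is fit to the residuals of the current ensemble. It uses scikit-learn's DecisionTreeRegressor as the base learner, and the parameter names echo number_of_trees, learning_rate, and number_of_leaves above.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_trees=100, learning_rate=0.2, max_leaf_nodes=20):
    # Stagewise gradient boosting with squared loss: each tree fits the
    # negative gradient of the loss, which here is the residual y - pred.
    trees, pred = [], np.zeros(len(y), dtype=float)
    for _ in range(n_trees):
        residual = y - pred
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
        tree.fit(X, residual)
        pred += learning_rate * tree.predict(X)  # shrunken step toward the residuals
        trees.append(tree)
    return trees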
MART learns an ensemble of regression trees, each of which is a decision tree with scalar values in its leaves. A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input. At each leaf node, a value is returned. In the interior nodes, the decision is based on the test "x <= v", where x is the value of the feature in the input sample and v is one of the possible values of this feature. The functions that can be produced by a regression tree are all the piece-wise constant functions.
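A small sketch of that "x <= v" traversal, with a hypothetical Node layout (interior nodes carry a feature and threshold; leaves carry a scalar value):

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.value = left, right, value

def evaluate(node, x):
    # Walk down from the root, applying the "x <= v" test at each interior
    # node, until a leaf is reached; return the leaf's scalar value.
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# A depth-1 tree (stump) testing "age <= 30":
stump = Node(feature='age', threshold=30,
             left=Node(value=-0.5), right=Node(value=0.8))
print(evaluate(stump, {'age': 26}))   # -0.5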
The ensemble of trees is produced by computing, in each step, a regression tree that approximates the gradient of the loss function, and adding it to the previous tree with coefficients that minimize the loss of the new tree. The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
- In case of a binary classification problem, the output is converted
to a probability by using some form of calibration.
- In case of a regression problem, the output is the predicted value
of the function.
- In case of a ranking problem, the instances are ordered by the
output value of the ensemble.
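To make the binary-classification case concrete, here is a hedged sketch of summing tree outputs and mapping the score to a probability. nimbusml does not expose its calibrator here; this assumes a Platt-style sigmoid, whose slope and offset would normally be fit on held-out data.

import math

def ensemble_output(trees, x):
    # The ensemble's output on an instance is the sum of the tree outputs;
    # each 'tree' here is any callable mapping an instance to a scalar.
    return sum(tree(x) for tree in trees)

def calibrated_probability(score, a=-1.0, b=0.0):
    # Hypothetical Platt-style sigmoid calibration: p = 1 / (1 + exp(a*score + b)).
    return 1.0 / (1.0 + math.exp(a * score + b))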
Reference
- Wikipedia: Gradient boosting (Gradient tree boosting)
- Greedy function approximation: A gradient boosting machine.
Methods
- decision_function: Returns score values.
- get_params: Get the parameters for this operator.
- predict_proba: Returns probabilities.
decision_function
Returns score values.
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
If True, return the parameters for this operator and any contained sub-objects.
predict_proba
Returns probabilities.
predict_proba(X, **params)
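A short usage sketch of these methods with the sklearn-style API (the toy data is hypothetical; any numeric DataFrame/Series pair works):

import pandas as pd
from nimbusml.ensemble import FastTreesBinaryClassifier

X = pd.DataFrame({'age': [26.0, 42.0, 39.0, 34.0],
                  'parity': [6.0, 1.0, 6.0, 4.0]})
y = pd.Series([1, 0, 0, 1])

# Tiny settings so the toy data can actually form leaves.
model = FastTreesBinaryClassifier(number_of_trees=10,
                                  minimum_example_count_per_leaf=1).fit(X, y)
print(model.decision_function(X))  # raw ensemble scores, one per example
print(model.predict_proba(X))      # per-class probabilities
print(model.get_params())          # constructor parameters as a dict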