FastTreesTweedieRegressor Class
Machine Learning Fast Tree
Inheritance
- nimbusml.internal.core.ensemble._fasttreestweedieregressor.FastTreesTweedieRegressor
- nimbusml.base_predictor.BasePredictor
- sklearn.base.RegressorMixin
Constructor
FastTreesTweedieRegressor(number_of_trees=100, number_of_leaves=20, minimum_example_count_per_leaf=10, learning_rate=0.2, normalize='Auto', caching='Auto', index=1.5, best_step_trees=False, use_line_search=False, maximum_number_of_line_search_steps=0, minimum_step_size=0.0, optimizer='GradientDescent', early_stopping_rule=None, early_stopping_metrics=1, enable_pruning=False, use_tolerant_pruning=False, pruning_threshold=0.004, pruning_window_size=5, shrinkage=1.0, dropout_rate=0.0, get_derivatives_sample_rate=1, write_last_ensemble=False, maximum_tree_output=100.0, random_start=False, filter_zero_lambdas=False, baseline_scores_formula=None, baseline_alpha_risk=None, position_discount_freeform=None, parallel_trainer=None, number_of_threads=None, random_state=123, feature_selection_seed=123, entropy_coefficient=0.0, histogram_pool_size=-1, disk_transpose=None, feature_flocks=True, categorical_split=False, maximum_categorical_group_count_per_node=64, maximum_categorical_split_point_count=64, minimum_example_fraction_for_categorical_split=0.001, minimum_examples_for_categorical_split=100, bias=0.0, bundling='None', maximum_bin_count_per_feature=255, sparsify_threshold=0.7, first_use_penalty=0.0, feature_reuse_penalty=0.0, gain_conf_level=0.0, softmax_temperature=0.0, execution_time=False, feature_fraction=1.0, bagging_size=0, bagging_example_fraction=0.7, feature_fraction_per_split=1.0, smoothing=0.0, allow_empty_trees=True, feature_compression_level=1, compress_ensemble=False, test_frequency=2147483647, feature=None, group_id=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
group_id | see Columns. |
label | see Columns. |
weight | see Columns. |
number_of_trees | Specifies the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but the training time increases. |
number_of_leaves | The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and get better precision, but risk overfitting and requiring longer training times. |
minimum_example_count_per_leaf | Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided. |
learning_rate | Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution. |
normalize | Specifies the type of automatic normalization used: Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a |
caching | Whether the trainer should cache input training data. |
index | Index parameter for the Tweedie distribution, in the range [1, 2]. 1 is Poisson loss, 2 is gamma loss, and intermediate values are compound Poisson loss. |
best_step_trees | Option for using best regression step trees. |
use_line_search | Whether to use line search to determine the step size. |
maximum_number_of_line_search_steps | Number of post-bracket line search steps. |
minimum_step_size | Minimum line search step size. |
optimizer | Default is |
early_stopping_rule | Early stopping rule. A validation set (/valid) is required. |
early_stopping_metrics | Early stopping metrics. (For regression, 1: L1, 2: L2; for ranking, 1: NDCG@1, 3: NDCG@3). |
enable_pruning | Enable post-training pruning to avoid overfitting (a validation set is required). |
use_tolerant_pruning | Use window and tolerance for pruning. |
pruning_threshold | The tolerance threshold for pruning. |
pruning_window_size | The moving window size for pruning. |
shrinkage | Shrinkage. |
dropout_rate | Dropout rate for tree regularization. |
get_derivatives_sample_rate | Sample each query 1 in k times in the GetDerivatives function. |
write_last_ensemble | Write the last ensemble instead of the one determined by early stopping. |
maximum_tree_output | Upper bound on the absolute value of a single tree's output. |
random_start | Training starts from random ordering (determined by /r1). |
filter_zero_lambdas | Filter zero lambdas during training. |
baseline_scores_formula | Freeform defining the scores that should be used as the baseline ranker. |
baseline_alpha_risk | Baseline alpha for tradeoffs of risk (0 is normal training). |
position_discount_freeform | The discount freeform which specifies the per position discounts of examples in a query (uses a single variable P for position where P=0 is first position). |
parallel_trainer | Selects the parallel FastTree learning algorithm. |
number_of_threads | The number of threads to use. |
random_state | The seed of the random number generator. |
feature_selection_seed | The seed of the active feature selection. |
entropy_coefficient | The entropy (regularization) coefficient between 0 and 1. |
histogram_pool_size | The number of histograms in the pool (between 2 and numLeaves). |
disk_transpose | Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose. |
feature_flocks | Whether to collectivize features during dataset preparation to speed up training. |
categorical_split | Whether to split based on multiple categorical feature values. |
maximum_categorical_group_count_per_node | Maximum categorical split groups to consider when splitting on a categorical feature. Split groups are a collection of split points. This is used to reduce overfitting when there are many categorical features. |
maximum_categorical_split_point_count | Maximum categorical split points to consider when splitting on a categorical feature. |
minimum_example_fraction_for_categorical_split | Minimum categorical example percentage in a bin to consider for a split. |
minimum_examples_for_categorical_split | Minimum categorical example count in a bin to consider for a split. |
bias | Bias for calculating gradient for each feature bin for a categorical feature. |
bundling | Bundle low population bins. Bundle.None(0): no bundling, Bundle.AggregateLowPopulation(1): Bundle low population, Bundle.Adjacent(2): Neighbor low population bundle. |
maximum_bin_count_per_feature | Maximum number of distinct values (bins) per feature. |
sparsify_threshold | Sparsity level needed to use sparse feature representation. |
first_use_penalty | The feature first use penalty coefficient. This is a form of regularization that incurs a penalty for using a new feature when creating the tree. Increase this value to create trees that don't use many features. |
feature_reuse_penalty | The feature re-use penalty (regularization) coefficient. |
gain_conf_level | Tree fitting gain confidence requirement (should be in the range [0,1)). |
softmax_temperature | The temperature of the randomized softmax distribution for choosing the feature. |
execution_time | Print execution time breakdown to stdout. |
feature_fraction | The fraction of features (chosen randomly) to use on each iteration. |
bagging_size | Number of trees in each bag (0 for disabling bagging). |
bagging_example_fraction | Percentage of training examples used in each bag. |
feature_fraction_per_split | The fraction of features (chosen randomly) to use on each split. |
smoothing | Smoothing parameter for tree regularization. |
allow_empty_trees | When a root split is impossible, allow training to proceed. |
feature_compression_level | The level of feature compression to use. |
compress_ensemble | Compress the tree ensemble. |
test_frequency | Calculate metric values for train/valid/test every k rounds. |
params | Additional arguments sent to compute engine. |
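To make the role of the index parameter more concrete, the sketch below evaluates the textbook Tweedie unit deviance at index values 1 (Poisson), 1.5 (compound Poisson), and 2 (gamma). The function name tweedie_unit_deviance is hypothetical and not part of nimbusml. The formula is the standard deviance from the statistics literature, assumed here for illustration only; the exact objective the trainer optimizes may differ.

```python
import math


def tweedie_unit_deviance(y, mu, index):
    """Textbook Tweedie unit deviance for an observation y and a mean mu.

    Illustrative helper only (an assumption, not a nimbusml function).
    """
    if not 1.0 <= index <= 2.0:
        raise ValueError("index must lie in [1, 2]")
    if mu <= 0 or y < 0:
        raise ValueError("mu must be positive and y non-negative")
    if index == 1.0:
        # Poisson limit; y * log(y / mu) is taken as 0 when y == 0.
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term - y + mu)
    if index == 2.0:
        # Gamma limit; requires strictly positive observations.
        if y <= 0:
            raise ValueError("gamma deviance requires y > 0")
        return 2.0 * (math.log(mu / y) + y / mu - 1.0)
    # Compound Poisson case, 1 < index < 2.
    a, b = 1.0 - index, 2.0 - index
    return 2.0 * (y ** b / (a * b) - y * mu ** a / a + mu ** b / b)


if __name__ == "__main__":
    # Compare how the same miss (y=1, mu=2) is penalized under each index.
    for p in (1.0, 1.5, 2.0):
        print(p, tweedie_unit_deviance(1.0, 2.0, p))
```

The deviance is zero when the prediction equals the observation and grows as they diverge. Only index values strictly below 2 accept zero-valued labels in this sketch, which is one reason intermediate values are commonly chosen for data with many exact zeros.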
Examples
###############################################################################
# FastTreesTweedieRegressor
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import FastTreesTweedieRegressor
from nimbusml.feature_extraction.categorical import OneHotVectorizer
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    FastTreesTweedieRegressor(feature=['induced', 'edu'], label='age')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# Score
# 0 35.152565
# 1 35.152565
# 2 34.089958
# 3 34.089958
# 4 32.486031
# print evaluation metrics
print(metrics)
# L1(avg) L2(avg) RMS(avg) Loss-fn(avg) R Squared
# 0 4.095883 24.048477 4.903925 24.048477 0.124482
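The metric columns printed above are related to one another: in that output RMS(avg) is the square root of L2(avg), and Loss-fn(avg) coincides with L2(avg). The sketch below recomputes such metrics from labels and scores in plain Python. The helper name regression_metrics is hypothetical, and the definitions used (mean absolute error, mean squared error, and 1 - SSE/SST for R Squared) are the conventional ones, assumed rather than taken from the nimbusml source.

```python
import math


def regression_metrics(labels, scores):
    """Conventional regression metrics (illustrative, not a nimbusml API)."""
    n = len(labels)
    errors = [y - s for y, s in zip(labels, scores)]
    l1 = sum(abs(e) for e in errors) / n   # mean absolute error
    l2 = sum(e * e for e in errors) / n    # mean squared error
    rms = math.sqrt(l2)                    # root mean squared error
    mean_label = sum(labels) / n
    total = sum((y - mean_label) ** 2 for y in labels)
    r_squared = 1.0 - (l2 * n) / total     # 1 - SSE / SST
    return {"L1(avg)": l1, "L2(avg)": l2, "RMS(avg)": rms,
            "R Squared": r_squared}


if __name__ == "__main__":
    # Made-up toy values, not taken from the infert dataset.
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```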
Remarks
Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner is a generalization of Poisson, compound Poisson, and gamma regression.
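As a rough illustration of the idea in this remark, the sketch below boosts depth-1 trees (stumps) on a single feature using the gradient of the Tweedie negative log-likelihood under a log link. It is a simplified teaching example under stated assumptions, not the FastTree implementation: the function names, the stump learner, and the toy data are all hypothetical, and the real trainer builds multi-leaf trees with many additional options.

```python
import math


def tweedie_gradient(y, f, index=1.5):
    """Gradient of the Tweedie negative log-likelihood w.r.t. the raw
    score f, assuming the prediction is mu = exp(f)."""
    return -y * math.exp((1 - index) * f) + math.exp((2 - index) * f)


def fit_stump(xs, targets):
    """Fit a one-split regression tree on a 1-D feature by squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all feature values identical: constant predictor
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=100, learning_rate=0.2, index=1.5):
    """Gradient boosting on raw scores; returns a predict(x) function."""
    f0 = math.log(sum(ys) / len(ys))  # start from the log of the mean label
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        # Each stump is fit to the negative gradient at the current scores.
        neg_grad = [-tweedie_gradient(y, f, index)
                    for y, f in zip(ys, scores)]
        stump = fit_stump(xs, neg_grad)
        stumps.append(stump)
        scores = [f + learning_rate * stump(x) for f, x in zip(scores, xs)]

    def predict(x):
        return math.exp(f0 + learning_rate * sum(s(x) for s in stumps))

    return predict


if __name__ == "__main__":
    # Toy data with two plateaus; values are made up for illustration.
    predict = boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 4, 4, 4])
    print(predict(2), predict(5))
```

The defaults rounds=100 and learning_rate=0.2 mirror the number_of_trees and learning_rate defaults in the constructor. On this toy data the two predictions should approach the plateau values of the labels.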
Reference
Wikipedia: Gradient boosting (Gradient tree boosting)
Greedy function approximation: A gradient boosting machine.
Methods
get_params | Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |