FastTreesTweedieRegressor Class

Machine Learning Fast Tree

Inheritance
FastTreesTweedieRegressor derives from:

  • nimbusml.internal.core.ensemble._fasttreestweedieregressor.FastTreesTweedieRegressor

  • nimbusml.base_predictor.BasePredictor

  • sklearn.base.RegressorMixin

Constructor

FastTreesTweedieRegressor(number_of_trees=100, number_of_leaves=20, minimum_example_count_per_leaf=10, learning_rate=0.2, normalize='Auto', caching='Auto', index=1.5, best_step_trees=False, use_line_search=False, maximum_number_of_line_search_steps=0, minimum_step_size=0.0, optimizer='GradientDescent', early_stopping_rule=None, early_stopping_metrics=1, enable_pruning=False, use_tolerant_pruning=False, pruning_threshold=0.004, pruning_window_size=5, shrinkage=1.0, dropout_rate=0.0, get_derivatives_sample_rate=1, write_last_ensemble=False, maximum_tree_output=100.0, random_start=False, filter_zero_lambdas=False, baseline_scores_formula=None, baseline_alpha_risk=None, position_discount_freeform=None, parallel_trainer=None, number_of_threads=None, random_state=123, feature_selection_seed=123, entropy_coefficient=0.0, histogram_pool_size=-1, disk_transpose=None, feature_flocks=True, categorical_split=False, maximum_categorical_group_count_per_node=64, maximum_categorical_split_point_count=64, minimum_example_fraction_for_categorical_split=0.001, minimum_examples_for_categorical_split=100, bias=0.0, bundling='None', maximum_bin_count_per_feature=255, sparsify_threshold=0.7, first_use_penalty=0.0, feature_reuse_penalty=0.0, gain_conf_level=0.0, softmax_temperature=0.0, execution_time=False, feature_fraction=1.0, bagging_size=0, bagging_example_fraction=0.7, feature_fraction_per_split=1.0, smoothing=0.0, allow_empty_trees=True, feature_compression_level=1, compress_ensemble=False, test_frequency=2147483647, feature=None, group_id=None, label=None, weight=None, **params)

Parameters

Name Description
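feature

Names of the feature columns. See Columns.

group_id

Name of the group ID column. See Columns.

label

Name of the label column. See Columns.

weight

Name of the example weight column. See Columns.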

number_of_trees

Specifies the total number of decision trees to create in the ensemble. By creating more decision trees, you can potentially get better coverage, but the training time increases.

number_of_leaves

The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and give better precision, but risk overfitting and require longer training times.

minimum_example_count_per_leaf

Minimum number of training instances required to form a leaf; that is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided.

learning_rate

Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values to an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
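The interval property described above can be illustrated with a short NumPy sketch. The helper maxmin_zero_preserving is hypothetical (it is not part of nimbusml): it widens the observed range to include zero and divides every value by the width of that range, which keeps zeros at zero and yields an output interval [a, b] with b - a = 1. It demonstrates the documented property only and is not necessarily how the ML.NET MaxMin normalizer is implemented.

```python
import numpy as np


def maxmin_zero_preserving(column):
    """Hypothetical sketch of a sparsity-preserving MaxMin-style scaling.

    The observed range is widened to include zero, then every value is
    divided by the range width, so 0 maps to 0 and the output lies in
    [a, b] with -1 <= a <= 0 <= b <= 1 and b - a = 1.
    """
    x = np.asarray(column, dtype=float)
    lo = min(x.min(), 0.0)  # lower end of the range, forced to include 0
    hi = max(x.max(), 0.0)  # upper end of the range, forced to include 0
    width = hi - lo
    if width == 0.0:
        # Column is all zeros: nothing to rescale.
        return x.copy()
    return x / width


scaled = maxmin_zero_preserving([-5.0, 0.0, 10.0])
print(scaled)  # values -1/3, 0, 2/3: zero stays zero, range width is 1
```

Because the scaling is purely multiplicative, zero entries of a sparse column stay zero and need not be materialized.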

caching

Whether the trainer should cache the input training data.

index

Index parameter for the Tweedie distribution, in the range [1, 2]. 1 is Poisson loss, 2 is gamma loss, and intermediate values are compound Poisson loss.
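To make the role of index concrete, the sketch below computes the mean Tweedie unit deviance for a power p in [1, 2] using the standard textbook formula. The function name tweedie_deviance is hypothetical, and this page does not state the exact loss and link function FastTree optimizes internally, so treat it as an illustration of the distribution family rather than the library's code. scikit-learn offers a comparable metric, sklearn.metrics.mean_tweedie_deviance.

```python
import numpy as np


def tweedie_deviance(y, mu, p=1.5):
    """Mean Tweedie unit deviance for power p in [1, 2] (sketch).

    p = 1 gives the Poisson deviance, p = 2 the gamma deviance, and
    1 < p < 2 the compound Poisson-gamma deviance.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if not 1.0 <= p <= 2.0:
        raise ValueError("p must lie in [1, 2]")
    if p == 1.0:
        # y * log(y / mu), defined as 0 when y == 0
        safe_y = np.where(y > 0, y, 1.0)
        ylogy = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
        dev = 2.0 * (ylogy - y + mu)
    elif p == 2.0:
        # The gamma deviance requires y > 0.
        dev = 2.0 * (np.log(mu / y) + y / mu - 1.0)
    else:
        dev = 2.0 * (
            np.power(y, 2.0 - p) / ((1.0 - p) * (2.0 - p))
            - y * np.power(mu, 1.0 - p) / (1.0 - p)
            + np.power(mu, 2.0 - p) / (2.0 - p)
        )
    return float(np.mean(dev))


# Zero targets are allowed for 1 <= p < 2, which is why compound
# Poisson losses suit data with many exact zeros.
print(tweedie_deviance([0.0, 1.0, 3.0], [0.5, 1.2, 2.5], p=1.5))
```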

best_step_trees

Option for using best regression step trees.

use_line_search

Whether to use line search to choose the step size.

maximum_number_of_line_search_steps

Number of post-bracket line search steps.

minimum_step_size

Minimum line search step size.

optimizer

Optimization algorithm used for boosting. The default is 'GradientDescent' (SGD).

early_stopping_rule

Early stopping rule. A validation set is required.

early_stopping_metrics

Metric used for early stopping. For regression, 1 is L1 and 2 is L2; for ranking, 1 is NDCG@1 and 3 is NDCG@3.

enable_pruning

Enable post-training pruning to avoid overfitting. A validation set is required.

use_tolerant_pruning

Use window and tolerance for pruning.

pruning_threshold

The tolerance threshold for pruning.

pruning_window_size

The moving window size for pruning.

shrinkage

Shrinkage.

dropout_rate

Dropout rate for tree regularization.

get_derivatives_sample_rate

Sample each query 1 in k times in the GetDerivatives function.

write_last_ensemble

Write the last ensemble instead of the one determined by early stopping.

maximum_tree_output

Upper bound on the absolute value of a single tree's output.

random_start

Training starts from a random ordering (determined by the random seed, random_state).

filter_zero_lambdas

Filter zero lambdas during training.

baseline_scores_formula

Freeform defining the scores that should be used as the baseline ranker.

baseline_alpha_risk

Baseline alpha for tradeoffs of risk (0 is normal training).

position_discount_freeform

The discount freeform which specifies the per position discounts of examples in a query (uses a single variable P for position where P=0 is first position).

parallel_trainer

Selects the parallel FastTree learning algorithm.

number_of_threads

The number of threads to use.

random_state

The seed of the random number generator.

feature_selection_seed

The seed of the active feature selection.

entropy_coefficient

The entropy (regularization) coefficient between 0 and 1.

histogram_pool_size

The number of histograms in the pool (between 2 and number_of_leaves).

disk_transpose

Whether to utilize the disk or the data's native transposition facilities (where applicable) when performing the transpose.

feature_flocks

Whether to collectivize features during dataset preparation to speed up training.

categorical_split

Whether to split based on multiple categorical feature values.

maximum_categorical_group_count_per_node

Maximum number of categorical split groups to consider when splitting on a categorical feature. Split groups are a collection of split points. This is used to reduce overfitting when there are many categorical features.

maximum_categorical_split_point_count

Maximum categorical split points to consider when splitting on a categorical feature.

minimum_example_fraction_for_categorical_split

Minimum fraction of categorical examples in a bin for the bin to be considered for a split.

minimum_examples_for_categorical_split

Minimum categorical example count in a bin to consider for a split.

bias

Bias for calculating gradient for each feature bin for a categorical feature.

bundling

Bundle low-population bins. Accepted values:

  • "None" (0): no bundling.

  • "AggregateLowPopulation" (1): bundle low-population bins together.

  • "Adjacent" (2): bundle low-population bins with their neighboring bins.

maximum_bin_count_per_feature

Maximum number of distinct values (bins) per feature.
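FastTree-style learners search for splits over discretized (binned) feature values, and this parameter caps how many bins a feature gets. The sketch below shows one simple way to build at most max_bins bins: every distinct value gets its own bin when there are few enough of them, and interior quantiles are used as cut points otherwise. The helpers bin_edges and to_bins are hypothetical and the quantile strategy is an assumption; FastTree's actual bin-boundary algorithm may differ.

```python
import numpy as np


def bin_edges(values, max_bins=255):
    """Hypothetical sketch: choose at most max_bins - 1 cut points."""
    x = np.asarray(values, dtype=float)
    distinct = np.unique(x)
    if distinct.size <= max_bins:
        # Few distinct values: cut halfway between neighbors,
        # so every distinct value lands in its own bin.
        return (distinct[:-1] + distinct[1:]) / 2.0
    # Many distinct values: use interior quantiles as cut points
    # (an assumed strategy, chosen for simplicity).
    qs = np.linspace(0.0, 1.0, max_bins + 1)[1:-1]
    return np.unique(np.quantile(x, qs))


def to_bins(values, edges):
    """Map each value to a bin index in [0, len(edges)]."""
    return np.searchsorted(edges, np.asarray(values, dtype=float), side="right")


rng = np.random.default_rng(0)
feature = rng.normal(size=10_000)
edges = bin_edges(feature, max_bins=255)
bins = to_bins(feature, edges)
print(len(edges) + 1)  # number of bins, never more than 255
```

Split search then only has to consider the bin boundaries rather than every raw value, which is what makes histogram-based tree growing fast.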

sparsify_threshold

Sparsity level needed to use sparse feature representation.

first_use_penalty

The feature first use penalty coefficient. This is a form of regularization that incurs a penalty for using a new feature when creating the tree. Increase this value to create trees that don't use many features.

feature_reuse_penalty

The feature re-use penalty (regularization) coefficient.

gain_conf_level

Tree fitting gain confidence requirement (should be in the range [0, 1)).

softmax_temperature

The temperature of the randomized softmax distribution for choosing the feature.
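One common reading of a randomized softmax feature choice is that each candidate feature is picked with probability proportional to exp(gain / T), where T is the temperature: a large T approaches a uniform choice and a small T approaches always taking the best-gain feature. The sketch below assumes that reading and treats T = 0 (the default here) as a plain argmax; the function choose_feature and its gain inputs are hypothetical, not FastTree internals.

```python
import numpy as np


def choose_feature(gains, temperature, rng):
    """Hypothetical sketch of softmax-randomized feature selection."""
    gains = np.asarray(gains, dtype=float)
    if temperature <= 0.0:
        # Assumed behavior for T = 0: deterministic best-gain choice.
        return int(np.argmax(gains))
    logits = gains / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(gains.size, p=probs))


rng = np.random.default_rng(42)
gains = [0.5, 2.0, 1.5]
print(choose_feature(gains, 0.0, rng))  # 1 (the highest gain)
picks = [choose_feature(gains, 1.0, rng) for _ in range(1000)]
print(np.bincount(picks, minlength=3))  # feature 1 is picked most often
```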

execution_time

Print execution time breakdown to stdout.

feature_fraction

The fraction of features (chosen randomly) to use on each iteration.

bagging_size

Number of trees in each bag (0 for disabling bagging).

bagging_example_fraction

Fraction of training examples used in each bag.

feature_fraction_per_split

The fraction of features (chosen randomly) to use on each split.
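The per-iteration sampling controls above (feature_fraction, bagging_size and bagging_example_fraction) can be pictured with a small sketch of what each boosting iteration sees. It assumes that feature_fraction draws a fresh random feature subset for every tree, and that when bagging_size is positive a new example subset of size bagging_example_fraction is drawn every bagging_size trees; these semantics are inferred from the parameter descriptions, and the helper sampling_plan is hypothetical. Per-split sampling (feature_fraction_per_split) would happen inside tree construction and is not shown.

```python
import numpy as np


def sampling_plan(n_trees, n_examples, n_features, feature_fraction=1.0,
                  bagging_size=0, bagging_example_fraction=0.7, seed=123):
    """Hypothetical sketch: yield (tree, example_idx, feature_idx) per tree.

    Assumed semantics (not confirmed by this page): a fresh feature subset
    per tree, and a fresh example bag every `bagging_size` trees.
    """
    rng = np.random.default_rng(seed)
    examples = np.arange(n_examples)  # no bagging: every tree sees all rows
    n_feat = max(1, int(round(feature_fraction * n_features)))
    for t in range(n_trees):
        if bagging_size > 0 and t % bagging_size == 0:
            # Start a new bag: sample rows without replacement.
            n_bag = max(1, int(round(bagging_example_fraction * n_examples)))
            examples = np.sort(rng.choice(n_examples, size=n_bag, replace=False))
        features = np.sort(rng.choice(n_features, size=n_feat, replace=False))
        yield t, examples, features


for t, rows, cols in sampling_plan(4, 10, 5, feature_fraction=0.6, bagging_size=2):
    print(t, len(rows), cols)  # 7 rows and 3 feature indices per tree
```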

smoothing

Smoothing parameter for tree regularization.

allow_empty_trees

When a root split is impossible, allow training to proceed.

feature_compression_level

The level of feature compression to use.

compress_ensemble

Compress the tree ensemble.

test_frequency

Calculate metric values for train/valid/test every k rounds.

params

Additional arguments sent to the compute engine.

Examples


   ###############################################################################
   # FastTreesTweedieRegressor
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.ensemble import FastTreesTweedieRegressor
   from nimbusml.feature_extraction.categorical import OneHotVectorizer

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(path)
   print(data.head())
   #   age  case education  induced  parity  ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       FastTreesTweedieRegressor(feature=['induced', 'edu'], label='age')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #       Score
   # 0  35.152565
   # 1  35.152565
   # 2  34.089958
   # 3  34.089958
   # 4  32.486031
   # print evaluation metrics
   print(metrics)
   #    L1(avg)    L2(avg)  RMS(avg)  Loss-fn(avg)  R Squared
   # 0  4.095883  24.048477  4.903925     24.048477   0.124482
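The columns printed by metrics follow the usual regression definitions, and their relationship can be checked by hand: in the output above, RMS(avg) is the square root of L2(avg), and Loss-fn(avg) equals L2(avg). The sketch below recomputes L1, L2, RMS and R squared from plain arrays with NumPy. The helper regression_metrics is hypothetical; it assumes unweighted averages and the standard 1 - SS_res / SS_tot definition of R squared, so small differences from the library's own evaluator are possible.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Sketch of unweighted L1, L2, RMS and R-squared."""
    y = np.asarray(y_true, dtype=float)
    yhat = np.asarray(y_pred, dtype=float)
    err = y - yhat
    l1 = np.mean(np.abs(err))          # mean absolute error
    l2 = np.mean(err ** 2)             # mean squared error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "L1(avg)": float(l1),
        "L2(avg)": float(l2),
        "RMS(avg)": float(np.sqrt(l2)),
        "R Squared": float(1.0 - ss_res / ss_tot),
    }


m = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print(m)  # L1(avg) = 0.5, R Squared = 0.84375
```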

Remarks

Trains gradient boosted decision trees to fit target values using a Tweedie loss function. This learner is a generalization of Poisson, compound Poisson, and gamma regression.
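The remark can be made concrete with a deliberately small sketch of gradient boosting under a Tweedie loss. It assumes a log link (the model predicts the logarithm of the mean, so predictions stay positive) and fits one-split regression stumps to the negative gradient of the loss. number_of_trees, learning_rate and index play the roles of the constructor parameters of the same name, and min_leaf stands in for minimum_example_count_per_leaf, but the helpers are hypothetical: FastTree's real tree growing, line search and leaf-value computation are more elaborate.

```python
import numpy as np


def tweedie_neg_gradient(y, f, p):
    """Negative gradient of the Tweedie loss w.r.t. the log-scale score f."""
    return y * np.exp((1.0 - p) * f) - np.exp((2.0 - p) * f)


def fit_stump(X, r, min_leaf):
    """Best single split of residuals r by squared error (hypothetical)."""
    best = None
    n = len(r)
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        csum = np.cumsum(rs)
        total = csum[-1]
        for i in range(min_leaf, n - min_leaf + 1):  # i rows go left
            if xs[i - 1] == xs[i]:
                continue  # cannot split between equal values
            left = csum[i - 1] / i
            right = (total - csum[i - 1]) / (n - i)
            # Maximizing this is equivalent to minimizing squared error.
            score = i * left ** 2 + (n - i) * right ** 2
            if best is None or score > best[0]:
                best = (score, j, (xs[i - 1] + xs[i]) / 2.0, left, right)
    if best is None:
        const = r.mean()
        return lambda Z: np.full(len(Z), const)  # no valid split
    _, j, thr, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= thr, lv, rv)


def boost_tweedie(X, y, number_of_trees=100, learning_rate=0.2, index=1.5,
                  min_leaf=10):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    f0 = np.log(y.mean())  # best constant score under a log link
    f = np.full(len(y), f0)
    trees = []
    for _ in range(number_of_trees):
        r = tweedie_neg_gradient(y, f, index)
        tree = fit_stump(X, r, min_leaf)
        f += learning_rate * tree(X)  # shrink each tree's contribution
        trees.append(tree)

    def predict(Z):
        Z = np.asarray(Z, dtype=float)
        score = np.full(len(Z), f0)
        for tree in trees:
            score += learning_rate * tree(Z)
        return np.exp(score)  # back to the mean scale

    return predict


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 1))
true_mean = np.where(X[:, 0] > 0.5, 4.0, 1.0)
y = rng.poisson(true_mean).astype(float)  # non-negative counts with zeros
model = boost_tweedie(X, y, number_of_trees=50, learning_rate=0.2, index=1.5)
print(model(np.array([[0.25], [0.75]])))  # roughly [1, 4]
```

A smaller learning_rate makes each tree's correction more cautious, so more trees are needed to reach the same fit; this is the trade-off described under number_of_trees and learning_rate.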

Reference

Wikipedia: Gradient boosting (Gradient tree boosting)

Greedy function approximation: A gradient boosting machine.

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False