LightGbmRanker Class
Gradient Boosted Decision Trees
Inheritance

LightGbmRanker inherits from:

- nimbusml.internal.core.ensemble._lightgbmranker.LightGbmRanker
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
Constructor
LightGbmRanker(number_of_iterations=100, learning_rate=None, number_of_leaves=None, minimum_example_count_per_leaf=None, booster=None, normalize='Auto', caching='Auto', custom_gains=[0, 3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047, 4095], sigmoid=0.5, evaluation_metric='NormalizedDiscountedCumulativeGain', maximum_bin_count_per_feature=255, verbose=False, silent=True, number_of_threads=None, early_stopping_round=0, batch_size=1048576, use_categorical_split=None, handle_missing_value=True, minimum_example_count_per_group=100, maximum_categorical_split_point_count=32, categorical_smoothing=10.0, l2_categorical_regularization=10.0, random_state=None, parallel_trainer=None, feature=None, group_id=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | see Columns. |
group_id | see Columns. |
label | see Columns. |
weight | see Columns. |
number_of_iterations | Number of boosting iterations. |
learning_rate | Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution. |
number_of_leaves | The maximum number of leaves (terminal nodes) that can be created in any tree. Higher values potentially increase the size of the tree and improve precision, but risk overfitting and require longer training times. |
minimum_example_count_per_leaf | Minimum number of training instances required to form a leaf. That is, the minimal number of documents allowed in a leaf of a regression tree, out of the sub-sampled data. A 'split' means that features in each level of the tree (node) are randomly divided. |
booster | Which booster to use. Available options include Dart, Gbdt, and Goss. |
normalize | Specifies the type of automatic normalization used. If 'Auto', the choice to normalize depends on the preference declared by the algorithm; this is the default choice. If 'No', no normalization is performed. If 'Yes', normalization is always performed. If 'Warn', a warning message is displayed when the algorithm requires normalization, but normalization is not performed. |
caching | Whether the trainer should cache input training data. |
custom_gains | An array of gains associated with each relevance label. |
sigmoid | Parameter for the sigmoid function. |
evaluation_metric | Evaluation metric. |
maximum_bin_count_per_feature | Maximum number of bucket bins for features. |
verbose | Enable verbose output. |
silent | Whether to print running messages. |
number_of_threads | Number of parallel threads used to run LightGBM. |
early_stopping_round | Rounds of early stopping; 0 disables it. |
batch_size | Number of entries in a batch when loading data. |
use_categorical_split | Whether to enable categorical splits. |
handle_missing_value | Whether to enable special handling of missing values. |
minimum_example_count_per_group | Minimum number of instances per categorical group. |
maximum_categorical_split_point_count | Maximum number of categorical thresholds. |
categorical_smoothing | Laplace smoothing term in categorical feature split; avoids the bias of small categories. |
l2_categorical_regularization | L2 regularization for categorical split. |
random_state | Sets the random seed for LightGBM to use. |
parallel_trainer | Parallel LightGBM learning algorithm. |
params | Additional arguments sent to the compute engine. |
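The following sketch illustrates how a few of these hyperparameters are passed to the constructor. The values shown are arbitrary choices for illustration, not tuned recommendations.

# construct a ranker with a few hyperparameters set explicitly
from nimbusml.ensemble import LightGbmRanker

ranker = LightGbmRanker(
    number_of_iterations=200,           # more boosting rounds than the default 100
    learning_rate=0.1,                  # smaller steps; slower but steadier convergence
    number_of_leaves=31,                # cap on terminal nodes per tree
    minimum_example_count_per_leaf=20,  # guard against leaves fit on very few examples
    early_stopping_round=10,            # stop if the metric stalls for 10 rounds
    random_state=42)                    # fixed seed for reproducibility
# the estimator can then be placed in a Pipeline, as in the example below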
Examples
###############################################################################
# LightGbmRanker
import numpy as np
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.ensemble import LightGbmRanker
# data input (as a FileDataStream)
path = get_dataset('gen_tickettrain').as_filepath()
# LightGbmRanker requires key type for group column
data = FileDataStream.read_csv(path, dtype={'group': np.uint32})
# define the training pipeline
pipeline = Pipeline([LightGbmRanker(
feature=['Class', 'dep_day', 'duration'], label='rank', group_id='group')])
# train, predict, and evaluate.
metrics, predictions = pipeline \
.fit(data) \
.test(data, output_scores=True)
# print predictions
print(predictions.head())
# Score
# 0 -0.124121
# 1 -0.124121
# 2 -0.124121
# 3 -0.376062
# 4 -0.376062
# print evaluation metrics
print(metrics)
# NDCG@1 NDCG@2 NDCG@3 DCG@1 DCG@2 DCG@3
# 0 55.238095 65.967598 67.726087 6.492128 11.043324 13.928714
Remarks
Light GBM is an open source implementation of boosted trees. It is available in nimbusml as a binary classification trainer, a multi-class trainer, a regression trainer and a ranking trainer. Note that for this learner, the largest rank is constrained to be 12, so users might need to normalize their rank label columns; otherwise they may get "out of bound" errors.
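Since the largest supported rank is 12, labels outside that range must be remapped before training. The sketch below shows one way to do this with pandas; the column name and bin edges are hypothetical, chosen only for illustration.

import pandas as pd

# hypothetical raw relevance scores that exceed the supported 0-12 label range
df = pd.DataFrame({'rank': [0, 5, 40, 100, 100]})

# bucket the raw scores into a small ordinal scale (0-3 here), so every
# label falls within the learner's supported range
df['rank'] = pd.cut(df['rank'], bins=[-1, 0, 10, 50, 200], labels=False)

print(df['rank'].tolist())
# [0, 1, 2, 3, 3]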
Methods

Name | Description |
---|---|
get_params | Get the parameters for this operator. |

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name | Description |
---|---|
deep | Default value: False |
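For example, assuming sklearn-style behavior where get_params returns a dict keyed by constructor parameter names:

from nimbusml.ensemble import LightGbmRanker

ranker = LightGbmRanker(number_of_iterations=50)
params = ranker.get_params()  # dict of parameter names to values
print(params['number_of_iterations'])
# 50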