FastLinearBinaryClassifier Class

A Stochastic Dual Coordinate Ascent (SDCA) optimization trainer for linear binary classification.

Inheritance
FastLinearBinaryClassifier inherits from:

  • nimbusml.internal.core.linear_model._fastlinearbinaryclassifier.FastLinearBinaryClassifier

  • nimbusml.base_predictor.BasePredictor

  • sklearn.base.ClassifierMixin

Constructor

FastLinearBinaryClassifier(l2_regularization=None, l1_threshold=None, normalize='Auto', caching='Auto', loss='log', number_of_threads=None, positive_instance_weight=1.0, convergence_tolerance=0.1, maximum_number_of_iterations=None, shuffle=True, convergence_check_frequency=None, bias_learning_rate=0.0, feature=None, label=None, weight=None, **params)

Parameters

Name Description
feature

see Columns.

label

see Columns.

weight

see Columns.

l2_regularization

L2 regularizer constant. By default, the L2 constant is automatically inferred from the data set.

l1_threshold

L1 soft threshold (L1/L2). Note that it is easier to control and sweep using the threshold parameter than the raw L1-regularizer constant. By default, the L1 threshold is automatically inferred from the data set.

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0 and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero (see the sketch after this parameter list).

caching

Whether the trainer should cache the input training data.

loss

The default is Log. Other choices are Hinge and SmoothedHinge. For more information, see the documentation page about loss functions, Loss.

number_of_threads

Degree of lock-free parallelism. Defaults to automatic. Determinism is not guaranteed when more than one thread is used.

positive_instance_weight

The weight applied to instances of the positive class; useful for imbalanced data.

convergence_tolerance

The tolerance for the ratio between the duality gap and the primal loss, used for convergence checking.

maximum_number_of_iterations

Maximum number of iterations; set to 1 to simulate online learning. Defaults to automatic.

shuffle

Whether to shuffle the data every epoch.

convergence_check_frequency

Convergence check frequency, in number of iterations. Set to zero or a negative value to disable checking. If left unset, it defaults to checking after every number_of_threads iterations.

bias_learning_rate

The learning rate used to adjust the bias, which offsets the effect of regularization on the bias term.

params

Additional arguments sent to compute engine.
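
As an illustration of the zero-preserving rescaling described under normalize, here is a minimal numeric sketch. It assumes the feature spans zero (min <= 0 <= max) and divides by the observed range, which satisfies the stated interval properties; it is not the exact MaxMin normalizer implementation.

   import numpy as np

   def minmax_zero_preserving(col):
       # Sketch of zero-preserving scaling: divide by the observed range
       # so the rescaled values land in an interval [a, b] with
       # -1 <= a <= 0 <= b <= 1 and b - a = 1, and zero maps to zero.
       # Assumes min(col) <= 0 <= max(col); illustrative only, not the
       # exact MaxMin normalizer used by nimbusml.
       col = np.asarray(col, dtype=float)
       span = col.max() - col.min()
       return col if span == 0 else col / span

   print(minmax_zero_preserving([-2.0, 0.0, 1.0, 3.0]))
   # [-0.4  0.   0.2  0.6]  -- zero stays zero, interval length is 1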

Examples


   ###############################################################################
   # FastLinearBinaryClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotVectorizer
   from nimbusml.linear_model import FastLinearBinaryClassifier

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()
   data = FileDataStream.read_csv(path)
   print(data.head())
   #   age  case education  induced  parity  ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       FastLinearBinaryClassifier(feature=['age', 'edu', 'induced'], label='case')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel  Probability     Score
   # 0               0     0.360707 -0.572296
   # 1               0     0.372283 -0.522437
   # 2               0     0.376590 -0.504046
   # 3               0     0.372939 -0.519627
   # 4               0     0.329059 -0.712444

   # print evaluation metrics
   print(metrics)
   #        AUC  Accuracy  Positive precision  Positive recall  ...
   # 0  0.496495  0.665323                   0                0  ...

Remarks

FastLinearBinaryClassifier is a trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions. The algorithm can be scaled for use on large out-of-memory data sets due to a semi-asynchronized implementation that supports multi-threading. Convergence is underwritten by periodically enforcing synchronization between primal and dual updates in a separate thread. Several choices of loss functions are also provided. The SDCA method combines several of the best properties and capabilities of logistic regression and SVM algorithms. For more information on SDCA, see the citations in the reference section.

Traditional optimization algorithms, such as stochastic gradient descent (SGD), optimize the empirical loss function directly. SDCA takes a different approach and optimizes the dual problem instead. The dual loss function is parameterized by per-example weights. In each iteration, when a training example is read from the training data set, the corresponding example weight is adjusted so that the dual loss function is optimized with respect to the current example. Unlike the various gradient descent methods, SDCA needs no learning rate to determine the step size.
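
To make the coordinate step concrete, the sketch below implements single-threaded SDCA for the hinge-loss (SVM) case with L2 regularization, using the closed-form per-example update from Shalev-Shwartz and Zhang (see Reference). It is an illustration of the technique, not the nimbusml implementation.

   import numpy as np

   def sdca_hinge(X, y, lam=0.01, epochs=10, seed=0):
       # Single-threaded SDCA for the L2-regularized hinge loss.
       # alpha[i] is the dual variable for example i; the primal weights
       # are kept consistent as w = (1 / (lam * n)) * sum_i alpha[i] * X[i].
       # Labels y must be in {-1, +1}.
       n, d = X.shape
       alpha, w = np.zeros(n), np.zeros(d)
       rng = np.random.default_rng(seed)
       for _ in range(epochs):
           for i in rng.permutation(n):          # cf. the shuffle parameter
               xi, yi = X[i], y[i]
               sq = xi @ xi
               if sq == 0.0:
                   continue
               # Closed-form coordinate step for hinge loss: clip the
               # scaled dual candidate to [0, 1]. No learning rate appears.
               cand = (1.0 - yi * (xi @ w)) * lam * n / sq + alpha[i] * yi
               delta = yi * min(1.0, max(0.0, cand)) - alpha[i]
               alpha[i] += delta
               w += (delta / (lam * n)) * xi     # keep primal in sync
       return w

Each inner step touches a single example: only its dual variable alpha[i] changes, and w is updated incrementally, which is what gives the algorithm its per-example, streaming character.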

FastLinearBinaryClassifier currently supports three loss functions for binary classification: log loss, hinge loss, and smoothed hinge loss. Elastic net regularization can be specified via the l2_regularization and l1_threshold parameters. Note that l2_regularization has an effect on the rate of convergence: in general, the larger the l2_regularization, the faster SDCA converges.

Note that FastLinearBinaryClassifier is a stochastic, streaming optimization algorithm, so the results depend on the order of the training data. For reproducible results, set shuffle to False and number_of_threads to 1, as in the sketch below.
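
For example, a deterministic configuration (with illustrative, untuned regularization values) might look like:

   from nimbusml.linear_model import FastLinearBinaryClassifier

   # Reproducible settings: single-threaded, fixed data order.
   clf = FastLinearBinaryClassifier(
       shuffle=False,            # do not reshuffle between epochs
       number_of_threads=1,      # avoid lock-free nondeterminism
       l2_regularization=0.01,   # larger values converge faster
       l1_threshold=0.1,         # adds the elastic-net L1 component
   )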

Reference

Scaling Up Stochastic Dual Coordinate Ascent

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Methods

decision_function

Returns score values

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities

decision_function

Returns score values

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name Description
deep
Default value: False

predict_proba

Returns probabilities

predict_proba(X, **params)
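
A minimal usage sketch for these methods, reusing the infert data from the example above (the chosen feature columns are illustrative):

   from nimbusml import FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.linear_model import FastLinearBinaryClassifier

   data = FileDataStream.read_csv(get_dataset('infert').as_filepath())

   # Numeric feature columns only, so no featurization step is needed.
   clf = FastLinearBinaryClassifier(feature=['age', 'parity', 'induced'],
                                    label='case')
   clf.fit(data)

   print(clf.get_params())                  # operator parameters
   print(clf.decision_function(data)[:5])   # raw margin scores
   print(clf.predict_proba(data)[:5])       # per-class probabilities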