FastLinearBinaryClassifier Class
A Stochastic Dual Coordinate Ascent (SDCA) optimization trainer for linear binary classification.
Inheritance
FastLinearBinaryClassifier inherits from:
- nimbusml.internal.core.linear_model._fastlinearbinaryclassifier.FastLinearBinaryClassifier
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
Constructor
FastLinearBinaryClassifier(l2_regularization=None, l1_threshold=None, normalize='Auto', caching='Auto', loss='log', number_of_threads=None, positive_instance_weight=1.0, convergence_tolerance=0.1, maximum_number_of_iterations=None, shuffle=True, convergence_check_frequency=None, bias_learning_rate=0.0, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | See Columns. |
label | See Columns. |
weight | See Columns. |
l2_regularization | L2 regularizer constant. By default, the L2 constant is automatically inferred from the data set. |
l1_threshold | L1 soft threshold (L1/L2). Note that it is easier to control and sweep using the threshold parameter than the raw L1-regularizer constant. By default, the L1 threshold is automatically inferred from the data set. |
normalize | Specifies the type of automatic normalization used. Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. |
caching | Whether the trainer should cache the input training data. |
loss | The default is Log. Other choices are Hinge and SmoothedHinge. For more information, see the documentation page about losses, Loss. |
number_of_threads | Degree of lock-free parallelism. Defaults to automatic. Determinism is not guaranteed. |
positive_instance_weight | Weight applied to the positive class, for imbalanced data. |
convergence_tolerance | The tolerance for the ratio between the duality gap and the primal loss, used for convergence checking. |
maximum_number_of_iterations | Maximum number of iterations; set to 1 to simulate online learning. Defaults to automatic. |
shuffle | Whether to shuffle the data every epoch. |
convergence_check_frequency | Convergence check frequency, in number of iterations. Set to zero or a negative value to disable checking. If left unset, it defaults to checking after every 'numThreads' iterations. |
bias_learning_rate | The learning rate for adjusting the bias, to keep it from being regularized. |
params | Additional arguments sent to the compute engine. |
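For illustration, here is a hedged configuration sketch combining several of these parameters; the column names and values are assumptions for demonstration, not defaults:

```python
from nimbusml.linear_model import FastLinearBinaryClassifier

# Hypothetical configuration: explicit column roles, up-weighting of the
# positive class for imbalanced data, and a tighter duality-gap tolerance.
clf = FastLinearBinaryClassifier(
    feature=['age', 'parity'],        # feature columns (assumed names)
    label='case',                     # binary label column (assumed name)
    positive_instance_weight=2.0,     # weight applied to positive examples
    convergence_tolerance=0.01)       # stricter convergence check
```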
Examples
```python
###############################################################################
# FastLinearBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import FastLinearBinaryClassifier

# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...

# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    FastLinearBinaryClassifier(feature=['age', 'edu', 'induced'],
                               label='case')
])

# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.360707 -0.572296
# 1 0 0.372283 -0.522437
# 2 0 0.376590 -0.504046
# 3 0 0.372939 -0.519627
# 4 0 0.329059 -0.712444

# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.496495 0.665323 0 0 ...
```
Remarks
FastLinearBinaryClassifier is a trainer based on the Stochastic Dual Coordinate Ascent (SDCA) method, a state-of-the-art optimization technique for convex objective functions. The algorithm can be scaled to large out-of-memory data sets thanks to a semi-asynchronous implementation that supports multi-threading. Convergence is underwritten by periodically enforcing synchronization between primal and dual updates in a separate thread. Several choices of loss functions are also provided. The SDCA method combines several of the best properties and capabilities of logistic regression and SVM algorithms. For more information on SDCA, see the citations in the reference section.
Traditional optimization algorithms, such as stochastic gradient descent (SGD), optimize the empirical loss function directly. SDCA takes a different approach and optimizes the dual problem instead. The dual loss function is parameterized by per-example weights. In each iteration, when a training example is read from the training data set, the corresponding example weight is adjusted so that the dual loss function is optimized with respect to that example. Unlike various gradient descent methods, SDCA does not need a learning rate to determine the step size.
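As a rough illustration of this idea, the following is a minimal, single-threaded NumPy sketch of the SDCA coordinate update for the hinge loss. It is not nimbusml's multi-threaded implementation, and all names in it are illustrative:

```python
import numpy as np

def sdca_hinge(X, y, lam=0.01, epochs=20, seed=0):
    """Toy SDCA for an L2-regularized hinge-loss linear classifier.

    X: (n, d) feature matrix; y: labels in {-1, +1}; lam: L2 constant.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)               # one dual variable per training example
    w = np.zeros(d)                   # primal weights, kept in sync with alpha
    sq_norms = (X ** 2).sum(axis=1)
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit examples in random order
            if sq_norms[i] == 0.0:
                continue
            margin = y[i] * (X[i] @ w)
            # Closed-form coordinate step: maximize the dual with respect to
            # alpha[i], clipped to [0, 1], the feasible set for hinge loss.
            # No learning rate is involved.
            new_alpha = np.clip(
                alpha[i] + lam * n * (1.0 - margin) / sq_norms[i], 0.0, 1.0)
            w += (new_alpha - alpha[i]) * y[i] * X[i] / (lam * n)
            alpha[i] = new_alpha
    return w
```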
FastLinearBinaryClassifier currently supports binary classification with three types of loss functions: log loss, hinge loss, and smoothed hinge loss. Elastic net regularization can be specified by the l2_regularization and l1_threshold parameters. Note that l2_regularization affects the rate of convergence: in general, the larger the l2_regularization, the faster SDCA converges.
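For instance, a hedged sketch selecting the hinge loss and explicit elastic-net constants instead of the automatically inferred defaults (the regularization values here are arbitrary):

```python
from nimbusml.linear_model import FastLinearBinaryClassifier

# Hinge loss with explicit elastic-net constants; larger l2_regularization
# generally speeds up convergence. Values are arbitrary, for illustration.
trainer = FastLinearBinaryClassifier(
    loss='hinge',
    l2_regularization=0.1,
    l1_threshold=0.01)
```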
Note that FastLinearBinaryClassifier is a stochastic and streaming optimization algorithm. The results depend on the order of the training data. For reproducible results, it is recommended that one sets shuffle to False and number_of_threads to 1.
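The following sketch shows such a deterministic configuration; the column names are taken from the example above:

```python
from nimbusml.linear_model import FastLinearBinaryClassifier

# Reproducible configuration: disable shuffling and run single-threaded.
clf = FastLinearBinaryClassifier(
    shuffle=False,
    number_of_threads=1,
    feature=['age', 'edu', 'induced'],
    label='case')
```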
Reference
Scaling Up Stochastic Dual Coordinate Ascent
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)
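As a hedged usage sketch, continuing from the fitted pipeline in the Examples section above (nimbusml's Pipeline is assumed to forward these methods to the final predictor):

```python
# `pipeline` and `data` are from the Examples section above.
scores = pipeline.decision_function(data)   # raw margin score per example
probs = pipeline.predict_proba(data)        # per-class probabilities

# Inspect a standalone estimator's parameters.
clf = FastLinearBinaryClassifier()
print(clf.get_params())
```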