SgdBinaryClassifier Class
Machine Learning Hogwild Stochastic Gradient Descent Binary Classifier
- Inheritance
- nimbusml.internal.core.linear_model._sgdbinaryclassifier.SgdBinaryClassifier
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
Constructor
SgdBinaryClassifier(normalize='Auto', caching='Auto', loss='log', l2_regularization=1e-06, number_of_threads=None, convergence_tolerance=0.0001, number_of_iterations=20, initial_learning_rate=0.01, shuffle=True, positive_instance_weight=1.0, check_frequency=None, feature=None, label=None, weight=None, **params)
Parameters
- feature
see Columns.
- label
see Columns.
- weight
see Columns.
- normalize
Specifies the type of automatic normalization used:
- "Auto": if normalization is needed, it is performed automatically. This is the default choice.
- "No": no normalization is performed.
- "Yes": normalization is performed.
- "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures that the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values to an interval [a, b] where -1 <= a <= 0, 0 <= b <= 1, and b - a = 1. This normalizer preserves sparsity by mapping zero to zero (a minimal sketch of this mapping follows the parameter list below).
- caching
Whether trainer should cache input training data.
- loss
The default is 'log'. Other choices are 'exp', 'hinge', and 'smoothed_hinge'. For more information, please see the documentation page about losses, Loss.
- l2_regularization
L2 regularization constant.
- number_of_threads
Degree of lock-free parallelism. Defaults to an automatic setting based on data sparseness. Determinism is not guaranteed when multiple threads are used.
- convergence_tolerance
Convergence tolerance on the exponential moving average of the loss improvement.
- number_of_iterations
Maximum number of iterations; set to 1 to simulate online learning.
- initial_learning_rate
Initial learning rate (only used by SGD).
- shuffle
Whether to shuffle the data every epoch.
- positive_instance_weight
Weight applied to the positive class, useful for imbalanced data.
- check_frequency
Convergence check frequency, in number of iterations. The default equals the number of threads.
- params
Additional arguments sent to the compute engine.
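The max-min rescaling described under normalize above can be sketched in a few lines of NumPy. This is only an illustration of the stated property (zero maps to zero, and when a column spans both signs the output interval [a, b] satisfies -1 <= a <= 0 <= b <= 1 with b - a = 1); it is not NimbusML's internal implementation:
import numpy as np

def maxmin_sketch(column):
    # Illustrative only: rescale so zero maps exactly to zero,
    # which preserves sparsity. When the column spans both signs,
    # outputs land in [a, b] with -1 <= a <= 0 <= b <= 1 and b - a = 1.
    span = max(column.max(), 0.0) - min(column.min(), 0.0)
    return column if span == 0 else column / span

print(maxmin_sketch(np.array([-2.0, 0.0, 1.0, 4.0])))
# [-0.33333333  0.          0.16666667  0.66666667]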
Examples
###############################################################################
# SgdBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import SgdBinaryClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
SgdBinaryClassifier(feature=['parity', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.363427 -0.560521
# 1 0 0.378848 -0.494439
# 2 0 0.363427 -0.560521
# 3 0 0.369564 -0.534088
# 4 0 0.336350 -0.679603
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.497006 0.665323 0 0 ...
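The pipeline above uses the default hyperparameters. Constructor arguments can be adjusted in the same way; the values below are purely illustrative, not tuned recommendations:
# same pipeline, with a few non-default hyperparameters (illustrative values)
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    SgdBinaryClassifier(feature=['parity', 'edu'], label='case',
                        loss='hinge',               # default is 'log'
                        l2_regularization=1e-05,
                        number_of_iterations=50,
                        initial_learning_rate=0.05)
])
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)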
Remarks
Stochastic Gradient Descent (SGD) is one of the most popular stochastic optimization procedures and can be integrated into several machine learning tasks to achieve state-of-the-art performance. The Hogwild SGD binary classification learner implements SGD for binary classification, supporting multi-threading without any locking. If the associated optimization problem is sparse, Hogwild SGD achieves a nearly optimal rate of convergence. For a detailed reference, please refer to https://arxiv.org/pdf/1106.5730v2.pdf.
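For intuition, the Hogwild update pattern can be sketched in plain Python/NumPy: several threads run SGD against a shared weight vector with no locking at all. This is only an illustration of the idea (Python's GIL prevents real parallel speedup here, and the toy data is dense rather than sparse); it is not NimbusML's implementation:
import numpy as np
from threading import Thread

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) > 0).astype(float)

w = np.zeros(d)  # shared weights, read and written by all threads, no locks

def worker(w, rows, lr=0.1):
    # Plain logistic-regression SGD; updates to w may race with other
    # threads. On sparse problems such collisions are rare, which is why
    # Hogwild converges at a nearly optimal rate despite being lock-free.
    for i in rows:
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w -= lr * (p - y[i]) * X[i]  # in-place update on the shared array

threads = [Thread(target=worker, args=(w, range(t, n, 4))) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('training accuracy:', np.mean((X @ w > 0) == (y == 1.0)))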
Reference
https://arxiv.org/pdf/1106.5730v2.pdf
Methods
- decision_function
Returns score values.
- get_params
Get the parameters for this operator.
- predict_proba
Returns probabilities.
decision_function
Returns score values.
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
If True, return the parameters for this operator and any contained sub-objects that are estimators.
predict_proba
Returns probabilities.
predict_proba(X, **params)
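Assuming pipeline and data are the fitted pipeline and FileDataStream from the Examples section above, these methods can be used as follows (an illustrative sketch):
# inspect the learner's hyperparameters
print(SgdBinaryClassifier(feature=['parity', 'edu'], label='case').get_params())

# raw margin scores and per-class probabilities from the fitted pipeline
scores = pipeline.decision_function(data)
probabilities = pipeline.predict_proba(data)
print(scores[:5])
print(probabilities[:5])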