SgdBinaryClassifier Class
Machine Learning Hogwild Stochastic Gradient Descent Binary Classifier
- Inheritance
  - nimbusml.internal.core.linear_model._sgdbinaryclassifier.SgdBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
SgdBinaryClassifier(normalize='Auto', caching='Auto', loss='log', l2_regularization=1e-06, number_of_threads=None, convergence_tolerance=0.0001, number_of_iterations=20, initial_learning_rate=0.01, shuffle=True, positive_instance_weight=1.0, check_frequency=None, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | See Columns. |
label | See Columns. |
weight | See Columns. |
normalize | Specifies the type of automatic normalization used. Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. |
caching | Whether the trainer should cache the input training data. |
loss | The default is Log. Other choices are Exp, Hinge, and SmoothedHinge. For more information, please see the documentation page about losses, Loss. |
l2_regularization | L2 regularization constant. |
number_of_threads | Degree of lock-free parallelism. Defaults to automatic depending on data sparseness. Determinism is not guaranteed. |
convergence_tolerance | Exponential moving averaged improvement tolerance for convergence. |
number_of_iterations | Maximum number of iterations; set to 1 to simulate online learning. |
initial_learning_rate | Initial learning rate (only used by SGD). |
shuffle | Whether to shuffle the data every epoch. |
positive_instance_weight | Weight applied to the positive class, for imbalanced data. |
check_frequency | Convergence check frequency (in number of iterations). Default equals the number of threads. |
params | Additional arguments sent to the compute engine. |
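To make the roles of `initial_learning_rate` and `l2_regularization` concrete, here is a minimal sketch of a single SGD update for log loss with L2 regularization. This illustrates the standard update rule these parameters control; it is not nimbusml's internal implementation, and the function name `sgd_step` is purely illustrative.

```python
import math

def sgd_step(w, x, y, learning_rate=0.01, l2=1e-06):
    """One SGD update for logistic (log) loss with L2 regularization.

    w and x are equal-length lists of floats; y is the label in {0, 1}.
    The defaults mirror initial_learning_rate and l2_regularization
    above; this is a sketch, not nimbusml's exact internals.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-score))  # predicted P(y=1)
    grad_scale = p - y                  # d(log loss)/d(score)
    # Step against the gradient of loss + (l2/2)*||w||^2.
    return [wi - learning_rate * (grad_scale * xi + l2 * wi)
            for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = sgd_step(w, [1.0, 2.0], 1)  # repeatedly fit one positive example
print(w)  # both weights drift positive, pulled toward label 1
```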
Examples
###############################################################################
# SgdBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import SgdBinaryClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
SgdBinaryClassifier(feature=['parity', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.363427 -0.560521
# 1 0 0.378848 -0.494439
# 2 0 0.363427 -0.560521
# 3 0 0.369564 -0.534088
# 4 0 0.336350 -0.679603
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.497006 0.665323 0 0 ...
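The `Probability` and `Score` columns in the predictions above are related by the logistic function: the probability is the sigmoid of the raw linear score, which can be checked against the printed rows.

```python
import math

def sigmoid(score):
    """Map a raw linear score to P(y=1) via the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

# First two predictions printed above:
print(sigmoid(-0.560521))  # ≈ 0.363427 (matches the Probability column)
print(sigmoid(-0.494439))  # ≈ 0.378848
```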
Remarks
Stochastic Gradient Descent (SGD) is one of the most popular stochastic optimization procedures and can be integrated into several machine learning tasks to achieve state-of-the-art performance. The Hogwild SGD binary classification learner implements multi-threaded SGD for binary classification without any locking. If the associated optimization problem is sparse, Hogwild SGD achieves a nearly optimal rate of convergence. For a detailed reference, please refer to https://arxiv.org/pdf/1106.5730v2.pdf.
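The lock-free idea can be sketched in a few lines: several threads apply SGD updates to one shared weight vector with no synchronization, accepting occasional racy writes. This is only an illustration of the Hogwild scheme described above (nimbusml runs native, genuinely parallel code); the helper name `hogwild_train` and the toy data are assumptions for the sketch.

```python
import math
import threading

def hogwild_train(data, dim, n_threads=4, epochs=20, lr=0.1):
    """Hogwild-style SGD: worker threads update a shared weight vector
    with no locks. data is a list of (sparse_x, y) pairs, where sparse_x
    is a list of (feature_index, value) tuples and y is 0 or 1."""
    w = [0.0] * dim  # shared state, intentionally unsynchronized

    def worker(rows):
        for _ in range(epochs):
            for x, y in rows:
                score = sum(w[i] * xi for i, xi in x)  # sparse dot product
                p = 1.0 / (1.0 + math.exp(-score))
                for i, xi in x:
                    w[i] -= lr * (p - y) * xi  # racy update, by design

    chunk = max(1, len(data) // n_threads)
    threads = [threading.Thread(target=worker, args=(data[k:k + chunk],))
               for k in range(0, len(data), chunk)]
    for t in threads: t.start()
    for t in threads: t.join()
    return w

# Toy sparse data: feature 0 fires for the positive class, feature 1 for
# the negative class, so updates from different rows rarely collide --
# the sparse setting where Hogwild's convergence guarantee applies.
data = [([(0, 1.0)], 1) if i % 2 else ([(1, 1.0)], 0) for i in range(200)]
w = hogwild_train(data, dim=2)
print(w)  # w[0] ends up positive, w[1] negative
```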
Reference
https://arxiv.org/pdf/1106.5730v2.pdf
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)