SgdBinaryClassifier Class
Machine Learning Hogwild Stochastic Gradient Descent Binary Classifier
- Inheritance
  - nimbusml.internal.core.linear_model._sgdbinaryclassifier.SgdBinaryClassifier
  - nimbusml.base_predictor.BasePredictor
  - sklearn.base.ClassifierMixin
Constructor
SgdBinaryClassifier(normalize='Auto', caching='Auto', loss='log', l2_regularization=1e-06, number_of_threads=None, convergence_tolerance=0.0001, number_of_iterations=20, initial_learning_rate=0.01, shuffle=True, positive_instance_weight=1.0, check_frequency=None, feature=None, label=None, weight=None, **params)
Parameters
Name | Description |
---|---|
feature | See Columns. |
label | See Columns. |
weight | See Columns. |
normalize | Specifies the type of automatic normalization used. Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods, such as gradient descent, to converge much faster. If normalization is performed, a MaxMin normalizer is used. |
caching | Whether the trainer should cache the input training data. |
loss | The default is Log. Other choices are Exp, Hinge, and SmoothedHinge. For more information, please see the documentation page about losses, Loss. |
l2_regularization | L2 regularization constant. |
number_of_threads | Degree of lock-free parallelism. Defaults to automatic depending on data sparseness. Determinism is not guaranteed. |
convergence_tolerance | Exponential moving averaged improvement tolerance for convergence. |
number_of_iterations | Maximum number of iterations; set to 1 to simulate online learning. |
initial_learning_rate | Initial learning rate (only used by SGD). |
shuffle | Whether to shuffle the data every epoch. |
positive_instance_weight | Weight applied to the positive class, for imbalanced data. |
check_frequency | Convergence check frequency (in number of iterations). Default equals the number of threads. |
params | Additional arguments sent to the compute engine. |
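To make the roles of `initial_learning_rate` and `l2_regularization` concrete, here is a minimal sketch of a single SGD update for log loss with L2 regularization. This illustrates the standard update rule these parameters control; it is not nimbusml's internal implementation, and the function name `sgd_step` is purely illustrative.

```python
import math

def sgd_step(w, x, y, learning_rate=0.01, l2=1e-06):
    """One SGD update for logistic (log) loss with L2 regularization.

    w and x are equal-length lists of floats; y is the label in {0, 1}.
    The defaults mirror initial_learning_rate and l2_regularization
    above; this is a sketch, not nimbusml's exact internals.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-score))  # predicted P(y=1)
    grad_scale = p - y                  # d(log loss)/d(score)
    # Step against the gradient of loss + (l2/2)*||w||^2.
    return [wi - learning_rate * (grad_scale * xi + l2 * wi)
            for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = sgd_step(w, [1.0, 2.0], 1)  # repeatedly fit one positive example
print(w)  # both weights drift positive, pulled toward label 1
```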
Examples
###############################################################################
# SgdBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import SgdBinaryClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
OneHotVectorizer(columns={'edu': 'education'}),
SgdBinaryClassifier(feature=['parity', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.363427 -0.560521
# 1 0 0.378848 -0.494439
# 2 0 0.363427 -0.560521
# 3 0 0.369564 -0.534088
# 4 0 0.336350 -0.679603
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall ...
# 0 0.497006 0.665323 0 0 ...
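The `Probability` and `Score` columns in the predictions above are related by the logistic function: the probability is the sigmoid of the raw linear score, which can be checked against the printed rows.

```python
import math

def sigmoid(score):
    """Map a raw linear score to P(y=1) via the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))

# First two predictions printed above:
print(sigmoid(-0.560521))  # ≈ 0.363427 (matches the Probability column)
print(sigmoid(-0.494439))  # ≈ 0.378848
```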
Remarks
Stochastic Gradient Descent (SGD) is one of the most popular stochastic optimization procedures and can be integrated into several machine learning tasks to achieve state-of-the-art performance. The Hogwild SGD binary classification learner implements multi-threaded SGD for binary classification without any locking. If the associated optimization problem is sparse, Hogwild SGD achieves a nearly optimal rate of convergence. For a detailed reference, please refer to https://arxiv.org/pdf/1106.5730v2.pdf.
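The lock-free idea can be sketched in a few lines: several threads apply SGD updates to one shared weight vector with no synchronization, accepting occasional racy writes. This is only an illustration of the Hogwild scheme described above (nimbusml runs native, genuinely parallel code); the helper name `hogwild_train` and the toy data are assumptions for the sketch.

```python
import math
import threading

def hogwild_train(data, dim, n_threads=4, epochs=20, lr=0.1):
    """Hogwild-style SGD: worker threads update a shared weight vector
    with no locks. data is a list of (sparse_x, y) pairs, where sparse_x
    is a list of (feature_index, value) tuples and y is 0 or 1."""
    w = [0.0] * dim  # shared state, intentionally unsynchronized

    def worker(rows):
        for _ in range(epochs):
            for x, y in rows:
                score = sum(w[i] * xi for i, xi in x)  # sparse dot product
                p = 1.0 / (1.0 + math.exp(-score))
                for i, xi in x:
                    w[i] -= lr * (p - y) * xi  # racy update, by design

    chunk = max(1, len(data) // n_threads)
    threads = [threading.Thread(target=worker, args=(data[k:k + chunk],))
               for k in range(0, len(data), chunk)]
    for t in threads: t.start()
    for t in threads: t.join()
    return w

# Toy sparse data: feature 0 fires for the positive class, feature 1 for
# the negative class, so updates from different rows rarely collide --
# the sparse setting where Hogwild's convergence guarantee applies.
data = [([(0, 1.0)], 1) if i % 2 else ([(1, 1.0)], 0) for i in range(200)]
w = hogwild_train(data, dim=2)
print(w)  # w[0] ends up positive, w[1] negative
```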
Reference
https://arxiv.org/pdf/1106.5730v2.pdf
Methods
Name | Description |
---|---|
decision_function | Returns score values. |
get_params | Get the parameters for this operator. |
predict_proba | Returns probabilities. |
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |
predict_proba
Returns probabilities
predict_proba(X, **params)