SymSgdBinaryClassifier Class

Train a symbolic SGD model.

Inheritance
nimbusml.internal.core.linear_model._symsgdbinaryclassifier.SymSgdBinaryClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin

Constructor

SymSgdBinaryClassifier(normalize='Auto', caching='Auto', number_of_iterations=50, learning_rate=None, l2_regularization=0.0, number_of_threads=None, tolerance=0.0001, update_frequency=None, memory_size=1024, shuffle=True, positive_instance_weight=1.0, feature=None, label=None, **params)

Parameters

feature

see Columns.

label

see Columns.
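
Column roles can be assigned either through the feature and label constructor arguments or with the << operator. The sketch below is illustrative; the Role helper and the << form are assumptions based on the nimbusml Columns documentation rather than something shown on this page.

   from nimbusml import Role
   from nimbusml.linear_model import SymSgdBinaryClassifier

   # 1. Pass the column roles directly to the constructor.
   clf = SymSgdBinaryClassifier(feature=['age', 'parity'], label='case')

   # 2. Equivalent role assignment with the << operator (assumed form).
   clf = SymSgdBinaryClassifier() << {Role.Feature: ['age', 'parity'],
                                      Role.Label: 'case'}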

normalize

Specifies the type of automatic normalization used:

  • "Auto": if normalization is needed, it is performed automatically. This is the default choice.

  • "No": no normalization is performed.

  • "Yes": normalization is performed.

  • "Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.

Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0 and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
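
The mapping above can be sketched in a few lines of NumPy. This only illustrates the arithmetic described in the paragraph; it is not the normalizer nimbusml runs internally.

   import numpy as np

   # Illustration of the described scaling: dividing by the observed range
   # maps [min, max] onto an interval [a, b] of width 1 and keeps zero at
   # zero, so sparse inputs stay sparse.
   x = np.array([0.0, 2.0, -1.0, 3.0])
   scale = 1.0 / (x.max() - x.min())
   x_scaled = x * scale
   a, b = x.min() * scale, x.max() * scale
   print(x_scaled)          # zeros are unchanged
   print(a, b, b - a)       # -1 <= a <= 0 <= b <= 1 and b - a == 1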

caching

Whether the trainer should cache the input training data.

number_of_iterations

Number of passes over the data.

learning_rate

The size of the step taken in the direction of the gradient at each iteration of the learning process. This determines how fast or slow the learner converges to the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.

l2_regularization

L2 regularization.

number_of_threads

Degree of lock-free parallelism. Determinism is not guaranteed. Multi-threading is not currently supported.

tolerance

Tolerance for difference in average loss in consecutive passes.

update_frequency

The number of iterations each thread trains its local model before combining it with the global model. A lower value keeps the global model more up to date; a higher value reduces cache traffic.

memory_size

Memory size for L-BFGS. Lower=faster, less accurate. The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1.

shuffle

Whether to shuffle the data.

positive_instance_weight

Apply weight to the positive class, for imbalanced data.
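
For example, when positive examples are rare, this weight can be raised above 1.0; the value below is purely illustrative.

   from nimbusml.linear_model import SymSgdBinaryClassifier

   # Illustrative only: up-weight the positive class on an imbalanced dataset.
   # The value 5.0 is an arbitrary example, not a recommendation.
   clf = SymSgdBinaryClassifier(positive_instance_weight=5.0,
                                feature=['induced', 'parity'], label='case')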

params

Additional arguments sent to compute engine.

Examples


   ###############################################################################
   # SymSgdBinaryClassifier
   from nimbusml import Pipeline, FileDataStream
   from nimbusml.datasets import get_dataset
   from nimbusml.feature_extraction.categorical import OneHotVectorizer
   from nimbusml.linear_model import SymSgdBinaryClassifier

   # data input (as a FileDataStream)
   path = get_dataset('infert').as_filepath()

   data = FileDataStream.read_csv(path)
   print(data.head())
   #    age  case education  induced  parity ... row_num  spontaneous  ...
   # 0   26     1    0-5yrs        1       6 ...       1            2  ...
   # 1   42     1    0-5yrs        1       1 ...       2            0  ...
   # 2   39     1    0-5yrs        2       6 ...       3            0  ...
   # 3   34     1    0-5yrs        2       4 ...       4            0  ...
   # 4   35     1   6-11yrs        1       3 ...       5            1  ...

   # define the training pipeline
   pipeline = Pipeline([
       OneHotVectorizer(columns={'edu': 'education'}),
       SymSgdBinaryClassifier(feature=['induced', 'edu'], label='case')
   ])

   # train, predict, and evaluate
   metrics, predictions = pipeline.fit(data).test(data, output_scores=True)

   # print predictions
   print(predictions.head())
   #   PredictedLabel  Probability       Score
   # 0               1          1.0  263.630310
   # 1               1          1.0  263.630310
   # 2               1          1.0  305.514282
   # 3               1          1.0  305.514282
   # 4               1          1.0   33.698135
   # print evaluation metrics
   print(metrics)
   #        AUC  Accuracy  Positive precision  Positive recall  ...
   # 0  0.504783  0.479839            0.364706         0.746988  ...

Remarks

Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks, and is primarily a sequential algorithm. The SymSgdBinaryClassifier is an implementation of a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model as well as a model combiner, which allows the local models to be combined to produce what a sequential model would have produced.
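
The toy sketch below illustrates the overall scheme, using plain model averaging as the combiner for brevity; the actual SymSGD combiner described in the reference learns a per-thread model combiner and is not reproduced here.

   import numpy as np

   def local_sgd(w, X, y, lr=0.1):
       # One pass of logistic-loss SGD over this thread's shard of the data.
       for xi, yi in zip(X, y):
           p = 1.0 / (1.0 + np.exp(-xi @ w))
           w = w - lr * (p - yi) * xi
       return w

   rng = np.random.default_rng(0)
   X = rng.normal(size=(1000, 5))
   y = (X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) > 0).astype(float)

   # Each "thread" trains a local model on its own shard of the data...
   shards = np.array_split(np.arange(len(y)), 4)
   local = [local_sgd(np.zeros(5), X[idx], y[idx]) for idx in shards]

   # ...and the local models are merged. Plain averaging is the naive
   # combiner; SymSGD's sound combiners correct for the fact that each
   # local model started from a stale copy of the global model.
   w_merged = np.mean(local, axis=0)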

Reference

Parallel Stochastic Gradient Descent with Sound Combiners

Methods

decision_function

Returns score values

get_params

Get the parameters for this operator.

predict_proba

Returns probabilities

decision_function

Returns score values

decision_function(X, **params)

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

deep
default value: False

predict_proba

Returns probabilities

predict_proba(X, **params)
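
A short usage sketch of these methods on in-memory data follows; it assumes the infert dataset used above loads cleanly with pandas and uses only its numeric columns.

   import pandas as pd
   from nimbusml.datasets import get_dataset
   from nimbusml.linear_model import SymSgdBinaryClassifier

   # Load the same 'infert' data as a pandas DataFrame (numeric columns only).
   df = pd.read_csv(get_dataset('infert').as_filepath())
   X, y = df[['age', 'parity', 'induced']], df['case']

   clf = SymSgdBinaryClassifier().fit(X, y)
   print(clf.decision_function(X)[:5])   # raw scores
   print(clf.predict_proba(X)[:5])       # per-class probabilities
   print(clf.get_params())               # constructor arguments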