SymSgdBinaryClassifier Class
Train a symbolic SGD model.
- Inheritance
- nimbusml.internal.core.linear_model._symsgdbinaryclassifier.SymSgdBinaryClassifier
- nimbusml.base_predictor.BasePredictor
- sklearn.base.ClassifierMixin
- SymSgdBinaryClassifier
Constructor
SymSgdBinaryClassifier(normalize='Auto', caching='Auto', number_of_iterations=50, learning_rate=None, l2_regularization=0.0, number_of_threads=None, tolerance=0.0001, update_frequency=None, memory_size=1024, shuffle=True, positive_instance_weight=1.0, feature=None, label=None, **params)
Parameters
- feature
see Columns.
- label
see Columns.
- normalize
Specifies the type of automatic normalization used:
"Auto": if normalization is needed, it is performed automatically. This is the default choice.
"No": no normalization is performed.
"Yes": normalization is performed.
"Warn": if normalization is needed, a warning message is displayed, but normalization is not performed.
Normalization rescales disparate data ranges to a standard scale. Feature scaling ensures the distances between data points are proportional and enables various optimization methods such as gradient descent to converge much faster. If normalization is performed, a MaxMin normalizer is used. It normalizes values in an interval [a, b] where -1 <= a <= 0 and 0 <= b <= 1 and b - a = 1. This normalizer preserves sparsity by mapping zero to zero.
- caching
Whether the trainer should cache the input training data.
- number_of_iterations
Number of passes over the data.
- learning_rate
Determines the size of the step taken in the direction of the gradient in each step of the learning process. This determines how fast or slow the learner converges on the optimal solution. If the step size is too big, you might overshoot the optimal solution. If the step size is too small, training takes longer to converge to the best solution.
- l2_regularization
The L2 regularization weight.
- number_of_threads
Degree of lock-free parallelism. Determinism is not guaranteed. Multi-threading is not currently supported.
- tolerance
Tolerance for difference in average loss in consecutive passes.
- update_frequency
The number of iterations each thread spends learning a local model before combining it with the global model. A low value keeps the global model more up to date; a high value reduces cache traffic.
- memory_size
Memory size for L-BFGS. Lower=faster, less accurate.
The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1.
- shuffle
Whether to shuffle the data.
- positive_instance_weight
Apply weight to the positive class, for imbalanced data.
- params
Additional arguments sent to compute engine.
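The interval property described for the normalize parameter can be illustrated with a small NumPy sketch. This is not the library's implementation; the function name zero_preserving_min_max is hypothetical, and the scaling shown is just one rule that satisfies the stated constraints (zero maps to zero, and the output interval [a, b] has b - a = 1).

```python
import numpy as np

def zero_preserving_min_max(column):
    """Divide a 1-D array by its range, where the range is widened to include zero.

    Hypothetical helper for illustration only. No offset is subtracted,
    so zeros stay zero and sparse inputs stay sparse.
    """
    column = np.asarray(column, dtype=float)
    lo = min(column.min(), 0.0)   # lower end of the range, never above zero
    hi = max(column.max(), 0.0)   # upper end of the range, never below zero
    span = hi - lo
    if span == 0.0:               # the column is all zeros: nothing to scale
        return column.copy()
    return column / span

values = np.array([-2.0, 0.0, 1.0, 6.0])
scaled = zero_preserving_min_max(values)
# The range is 6 - (-2) = 8, so a = -2/8 = -0.25 and b = 6/8 = 0.75.
a, b = scaled.min(), scaled.max()
```

Because the data is only divided by a positive constant and never shifted, any entry that was zero before scaling is still zero afterwards, which is the sparsity-preserving behavior the parameter description mentions.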
Examples
###############################################################################
# SymSgdBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import SymSgdBinaryClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
#    age  case education  induced  parity  ...  row_num  spontaneous  ...
# 0   26     1    0-5yrs        1       6  ...        1            2  ...
# 1   42     1    0-5yrs        1       1  ...        2            0  ...
# 2   39     1    0-5yrs        2       6  ...        3            0  ...
# 3   34     1    0-5yrs        2       4  ...        4            0  ...
# 4   35     1   6-11yrs        1       3  ...        5            1  ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    SymSgdBinaryClassifier(feature=['induced', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
#    PredictedLabel  Probability       Score
# 0               1          1.0  263.630310
# 1               1          1.0  263.630310
# 2               1          1.0  305.514282
# 3               1          1.0  305.514282
# 4               1          1.0   33.698135
# print evaluation metrics
print(metrics)
#         AUC  Accuracy  Positive precision  Positive recall  ...
# 0  0.504783  0.479839            0.364706         0.746988  ...
Remarks
Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks, and is primarily a sequential algorithm. The SymSgdBinaryClassifier is an implementation of a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model as well as a model combiner, which allows local models to be combined to produce what a sequential model would have produced.
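The idea of a model combiner can be sketched in NumPy. The sketch below is a simplification and not the library's code: it uses squared loss rather than the classifier's loss, two simulated "threads" instead of real parallelism, and a full d-by-d combiner matrix. All function and variable names are hypothetical. With squared loss each SGD step is affine in the weights, so the combiner reproduces the sequential result exactly; for other losses the same construction holds only to first order, as the remarks state.

```python
import numpy as np

def sgd_pass(w_start, X, y, lr):
    """Plain sequential SGD for squared loss, one pass over the rows of X."""
    w = w_start.copy()
    for x_i, y_i in zip(X, y):
        w -= lr * (x_i @ w - y_i) * x_i
    return w

def sgd_pass_with_combiner(w_start, X, y, lr):
    """Same pass, but also track M, the derivative of the output weights
    with respect to the starting weights (the model combiner)."""
    d = len(w_start)
    w = w_start.copy()
    M = np.eye(d)
    for x_i, y_i in zip(X, y):
        w -= lr * (x_i @ w - y_i) * x_i
        # Each step maps w to (I - lr * x x^T) w + const, so M picks up that factor.
        M = (np.eye(d) - lr * np.outer(x_i, x_i)) @ M
    return w, M

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
w_init = np.zeros(5)
lr = 0.01

# Reference result: one sequential pass over all 200 rows.
w_sequential = sgd_pass(w_init, X, y, lr)

# Simulated thread 1 handles the first half of the data.
w1 = sgd_pass(w_init, X[:100], y[:100], lr)
# Simulated thread 2 handles the second half, also starting from w_init
# because it cannot wait for thread 1 to finish.
w2_local, M2 = sgd_pass_with_combiner(w_init, X[100:], y[100:], lr)

# Combine: adjust thread 2's result as if it had started from w1.
w_combined = w2_local + M2 @ (w1 - w_init)
```

In this setup w_combined matches w_sequential up to floating-point error, which is the sense in which the local models are combined to give what a sequential run would have produced.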
Reference
Parallel Stochastic Gradient Descent with Sound Combiners
Methods
decision_function | Returns score values
get_params | Get the parameters for this operator.
predict_proba | Returns probabilities
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
predict_proba
Returns probabilities
predict_proba(X, **params)