LogisticRegressionBinaryClassifier Class
Machine Learning Logistic Regression
- Inheritance
nimbusml.internal.core.linear_model._logisticregressionbinaryclassifier.LogisticRegressionBinaryClassifier
nimbusml.base_predictor.BasePredictor
sklearn.base.ClassifierMixin
Constructor
LogisticRegressionBinaryClassifier(normalize='Auto', caching='Auto', show_training_statistics=False, l2_regularization=1.0, l1_regularization=1.0, optimization_tolerance=1e-07, history_size=20, enforce_non_negativity=False, initial_weights_diameter=0.0, maximum_number_of_iterations=2147483647, stochastic_gradient_descent_initilaization_tolerance=0.0, quiet=False, use_threads=True, number_of_threads=None, dense_optimizer=False, feature=None, label=None, weight=None, **params)
Parameters
- feature
see Columns.
- label
see Columns.
- weight
see Columns.
- normalize
If Auto, the choice to normalize depends on the preference declared by the algorithm. This is the default choice. If No, no normalization is performed. If Yes, normalization is always performed. If Warn, a warning message is displayed when the algorithm needs normalization, but normalization is not performed. If normalization is performed, a MaxMin normalizer is used. This normalizer preserves sparsity by mapping zero to zero. (A constructor sketch after this parameter list shows several of these options in use.)
- caching
Whether trainer should cache input training data.
- show_training_statistics
Show statistics of training examples.
- l2_regularization
L2 regularization weight.
- l1_regularization
L1 regularization weight.
- optimization_tolerance
Tolerance parameter for optimization convergence. Lower values make optimization slower but more accurate.
- history_size
Memory size for L-BFGS. Lower values are faster but less accurate. The technique used for optimization here is L-BFGS, which uses only a limited amount of memory to compute the next step direction. This parameter indicates the number of past positions and gradients to store for the computation of the next step. Must be greater than or equal to 1.
- enforce_non_negativity
Enforce non-negative weights. This flag, however, does not put any constraint on the bias term; that is, the bias term can still be a negative number.
- initial_weights_diameter
Sets the initial weights diameter that
specifies the range from which values are drawn for the initial
weights. These weights are initialized randomly from within this range.
For example, if the diameter is specified to be d
, then the weights
are uniformly distributed between -d/2
and d/2
. The default
value is 0
, which specifies that all the weights are set to zero.
- maximum_number_of_iterations
Maximum iterations.
- stochastic_gradient_descent_initilaization_tolerance
Run SGD to initialize LR weights, converging to this tolerance.
- quiet
If set to True, no output is produced during training.
- use_threads
Whether or not to use threads. Default is true.
- number_of_threads
Number of threads.
- dense_optimizer
If True, forces densification of the internal optimization vectors. If False, allows the logistic regression optimizer to use sparse or dense internal states as it finds appropriate. Setting dense_optimizer to True requires the internal optimizer to use a dense internal state, which may help alleviate load on the garbage collector for some varieties of larger problems.
- params
Additional arguments sent to compute engine.
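The following is a minimal constructor sketch tying several of these parameters together; the specific values are illustrative assumptions, not tuned recommendations.
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
# Illustrative settings only: force normalization, apply both penalties,
# and reduce the L-BFGS memory; the defaults are a reasonable starting point.
learner = LogisticRegressionBinaryClassifier(
    normalize='Yes',              # always apply the MaxMin normalizer
    l1_regularization=0.5,        # lasso-style penalty (pulls small weights to 0)
    l2_regularization=0.5,        # ridge-style penalty (shrinks large weights)
    history_size=10,              # fewer stored L-BFGS positions/gradients
    optimization_tolerance=1e-5)  # looser tolerance: faster, less accurate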
Examples
###############################################################################
# LogisticRegressionBinaryClassifier
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path)
print(data.head())
# age case education induced parity ... row_num spontaneous ...
# 0 26 1 0-5yrs 1 6 ... 1 2 ...
# 1 42 1 0-5yrs 1 1 ... 2 0 ...
# 2 39 1 0-5yrs 2 6 ... 3 0 ...
# 3 34 1 0-5yrs 2 4 ... 4 0 ...
# 4 35 1 6-11yrs 1 3 ... 5 1 ...
# define the training pipeline
pipeline = Pipeline([
    OneHotVectorizer(columns={'edu': 'education'}),
    LogisticRegressionBinaryClassifier(feature=['parity', 'edu'], label='case')
])
# train, predict, and evaluate
metrics, predictions = pipeline.fit(data).test(data, output_scores=True)
# print predictions
print(predictions.head())
# PredictedLabel Probability Score
# 0 0 0.334679 -0.687098
# 1 0 0.334679 -0.687098
# 2 0 0.334679 -0.687098
# 3 0 0.334679 -0.687098
# 4 0 0.334679 -0.687098
# print evaluation metrics
print(metrics)
# AUC Accuracy Positive precision Positive recall Negative precision ...
# 0 0.5 0.665323 0 0 0.665323 ...
Remarks
Logistic Regression is a classification method used to predict the value of a categorical dependent variable from its relationship to one or more independent variables assumed to have a logistic distribution. If the dependent variable has only two possible values (success/failure), then the logistic regression is binary. If the dependent variable has more than two possible values (blood type given diagnostic test results), then the logistic regression is multinomial.
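Concretely, the binary model computes a raw score w·x + b and maps it to a probability with the logistic (sigmoid) function; the Probability column in the example above is exactly the sigmoid of the Score column. A minimal sketch:
import math
def sigmoid(score):
    # logistic function: maps a raw linear score to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-score))
# reproduces the Probability column from the example output above
print(sigmoid(-0.687098))  # ~0.334679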
The optimization technique used for LogisticRegressionBinaryClassifier is the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. Both the L-BFGS and regular BFGS algorithms use quasi-Newtonian methods to estimate the computationally intensive Hessian matrix in the equation used by Newton's method to calculate steps. But the L-BFGS approximation uses only a limited amount of memory to compute the next step direction, so it is especially suited for problems with a large number of variables. The history_size parameter specifies the number of past positions and gradients to store for use in the computation of the next step.
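To see the memory/accuracy tradeoff directly, one could retrain the pipeline from the Examples section with different history_size values; this is a hedged sketch, not part of the original example.
from nimbusml import Pipeline, FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.feature_extraction.categorical import OneHotVectorizer
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
data = FileDataStream.read_csv(get_dataset('infert').as_filepath())
for history_size in (5, 20, 50):  # illustrative memory sizes
    pipeline = Pipeline([
        OneHotVectorizer(columns={'edu': 'education'}),
        LogisticRegressionBinaryClassifier(
            feature=['parity', 'edu'], label='case',
            history_size=history_size)])
    metrics, _ = pipeline.fit(data).test(data, output_scores=True)
    print(history_size, metrics['AUC'][0])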
This learner can use elastic net regularization: a linear combination of L1 (lasso) and L2 (ridge) regularizations. Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that provide information to supplement the data and that prevent overfitting by penalizing models with extreme coefficient values. This can improve the generalization of the model learned by selecting the optimal complexity in the bias-variance tradeoff. Regularization works by adding the penalty that is associated with coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values would be penalized more, but a less accurate model with more conservative values would be penalized less. L1 and L2 regularization have different effects and uses that are complementary in certain respects.
l1_regularization: can be applied to sparse models, when working with high-dimensional data. It pulls small weights associated with features that are relatively unimportant towards 0.
l2_regularization: is preferable for data that is not sparse. It pulls large weights towards zero.
Adding the ridge penalty to the regularization overcomes some of lasso's limitations. It can improve its predictive accuracy, for example, when the number of predictors is greater than the sample size. If x = l1_regularization and y = l2_regularization, then ax + by = c defines the linear span of the regularization terms. The default values of x and y are both 1. An aggressive regularization can harm predictive capacity by excluding important variables from the model. So choosing the optimal values for the regularization parameters is important for the performance of the logistic regression model.
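As a hedged sketch of the combined penalty (the exact internal scaling used by the trainer is an assumption here), the elastic net term added to the training loss can be written as:
import numpy as np
def elastic_net_penalty(weights, l1_regularization, l2_regularization):
    # assumed form of the combined penalty: l1 * ||w||_1 + l2 * ||w||_2^2
    l1_term = l1_regularization * np.abs(weights).sum()      # lasso term
    l2_term = l2_regularization * np.square(weights).sum()   # ridge term
    return l1_term + l2_term
w = np.array([0.0, -2.5, 0.1])
print(elastic_net_penalty(w, l1_regularization=1.0, l2_regularization=1.0))  # 8.86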
Reference
Wikipedia: Logistic regression
Scalable Training of L1-Regularized Log-Linear Models
Test Run - L1 and L2 Regularization for Machine Learning
Methods
- decision_function: Returns score values.
- get_params: Get the parameters for this operator.
- predict_proba: Returns probabilities.
decision_function
Returns score values
decision_function(X, **params)
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep
If True, return the parameters for this operator and any contained sub-objects that are estimators.
predict_proba
Returns probabilities
predict_proba(X, **params)
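A short usage sketch of these methods, assuming a classifier fitted directly on pandas inputs (the toy column names and values below are illustrative only):
import pandas as pd
from nimbusml.linear_model import LogisticRegressionBinaryClassifier
X = pd.DataFrame({'parity': [1.0, 6.0, 3.0, 4.0]})
y = pd.Series([0, 1, 0, 1], name='case')
model = LogisticRegressionBinaryClassifier().fit(X, y)
print(model.get_params())          # operator parameters as a dict
print(model.decision_function(X))  # raw scores (w.x + b)
print(model.predict_proba(X))      # per-class probabilities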