LbfgsLogisticRegressionBinaryTrainer Class

Reference

Definition

Namespace: Microsoft.ML.Trainers

Assembly: Microsoft.ML.StandardTrainers.dll

Package: Microsoft.ML v3.0.1

An IEstimator<TTransformer> that predicts a target using a linear logistic regression model trained with the L-BFGS method.

public sealed class LbfgsLogisticRegressionBinaryTrainer : Microsoft.ML.Trainers.LbfgsTrainerBase<Microsoft.ML.Trainers.LbfgsLogisticRegressionBinaryTrainer.Options,Microsoft.ML.Data.BinaryPredictionTransformer<Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LinearBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>,Microsoft.ML.Calibrators.CalibratedModelParametersBase<Microsoft.ML.Trainers.LinearBinaryModelParameters,Microsoft.ML.Calibrators.PlattCalibrator>>

type LbfgsLogisticRegressionBinaryTrainer = class
    inherit LbfgsTrainerBase<LbfgsLogisticRegressionBinaryTrainer.Options, BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>, CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>

Public NotInheritable Class LbfgsLogisticRegressionBinaryTrainer
Inherits LbfgsTrainerBase(Of LbfgsLogisticRegressionBinaryTrainer.Options, BinaryPredictionTransformer(Of CalibratedModelParametersBase(Of LinearBinaryModelParameters, PlattCalibrator)), CalibratedModelParametersBase(Of LinearBinaryModelParameters, PlattCalibrator))

Inheritance: Object → TrainerEstimatorBase<TTransformer,TModel> → LbfgsTrainerBase<LbfgsLogisticRegressionBinaryTrainer.Options,BinaryPredictionTransformer<CalibratedModelParametersBase<LinearBinaryModelParameters,PlattCalibrator>>,CalibratedModelParametersBase<LinearBinaryModelParameters,PlattCalibrator>> → LbfgsLogisticRegressionBinaryTrainer

Remarks

To create this trainer, use LbfgsLogisticRegression or LbfgsLogisticRegression(Options).

Input and Output Columns

The input label column data must be Boolean. The input features column data must be a known-sized vector of Single.

This trainer outputs the following columns:

Output Column Name	Column Type	Description
`Score`	Single	The unbounded score that was calculated by the model.
`PredictedLabel`	Boolean	The predicted label, based on the sign of the score. A negative score maps to `false` and a positive score maps to `true`.
`Probability`	Single	The probability of the label being `true`, obtained by calibrating the score. The value is in the range [0, 1].

Trainer Characteristics


Machine learning task	Binary classification
Is normalization required?	Yes
Is caching required?	No
Required NuGet in addition to Microsoft.ML	None
Exportable to ONNX	Yes

Scoring Function

Linear logistic regression is a variant of the linear model. It maps a feature vector $\textbf{x} \in {\mathbb R}^n$ to a scalar via $\hat{y}\left( \textbf{x} \right) = \textbf{w}^T \textbf{x} + b = \sum_{j=1}^n w_j x_j + b$, where $x_j$ is the $j$-th feature's value, the $j$-th element of $\textbf{w}$ is the $j$-th feature's coefficient, and $b$ is a learnable bias. The corresponding probability of getting a true label is $\frac{1}{1 + e^{-\hat{y}\left( \textbf{x} \right)}}$, so a positive score corresponds to a probability above 0.5.
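As a minimal sketch of the scoring function above, the following Python snippet computes the score, the probability, and the predicted label for one example. It is illustrative only and is not ML.NET code: the function names and sample weights are hypothetical, the probability uses the standard logistic sigmoid $\frac{1}{1 + e^{-\hat{y}}}$ with no additional calibrator parameters, and treating a score of exactly zero as `false` is an assumption.

```python
import math


def score(weights, bias, features):
    """Unbounded linear score: w^T x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def probability(raw_score):
    """Logistic sigmoid of the score, written to avoid overflow."""
    if raw_score >= 0.0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    z = math.exp(raw_score)
    return z / (1.0 + z)


def predicted_label(raw_score):
    """Positive score -> True, negative score -> False (zero -> False is assumed)."""
    return raw_score > 0.0


# Hypothetical model and example, chosen only for illustration.
w = [0.5, -1.0]
b = 0.25
x = [2.0, 1.0]

s = score(w, b, x)          # 0.5*2.0 + (-1.0)*1.0 + 0.25
p = probability(s)
label = predicted_label(s)
```

The sign of the score and the 0.5 probability threshold agree: a score above zero always yields a probability above 0.5.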

Training Algorithm Details

The optimization technique implemented is based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno method (L-BFGS). L-BFGS is a quasi-Newton method that replaces the expensive computation of the Hessian matrix with an approximation, yet still converges quickly, much like Newton's method, in which the full Hessian matrix is computed. Because the L-BFGS approximation uses only a limited number of historical states to compute the next step direction, it is especially suited to problems with high-dimensional feature vectors. The number of historical states is a user-specified parameter: a larger number may give a better approximation of the Hessian matrix, but at a higher computation cost per step.
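To make the role of the historical states concrete, here is a minimal Python sketch of the textbook L-BFGS two-loop recursion with a bounded history. It is not the ML.NET implementation: the helper names (`lbfgs_direction`, `minimize`, `quadratic`), the simple backtracking line search, and the toy objective are all assumptions made for illustration. The `history_size` argument plays the part of the user-specified number of historical states.

```python
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, s_history, y_history):
    """Two-loop recursion: returns -H * grad, where H approximates the
    inverse Hessian from the stored (s, y) pairs (oldest first)."""
    q = list(grad)
    coeffs = []  # (alpha, rho), newest pair first
    for s, y in zip(reversed(s_history), reversed(y_history)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((alpha, rho))
    # Scale by an estimate of the inverse curvature from the newest pair.
    gamma = 1.0
    if s_history:
        gamma = dot(s_history[-1], y_history[-1]) / dot(y_history[-1], y_history[-1])
    r = [gamma * qi for qi in q]
    for (s, y), (alpha, rho) in zip(zip(s_history, y_history), reversed(coeffs)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def minimize(value_and_grad, x0, history_size=5, iterations=50):
    """L-BFGS with a plain backtracking (Armijo) line search."""
    s_hist = deque(maxlen=history_size)  # oldest pairs drop out automatically
    y_hist = deque(maxlen=history_size)
    x = list(x0)
    value, grad = value_and_grad(x)
    for _ in range(iterations):
        if dot(grad, grad) < 1e-20:
            break
        direction = lbfgs_direction(grad, s_hist, y_hist)
        slope = dot(grad, direction)
        step = 1.0
        while True:
            x_new = [xi + step * di for xi, di in zip(x, direction)]
            new_value, new_grad = value_and_grad(x_new)
            if new_value <= value + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(new_grad, grad)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
        x, value, grad = x_new, new_value, new_grad
    return x


def quadratic(x):
    """Toy objective (x0 - 1)^2 + 10 (x1 + 2)^2 and its gradient."""
    value = (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
    grad = [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]
    return value, grad


solution = minimize(quadratic, [0.0, 0.0], history_size=3)
```

Each call to `lbfgs_direction` costs time proportional to the history size times the number of features, which is why a larger history raises the per-step cost while (possibly) improving the curvature estimate.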

Regularization is a method that can render an ill-posed problem more tractable by imposing constraints that supplement the information in the data. It helps prevent overfitting by penalizing the magnitude of the model, usually measured by a norm function. This can improve the generalization of the learned model by selecting a suitable complexity in the bias-variance tradeoff. Regularization works by adding a penalty associated with the coefficient values to the error of the hypothesis. An accurate model with extreme coefficient values is penalized more, while a less accurate model with more conservative values is penalized less.

This learner supports elastic net regularization: a linear combination of L1-norm (LASSO), $|| \textbf{w} ||_1$, and L2-norm (ridge), $|| \textbf{w} ||_2^2$, regularization. L1-norm and L2-norm regularization have different effects and uses that are complementary in certain respects. Using the L1-norm can increase the sparsity of the trained $\textbf{w}$. When working with high-dimensional data, it shrinks the small weights of irrelevant features to 0, so no resources are spent on those features when making predictions. If L1-norm regularization is used, the training algorithm is OWL-QN. L2-norm regularization is preferable for data that is not sparse, and it heavily penalizes large weights.
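The way an elastic net penalty is added to the error can be sketched in a few lines of Python. This is a hedged illustration, not the trainer's actual objective: the exact scaling of the two terms in ML.NET is not stated here, so the plain form `l1 * ||w||_1 + l2 * ||w||_2^2`, the unpenalized bias, the averaged log-loss, and all function names and data are assumptions.

```python
import math


def softplus(z):
    """Numerically stable log(1 + e^z)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_loss(weights, bias, rows):
    """Average logistic loss over (features, label) rows."""
    total = 0.0
    for features, label in rows:
        s = sum(w * x for w, x in zip(weights, features)) + bias
        margin = s if label else -s
        total += softplus(-margin)  # -log(probability of the true label)
    return total / len(rows)


def elastic_net_penalty(weights, l1, l2):
    """l1 * ||w||_1 + l2 * ||w||_2^2 (bias left out by assumption)."""
    return l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)


def objective(weights, bias, rows, l1, l2):
    return log_loss(weights, bias, rows) + elastic_net_penalty(weights, l1, l2)


# Hypothetical two-row data set: feature 0 signals True, feature 1 signals False.
rows = [([1.0, 0.0], True), ([0.0, 1.0], False)]

extreme = [10.0, -10.0]      # fits the data almost perfectly
conservative = [1.0, -1.0]   # fits less well, but with small weights

extreme_loss = log_loss(extreme, 0.0, rows)
conservative_loss = log_loss(conservative, 0.0, rows)
extreme_total = objective(extreme, 0.0, rows, l1=0.1, l2=0.1)
conservative_total = objective(conservative, 0.0, rows, l1=0.1, l2=0.1)
```

On this toy data the extreme weights have the lower unpenalized loss, yet the conservative weights have the lower penalized objective, which is the tradeoff that regularization introduces.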

Aggressive regularization (that is, assigning large coefficients to the L1-norm or L2-norm regularization terms) can harm predictive capacity by excluding important variables from the model. Therefore, choosing the right regularization coefficients is important when applying logistic regression.
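The effect of an overly strong L1 term can be illustrated with soft-thresholding, the proximal step associated with an L1 penalty. This technique is swapped in purely for illustration; it is not OWL-QN and not how the trainer updates weights. The weights and strengths below are hypothetical.

```python
def soft_threshold(w, t):
    """Shrink w toward zero by t, clamping to exactly 0 inside [-t, t]."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def surviving_features(weights, strength):
    """Count weights that remain nonzero after shrinking by `strength`."""
    return sum(1 for w in weights if soft_threshold(w, strength) != 0.0)


# Hypothetical unregularized weights: two informative, two near-zero features.
weights = [0.8, -0.05, 0.3, 0.02]

sweep = {t: surviving_features(weights, t) for t in (0.0, 0.1, 0.5, 1.0)}
```

A mild strength removes only the near-zero weights, while the largest strength in the sweep removes every feature, including the informative ones. That is the failure mode the paragraph above warns about.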

Check the See Also section for links to usage examples.

Fields

FeatureColumn	The feature column that the trainer expects. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
LabelColumn	The label column that the trainer expects. Can be `null`, which indicates that label is not used for training. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
WeightColumn	The weight column that the trainer expects. Can be `null`, which indicates that weight is not used for training. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Properties

Info	(Inherited from LbfgsTrainerBase<TOptions,TTransformer,TModel>)

Methods

Fit(IDataView, LinearModelParameters)	Continues the training of a LbfgsLogisticRegressionBinaryTrainer using an already trained `modelParameters` and returns a BinaryPredictionTransformer<TModel>.
Fit(IDataView)	Trains and returns an ITransformer. (Inherited from TrainerEstimatorBase<TTransformer,TModel>)
GetOutputSchema(SchemaShape)	(Inherited from TrainerEstimatorBase<TTransformer,TModel>)

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> objects are often formed into pipelines with many objects, so we may need to build a chain of estimators via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want is buried somewhere in the chain. For that scenario, this method lets us attach a delegate that is called once fit is called.
