summary.mlModel: Summary of a Microsoft R Machine Learning model.
Summary of a Microsoft R Machine Learning model.
Usage
## S3 method for class `mlModel':
summary(object, top = 20, ...)
Arguments
object
A model object returned from a MicrosoftML analysis.
top
The number of top coefficients to show in the summary for linear models such as rxLogisticRegression and rxFastLinear. The bias appears first, followed by the other weights sorted by absolute value in descending order. If set to NULL, all non-zero coefficients are shown. Otherwise, only the first top coefficients are shown.
...
Additional arguments to be passed to the summary method.
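The ordering rule described for the top argument can be sketched in plain R. The helper selectTopCoefficients, the bias label "(Bias)", and the example weights are hypothetical and chosen only for illustration. The sketch also assumes that the bias counts toward top, which the description does not state explicitly. It is not the MicrosoftML implementation.

```r
# Hypothetical helper (not part of MicrosoftML): order a named weight vector
# the way the 'top' argument is described, bias first, then the remaining
# weights by decreasing absolute value.
# Assumptions: the bias is labelled "(Bias)" and counts toward 'top'.
selectTopCoefficients <- function(weights, top = 20, biasName = "(Bias)") {
  isBias <- names(weights) == biasName
  bias <- weights[isBias]
  others <- weights[!isBias]
  # Sort the non-bias weights by absolute value, largest first
  others <- others[order(abs(others), decreasing = TRUE)]
  if (is.null(top)) {
    # top = NULL: keep every non-zero coefficient
    return(c(bias, others[others != 0]))
  }
  # Otherwise keep only the first 'top' entries of the ordered vector
  head(c(bias, others), top)
}

# Made-up weights for illustration
w <- c(age = 0.05, "(Bias)" = -1.2, parity = -0.8, induced = 0, spontaneous = 1.5)
print(selectTopCoefficients(w, top = 3))     # keeps (Bias), spontaneous, parity
print(selectTopCoefficients(w, top = NULL))  # drops only the zero weight (induced)
```

Returning a named vector mirrors the shape of the coef output described under Value, so the same helper could be applied to any named weight vector.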
Details
Provides summary information about the original function call, the
data set used to train the model, and statistics for coefficients in the
model.
Value
The summary method for MicrosoftML analysis objects returns a list that includes the original function call and the underlying parameters used. The coef method returns a named vector of weights extracted from the model object.
For rxLogisticRegression, the following statistics may also be present in the summary when showTrainingStats is set to TRUE.
training.size
The size, in terms of row count, of the data set used to train the model.
deviance
The model deviance is given by -2 * ln(L), where L is the likelihood of obtaining the observations with all features incorporated in the model.
null.deviance
The null deviance is given by -2 * ln(L0), where L0 is the likelihood of obtaining the observations with no effect from the features. The null model includes the bias if there is one in the model.
aic
The AIC (Akaike Information Criterion) is defined as 2 * k + deviance, where k is the number of coefficients of the model. The bias counts as one of the coefficients. The AIC measures the relative quality of the model: it balances the goodness of fit of the model (measured by deviance) against the complexity of the model (measured by the number of coefficients).
coefficients.stats
This is a data frame containing the statistics for each coefficient in the model. For each coefficient, the following statistics are shown. The bias appears in the first row, and the remaining coefficients follow in ascending order of p-value.
- Estimate: The estimated coefficient value of the model.
- Std Error: The square root of the large-sample variance of the coefficient estimate.
- z-Score: The ratio of a coefficient's estimate to its standard error, used to test the significance of the coefficient against the null hypothesis that the coefficient is zero. Under the null hypothesis, if no regularization is applied, the estimate of the coefficient approximately follows a normal distribution with mean 0 and a standard deviation equal to the standard error computed above.
- Pr(>|z|): The corresponding p-value for the two-sided test of the z-score. A significance indicator, based on the significance level, is appended to the p-value. If F(x) is the CDF of the standard normal distribution N(0, 1), then Pr(>|z|) = 2 - 2 * F(|z|).
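To connect each statistic above to a concrete computation, the following sketch reproduces them for an unregularized logistic regression in base R. It swaps in glm() from the stats package for rxLogisticRegression and reuses the infert data from the Examples. The variable names are illustrative only. Because rxLogisticRegression may apply L1/L2 regularization, the values will generally differ from those in a MicrosoftML summary; the sketch shows the formulas, not MicrosoftML's implementation.

```r
# Illustration only: base R glm() (unregularized) stands in for
# rxLogisticRegression, so the numbers will not match a MicrosoftML model.
data(infert, package = "datasets")
infert$isCase <- as.numeric(infert$case == 1)
fit <- glm(isCase ~ age + parity + education + spontaneous + induced,
           family = binomial, data = infert)

# training.size: number of rows used to fit the model
trainingSize <- nrow(model.frame(fit))

# deviance = -2 * ln(L), with all features in the model
modelDeviance <- -2 * as.numeric(logLik(fit))

# null.deviance = -2 * ln(L0), using a bias-only (intercept-only) model
nullFit <- glm(isCase ~ 1, family = binomial, data = infert)
nullDeviance <- -2 * as.numeric(logLik(nullFit))

# aic = 2 * k + deviance, where k counts all coefficients including the bias
k <- length(coef(fit))
aicValue <- 2 * k + modelDeviance

# coefficients.stats: Estimate, Std Error, z-Score, Pr(>|z|)
estimate <- coef(fit)
stdError <- sqrt(diag(vcov(fit)))     # large-sample standard errors
zScore <- estimate / stdError
# 2 - 2 * F(|z|); 2 * pnorm(-abs(zScore)) is equivalent and more accurate
# for very small p-values
pValue <- 2 - 2 * pnorm(abs(zScore))
coefStats <- data.frame(Estimate = estimate, "Std Error" = stdError,
                        "z-Score" = zScore, "Pr(>|z|)" = pValue,
                        check.names = FALSE)
# Bias first, then the remaining coefficients in ascending order of p-value
coefStats <- coefStats[c(1, 1 + order(pValue[-1])), ]

print(c(training.size = trainingSize, deviance = modelDeviance,
        null.deviance = nullDeviance, aic = aicValue))
print(coefStats)
```

For a 0/1 response, comparing modelDeviance, nullDeviance, and aicValue with deviance(fit), fit$null.deviance, and AIC(fit) is a quick check that the formulas agree with glm's own output.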
Author(s)
Microsoft Corporation Microsoft Technical Support
See also
rxFastTrees, rxFastForest, rxFastLinear, rxOneClassSvm, rxNeuralNet, rxLogisticRegression.
Examples
# Load MicrosoftML (model training) and RevoScaleR (scoring and ROC utilities)
library(MicrosoftML)
library(RevoScaleR)
# Estimate a logistic regression model
logitModel <- rxLogisticRegression(isCase ~ age + parity + education + spontaneous + induced,
    transforms = list(isCase = case == 1),
    data = infert)
# Print a summary of the model
summary(logitModel)
# Score to a data frame
scoreDF <- rxPredict(logitModel, data = infert,
    extraVarsToWrite = "isCase")
# Compute and plot the Receiver Operating Characteristic (ROC) curve and AUC
roc1 <- rxRoc(actualVarName = "isCase", predVarNames = "Probability", data = scoreDF)
plot(roc1)
rxAuc(roc1)
#######################################################################################
# Multi-class logistic regression
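# Randomly split iris into test and training sets (seed makes the split reproducible)
set.seed(1)
testObs <- rnorm(nrow(iris)) > 0
testIris <- iris[testObs, ]
trainIris <- iris[!testObs, ]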
multiLogit <- rxLogisticRegression(
    formula = Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
    type = "multiClass", data = trainIris)
# Score the model
scoreMultiDF <- rxPredict(multiLogit, data = testIris,
    extraVarsToWrite = "Species")
# Print the first rows of the data frame with scores
head(scoreMultiDF)
# Look at confusion matrix
table(scoreMultiDF$Species, scoreMultiDF$PredictedLabel)
# Look at the observations with incorrect predictions
badPrediction <- scoreMultiDF$Species != scoreMultiDF$PredictedLabel
scoreMultiDF[badPrediction,]