Share via


rxStepControl: Control for Stepwise Regression

Description

Various parameters that control aspects of stepwise regression.

Usage

  rxStepControl(method = "stepwise", scope = NULL, 
      maxSteps = 1000, stepCriterion = "AIC",
      maxSigLevelToAdd = NULL, minSigLevelToDrop = NULL,
      refitEachStep = NULL, keepStepCoefs = FALSE,
      scale = 0, k = 2, test = NULL,   ...  )

Arguments

method

a character string specifying the method of stepwise search:

  • "stepwise": bi-directional search.
  • "backward": backward elimination.
  • "forward": forward selection.
    Default is "stepwise" if the scope argument is not missing, otherwise "backward".

scope

either a single formula, or a named list containing components upper and lower, both formulae, defining the range of models to be examined in the stepwise search.

maxSteps

an integer specifying the maximum number of steps to be considered, typically used to stop the process early and the default is 1000.

stepCriterion

a character string specifying the variable selection criterion:

  • "AIC": Akaike's information criterion.
  • "SigLevel": significance level, the traditional stepwise approach in SAS. This argument is similar to the SELECT option of the GLMSELECT procedure in SAS. Default is "AIC".

maxSigLevelToAdd

a numeric scalar specifying the significance level for adding a variable to the model. This argument is used only when stepCriterion = "SigLevel" and is similar to the SLENTRY option of the GLMSELECT procedure in SAS. The defaults are 0.50 for "forward" and 0.15 for "stepwise".

minSigLevelToDrop

a numeric scalar specifying the significance level for dropping a variable from the model. This argument is used only when stepCriterion = "SigLevel" and is similar to the SLSTAY option of the GLMSELECT procedure in SAS. The defaults are 0.10 for "backward" and 0.15 for "stepwise".

refitEachStep

a logical flag specifying whether or not to refit the model at each step. The default is NULL, indicating to refit the model at each step for rxLogit and rxGlm but not for rxLinMod.

keepStepCoefs

a logical flag specifying whether or not to keep the model coefficients at each step. If TRUE, a data.frame stepCoefs will be returned with the fitted model with rows corresponding to the coefficients and columns corresponding to the iterations. Additional computation may be required to generate the coefficients at each step. Those stepwise coefficients can be visualized by plotting the fitted model with rxStepPlot.

scale

optional numeric scalar specifying the scale parameter of the model. It is used in computing the AIC statistics for selecting the models. The default 0 indicates it should be estimated by maximum likelihood. See "scale" in step for details.

k

optional numeric scalar specifying the weight of the number of equivalent degrees of freedom in computing AIC for the penalty. See "k" in step for details.

test

a character string specifying the test statistic to be included in the results, either "F" or "Chisq". Both test statistics are relative to the original model.

...

additional arguments to be passed directly to the Microsoft R Services Compute Engine.

Details

Stepwise models must be computed on the same dataset in order to be compared so rows with missing values in any of the variables in the upper model are removed before the model fitting starts. Consequently, the stepwise models might be different from the corresponding models fitted with only the selected variables if there are missing values in the data set.

When computing stepwise models with rxLogit or rxGlm, you can sometimes improve the speed and quality of the fitting by setting returnAlways=TRUE in the initial rxLogit or rxGlm call. When returnAlways=TRUE, rxLogit and rxGlm always return the solution tried so far that has the minimum deviance.

Value

A list containing the options.

Author(s)

Microsoft Corporation Microsoft Technical Support

References

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. New York: Springer (4th ed).

Chambers, J. M. and Hastie, T. J. eds (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

Goodnight, J. H. (1979) A Tutorial on the SWEEP Operator. The American Statistician Vol. 33 No. 3, 149--158.

See Also

step, rxStepPlot.

Examples


 ## setup
 form <- Sepal.Length ~ Sepal.Width + Petal.Length
 scope <- list(
     lower = ~ Sepal.Width,
     upper = ~ Sepal.Width + Petal.Length + Petal.Width * Species)

 ## lm/step
 ## We need to specify the contrasts for the factor variable Species,
 ## even though this is not part of the original model. This will 
 ## generate a warning, so we suppress that warning here.
 suppressWarnings(rlm.obj <- lm(form, data = iris, contrasts = list(Species = contr.SAS)))
 rlm.step <- step(rlm.obj, direction = "both", scope = scope, trace = 1)

 ## rxLinMod/variableSelection
 varsel <- rxStepControl(method = "stepwise", scope = scope)
 rxlm.step <- rxLinMod(form, data = iris, variableSelection = varsel,
     verbose = 1, dropMain = FALSE, coefLabelStyle = "R")

 ## compare lm/step and rxLinMod/variableSelection
 rlm.step$anova
 rxlm.step$anova

 as.matrix(coef(rlm.step))
 as.matrix(coef(rxlm.step))

 ## rxLinMod/variableSelection with keepStepCoefs = TRUE
 varsel <- rxStepControl(method = "stepwise", scope = scope, keepStepCoefs = TRUE)
 rxlm.step <- rxLinMod(form, data = iris, variableSelection = varsel,
     verbose = 1, dropMain = FALSE, coefLabelStyle = "R")
 rxStepPlot(rxlm.step)