rxHistogram: Histogram
Description
Histogram plot for a variable in an .xdf file or data frame
Usage
rxHistogram(formula, data, pweights = NULL, fweights = NULL, numBreaks = NULL,
startVal = NULL, endVal = NULL, levelsToDrop = NULL,
levelsToKeep = NULL, rowSelection = NULL, transforms = NULL,
transformObjects = NULL, transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
blocksPerRead = rxGetOption("blocksPerRead"),
histType = "Counts",
title = NULL, subtitle = NULL, xTitle = NULL, yTitle = NULL,
xNumTicks = NULL, yNumTicks = NULL, xAxisMinMax = NULL,
yAxisMinMax = NULL, fillColor = "cyan", lineColor = "black",
lineStyle = "solid", lineWidth = 1, plotAreaColor = "gray90",
gridColor = "white", gridLineWidth = 1, gridLineStyle = "solid",
maxNumPanels = 100, reportProgress = rxGetOption("reportProgress"),
print = TRUE, ...)
Arguments
formula
formula describing the data to plot. It should take the form ~x | g1 + g2, where g1 and g2 are optional conditioning factor variables and x is the name of a variable or an on-the-fly factorization F(x). Other expressions of x are not supported.
data
either an RxXdfData object, a character string specifying the .xdf file, or a data frame containing the variable to plot.
pweights
character string specifying the variable to use as probability weights for the observations.
fweights
character string specifying the variable to use as frequency weights for the observations.
numBreaks
number of breaks to use to cut numeric data, including the upper and lower bounds.
startVal
low value used for cutting numeric data.
endVal
high value used for cutting numeric data.
levelsToDrop
levels to exclude if the histogram variable is a factor.
levelsToKeep
levels to keep if the histogram variable is a factor.
rowSelection
name of a logical variable in the data set (in quotes) or a logical expression using variables in the data set to specify row selection. For example, rowSelection = "old" will use only observations in which the value of the variable old is TRUE. rowSelection = (age > 20) & (age < 65) & (log(income) > 10) will use only observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.
transforms
an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.
transformObjects
a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.
transformFunc
variable transformation function. See rxTransform for details.
transformVars
character vector of input data set variables needed for the transformation function. See rxTransform for details.
transformPackages
character vector defining additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions, e.g., those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") will be preloaded.
transformEnvir
user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.
blocksPerRead
number of blocks to read for each chunk of data read from the data source.
histType
character string specifying the histogram type: "Counts" or "Percent".
title
main title for the plot. Alternatively main can be used.
subtitle
subtitle (at the bottom) for the plot. Alternatively sub can be used.
xTitle
title for the X axis. Alternatively xlab can be used.
yTitle
title for the Y axis. Alternatively ylab can be used.
xNumTicks
number of tick marks on X axis (ignored for factor variables).
yNumTicks
number of tick marks on Y axis.
xAxisMinMax
numeric vector of length 2 containing a minimum and maximum value for the X axis. Alternatively xlim can be used.
yAxisMinMax
numeric vector of length 2 containing a minimum and maximum value for the Y axis. Alternatively ylim can be used.
fillColor
fill color for histogram. Use colors() to see color names.
lineColor
line color for border of histogram.
lineStyle
line style for border of histogram: "blank", "solid", "dashed", "dotted", "dotdash", "longdash", or "twodash".
lineWidth
line width for border of histogram. Alternatively lwd can be used.
plotAreaColor
background color for the plot area.
gridColor
color for grid lines.
gridLineWidth
line width for grid lines.
gridLineStyle
line style for grid lines.
maxNumPanels
integer specifying the maximum number of panels to plot. The number of panels is determined by the product of the number of levels of each conditioning variable. If the number of panels exceeds maxNumPanels, an error is given and the plot is not drawn. If maxNumPanels is NULL, it is ignored.
reportProgress
integer value with options:
0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
print
logical. If TRUE, the plot is printed. If FALSE, and the lattice package is loaded, a lattice plot object is returned invisibly and can be printed later.
...
additional arguments to be passed directly to the underlying barchart or xyplot function.
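As a minimal sketch of how the transforms and rowSelection arguments described above can be combined, the following uses a small data frame invented for this illustration (the workers data frame and its age and income columns are hypothetical). Because row selection is performed after transformations, the selection here refers to the newly created logIncome variable; treat that usage as an assumption to verify against your RevoScaleR version. The rxHistogram call is skipped if RevoScaleR is not installed.

```r
# Hypothetical data frame, invented for this illustration
set.seed(1)
workers <- data.frame(
  age    = sample(18:80, 500, replace = TRUE),
  income = round(exp(rnorm(500, mean = 10.5, sd = 0.6)))
)

# Base R equivalent of the row selection, to show which rows qualify
keep <- with(workers, (age > 20) & (age < 65) & (log(income) > 10))
sum(keep)

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  # transforms creates logIncome first; rowSelection is applied afterwards
  rxHistogram(~ logIncome, data = workers,
              transforms   = list(logIncome = log(income)),
              rowSelection = (age > 20) & (age < 65) & (logIncome > 10),
              numBreaks    = 11)
}
```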
Details
rxHistogram calls rxCube to perform computations and uses the lattice graphics package (barchart or xyplot) to create the plot. The rxHistogram function will attempt to bin continuous data in reasonable intervals. For faster computation (using a bin for every integer value), use the F() function around the variable. Descriptive argument names are used to facilitate quick and easy plotting and self-documenting code for new R users.
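The two binning modes mentioned above can be pictured with a rough base R analogy. This is only an illustration of the idea, not the computation rxCube performs: the depTime vector is invented, using floor() to stand in for F() is an assumption, and evenly spaced break points are likewise assumed.

```r
# Hypothetical departure times in decimal hours, invented for illustration
set.seed(42)
depTime <- runif(1000, min = 0, max = 24)

# Analogy for F(x): one bin per integer value (floor() is an assumption)
intBins <- table(floor(depTime))

# Analogy for numBreaks = 11: the break points include both bounds,
# so 11 breaks delimit 10 intervals
breaks  <- seq(0, 24, length.out = 11)
cutBins <- table(cut(depTime, breaks = breaks, include.lowest = TRUE))

length(breaks)    # number of break points
length(cutBins)   # number of intervals between them
```

Counting the bounds among the breaks matters when choosing numBreaks: for a target number of bins, ask for one more break than that.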
Value
An object of class "trellis". When print = TRUE, it is automatically printed within the function.
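A short sketch of capturing the returned object instead of drawing it immediately, based on the print argument described above. It assumes RevoScaleR and its sample data are installed, and is skipped otherwise; the variable names hasRevo and histPlot are chosen for this illustration.

```r
hasRevo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (hasRevo) {
  library(RevoScaleR)
  library(lattice)  # the print argument notes that lattice must be loaded

  airlineData <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")

  # Build the plot object without drawing it
  histPlot <- rxHistogram(~ F(CRSDepTime), data = airlineData, print = FALSE)

  # Check the class, then draw the plot when it is needed
  inherits(histPlot, "trellis")
  print(histPlot)
}
```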
Author(s)
Microsoft Corporation Microsoft Technical Support
See Also
rxLinePlot, rxCube, histogram.
Examples
# Examples using airline data
airlineData <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")
# Use the F() function to quickly compute bins for each integer level
rxHistogram(~F(CRSDepTime), data = airlineData)
# Specify the approximate number of breaks
rxHistogram(~CRSDepTime, numBreaks=11, data = airlineData)
# Examples using census data subsample
censusWorkers <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers")
# Create panels for each of the 3 states
rxHistogram(~ sex | state, data = censusWorkers)
# Repeat, printing x axis labels at an angle, and all panels in a row
rxHistogram(~ sex | state, scales = list(x = list(rot = 30)),
    data = censusWorkers, layout = c(3,1))
# Create panels for age for each sex for each state
rxHistogram(~ age | sex + state, data = censusWorkers)
# Specify how wage income should be broken into bins
rxHistogram(~ incwage | state + sex, title="Wage Income Up To 100,000",
    endVal = 100000, numBreaks=21, data = censusWorkers)
# Show panels for each state on a separate page
numCols <- 1
numRows <- 2
## Not run:
par(ask=TRUE) # Set ask to pause between each plot
## End(Not run)
rxHistogram(~ age | sex + state, data = censusWorkers, layout=c(numCols, numRows))
# Create a jpeg file for each page, named myplot001.jpeg, etc
## Not run:
jpeg(file="myplot
rxHistogram(~ age | sex + state, data = censusWorkers,
blocksPerRead=6, layout=c(numCols, numRows))
dev.off()
## End(Not run)
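Before conditioning on several factors, it can help to check the panel count against maxNumPanels. The sketch below uses base R only and a tiny data frame invented for illustration (the census data frame and its level names are hypothetical); it applies the rule from the maxNumPanels description that the panel count is the product of the number of levels of each conditioning variable.

```r
# Hypothetical conditioning factors, invented for illustration
census <- data.frame(
  sex   = factor(c("Male", "Female", "Female", "Male")),
  state = factor(c("Connecticut", "Indiana", "Washington", "Indiana"))
)

# Panels implied by a formula such as ~ age | sex + state
numPanels <- prod(nlevels(census$sex), nlevels(census$state))
numPanels

# Compare against the default limit from the Usage section
maxNumPanels <- 100
numPanels <= maxNumPanels
```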