Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Description
Write .xdf file content to a delimited text file. rxDataStep
recommended.
Usage
rxXdfToText(inFile, outFile, varsToKeep = NULL,
varsToDrop = NULL, rowSelection = NULL,
transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
overwrite = FALSE, sep = ",", quote = TRUE, na = "NA",
eol = "\n", col.names = TRUE,
blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), ...)
Arguments
inFile
either an RxXdfData object or a character string specifying the input .xdf file.
outFile
character string specifying the output delimited text file.
varsToKeep
character vector of variable names to include when reading from inFile
. If NULL
, argument is ignored. Cannot be used with varsToDrop
.
varsToDrop
character vector of variable names to exclude when reading from inFile
. If NULL
, argument is ignored. Cannot be used with varsToKeep
.
rowSelection
name of a logical variable in the data set (in quotes) or a logical expression using variables in the data set to specify row selection. For example, rowSelection = "old"
will use only observations in which the value of the variable old
is TRUE
. rowSelection = (age > 20) & (age < 65) & (log(income) > 10)
will use only observations in which the value of the age
variable is between 20 and 65 and the value of the log
of the income
variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms
or transformFunc
). As with all expressions, rowSelection
can be defined outside of the function call using the expression function.
transforms
an expression of the form list(name = expression, ...)
representing the first round of variable transformations. As with all expressions, transforms
(or rowSelection
) can be defined outside of the function call using the expression function.
transformObjects
a named list containing objects that can be referenced by transforms
, transformsFunc
, and rowSelection
.
transformFunc
variable transformation function. See rxTransform for details.
transformVars
character vector of input data set variables needed for the transformation function. See rxTransform for details.
transformPackages
character vector specifying additional R packages (outside of those specified in rxGetOption("transformPackages")
) to be made available and preloaded for use in variable transformation functions, both those explicitly defined in RevoScaleR functions via their transforms
and transformFunc
arguments and those defined implicitly via their formula
or rowSelection
arguments. The transformPackages
argument may also be NULL
, indicating that no packages outside rxGetOption("transformPackages")
will be preloaded.
transformEnvir
user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL
, a new "hash" environment with parent baseenv()
is used instead.
overwrite
logical value. If TRUE
, the existing outFile
will be overwritten.
sep
character(s) specifying the separator between columns.
quote
logical value or numeric vector. If TRUE
, any character or factor columns will be surrounded by double quotes. If a numeric vector, its elements are taken as the indices of columns to quote. In both cases, row and column names are quoted if they are written. If FALSE
, nothing is quoted.
na
character string to use for missing values in the data.
eol
character(s) to print at the end of each line (row). For example, eol = "\r\n"
will produce Windows' line endings on a Unix-alike OS, and eol = "\r"
will produce files as expected by Mac OS Excel 2004.
col.names
a logical value indicating whether the column names in the inFile
should be written as the first row in the output text file.
blocksPerRead
number of blocks to read for each chunk of data read from the data source.
reportProgress
integer value with options:
0
: no progress is reported.1
: the number of processed rows is printed and updated.2
: rows processed and timings are reported.3
: rows processed and all timings are reported.
...
additional arguments passed to the write.table function.
Details
For most purposes, the more general rxDataStep is preferred for writing to text files.
Value
An RxTextData object representing the output text file.
Author(s)
Microsoft Corporation Microsoft Technical Support
See Also
rxDataStep, rxImport, write.table, rxSplit.
Examples
sampleDataDir <- rxGetOption("sampleDataDir")
censusXdfFile <- file.path(rxGetOption("sampleDataDir"), "CensusWorkers.xdf")
# Write all data in a text file
censusCsvFile <- file.path(tempdir(), "CensusWorkers.csv")
rxXdfToText(inFile = censusXdfFile, outFile = censusCsvFile, overwrite = TRUE)
# Alternatively, use rxDataStep
rxDataStep(inData = censusXdfFile, outFile = censusCsvFile, overwrite = TRUE)
# Write subset of transformed data in a text file
censusSubsetCsvFile <- file.path(tempdir(), "CensusSubset.csv")
rxXdfToText(inFile = censusXdfFile, outFile = censusSubsetCsvFile,
varsToKeep = c("age", "incwage"),
rowSelection = age < 40,
transforms = list(logIncwage = log1p(incwage), incwage = NULL),
overwrite = TRUE)
# Again, this can be done using rxDataStep
rxDataStep(inData = censusXdfFile, outFile = censusSubsetCsvFile,
varsToKeep = c("age", "incwage"),
rowSelection = age < 40,
transforms = list(logIncwage = log1p(incwage), incwage = NULL),
overwrite = TRUE)
# Clean-up
file.remove(censusCsvFile)
file.remove(censusSubsetCsvFile)