rxOptions: Global Options for RevoScaleR
Description
Functions to specify and retrieve options needed for RevoScaleR computations. These need to be set only once to carry out multiple computations.
Usage
rxOptions(initialize = FALSE,
libDir,
linkDllName = ifelse(.Platform$OS.type == "windows", "RxLink.dll", "libRxLink.so.2"),
cintSysDir = setCintSysDir(),
includeDir = setRxIncludeDir(),
unitTestDir = system.file("unitTests", package = "RevoScaleR"),
unitTestDataDir = system.file("unitTestData", package = "RevoScaleR"),
sampleDataDir = system.file("SampleData", package = "RevoScaleR"),
demoScriptsDir = system.file("demoScripts", package = "RevoScaleR"),
blocksPerRead = 1,
reportProgress = 2,
rowDisplayMax = -1,
memStatsReset = 0,
memStatsDiff = 0,
numCoresToUse = 4,
numDigits = options()$digits,
showTransformFn = FALSE,
defaultDecimalColType = "float32",
defaultMissingColType = "float32",
computeContext = RxLocalSeq(),
dataPath = ".",
outDataPath = ".",
transformPackages = c("RevoScaleR","utils","stats","methods"),
xdfCompressionLevel = 1,
fileSystem = "native",
useDoSMP = NULL,
useSparseCube = FALSE,
rngBufferSize = 1,
dropMain = TRUE,
coefLabelStyle = "Revo",
numTasks = 1,
hdfsHost = Sys.getenv("REVOHADOOPHOST"),
hdfsPort = as.integer(Sys.getenv("REVOHADOOPPORT")),
unixRPath = "/usr/bin/Revo64-8",
mrsHadoopPath = "/usr/bin/mrs-hadoop",
spark.executorCores = 2,
spark.executorMem = "4g",
spark.executorOverheadMem = "4g",
spark.numExecutors = 65535,
traceLevel = 0,
...)
rxGetOption(opt, default = NULL)
rxIsExpressEdition()
Arguments
initialize
logical value. If TRUE, rxOptions resets all RevoScaleR options to their default value.
libDir
character string specifying path to RevoScaleR's lib directory. For 32-bit versions, this defaults to the libs directory; for 64-bit versions, this defaults to the libs/x64 directory.
linkDllName
character string specifying the name of RevoScaleR's DLL (Windows) or shared object (Linux).
cintSysDir
character string specifying path to RevoScaleR's C/C++ interpreter (CINT) directory.
includeDir
character string specifying path to RevoScaleR's include directory.
unitTestDir
character string specifying path to RevoScaleR's RUnit-based test directory.
unitTestDataDir
character string specifying path to RevoScaleR's RUnit-based test data directory.
sampleDataDir
character string specifying path to RevoScaleR's sample data directory.
demoScriptsDir
character string specifying path to RevoScaleR's demo script directory.
blocksPerRead
default value to use for the blocksPerRead argument for many RevoScaleR functions. Represents the number of blocks to read within each read chunk.
reportProgress
default value to use for the reportProgress argument for many RevoScaleR functions. Options are:
0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
rowDisplayMax
scalar integer specifying the maximum number of rows to display when using the verbose argument in RevoScaleR functions. The default of -1 displays all available rows.
memStatsReset
boolean integer. If 1, the memory status is reset.
memStatsDiff
boolean integer. If 1, the change of memory status is shown.
numCoresToUse
scalar integer specifying the number of cores to use. If set to a value higher than the number of available cores, the number of available cores will be used. If set to -1, the number of available cores will be used. Increasing the number of cores to use will also increase the amount of memory required for RevoScaleR analysis functions.
numDigits
controls the number of digits to use when converting numeric data to or from strings, such as when printing numeric values or importing numeric data as strings. The default is the current value of options()$digits, which defaults to 7. Beyond fifteen digits, however, results are likely to be unreliable.
showTransformFn
logical value. If TRUE, the transform function is shown.
defaultDecimalColType
Used to specify a column's data type when only decimal values (possibly mixed with missing (NA) values) are encountered upon first read of the data and the column's type information is not specified via colInfo or colClasses. Supported types are "float32" and "numeric", for 32-bit floating point and 64-bit floating point values, respectively.
defaultMissingColType
Used to specify a given column's data type when only missing values (NAs) or blanks are encountered upon first read of the data and the column's type information is not specified via colInfo or colClasses. Supported types are "float32", "numeric", and "character" for 32-bit floating point, 64-bit floating point, and string values, respectively.
computeContext
an RxComputeContext object representing the computational environment.
- RxLocalSeq: compute locally, using sequential processing with rxExec High Performance Computing.
- RxLocalParallel: compute locally, using the 'parallel' package for processing with rxExec High Performance Computing.
- RxForeachDoPar: use the currently registered parallel backend for 'foreach' for processing with rxExec High Performance Computing.
- RxHadoopMR: use a Hadoop cluster for both High Performance Analytics and for rxExec High Performance Computing.
- RxSpark: use a Spark cluster for both High Performance Analytics and for rxExec High Performance Computing.
dataPath
character vector containing paths to search for local data sources. The default is to search just the current working directory. This will be ignored if dataPath is specified in the active compute context. See the Details section for more information regarding the path format.
outDataPath
character vector containing paths for writing new output data files. New data files will be written to the first path that exists. The default is to write to the current working directory. This will be ignored if outDataPath is specified in the active compute context.
transformPackages
character vector defining default set of R packages to be made available and preloaded for use in variable transformation functions.
xdfCompressionLevel
integer in the range of -1 to 9. The higher the value, the greater the amount of compression, resulting in smaller files but a longer time to create them. If xdfCompressionLevel is set to 0, there will be no compression and files will be compatible with the 6.0 release of Revolution R Enterprise. If set to -1, a default level of compression will be used.
fileSystem
character string or RxFileSystem object indicating type of file system; "native" or an RxNativeFileSystem object can be used for the local operating system, or an RxHdfsFileSystem object for the Hadoop file system.
useDoSMP
NULL. Deprecated. Use an RxLocalParallel compute context instead.
opt
character string specifying the RevoScaleR option to obtain. NULL is returned if the option does not exist.
useSparseCube
logical value. If TRUE, sparse cube is used.
rngBufferSize
a positive integer scalar specifying the buffer size for the Parallel Random Number Generators (RNGs) in MKL.
dropMain
logical value. If TRUE, main-effect terms are dropped before their interactions.
coefLabelStyle
character string specifying the coefficient label style. The default is "Revo". If "R", R-compatible labels are created.
numTasks
integer value. The default numTasks used in RxInSqlServer.
hdfsHost
character string specifying the host name of your Hadoop nameNode. Defaults to Sys.getenv("REVOHADOOPHOST"), or "default" if no REVOHADOOPHOST environment variable is set.
hdfsPort
integer scalar specifying the port number of your Hadoop nameNode, or a character string that can be coerced to numeric. Defaults to as.integer(Sys.getenv("REVOHADOOPPORT")), or 0 if no REVOHADOOPPORT environment variable is set.
unixRPath
The path to the R executable on a Unix/Linux node. By default it points to a path corresponding to this client's version.
mrsHadoopPath
Path to the entry-point script for Hadoop MR, which is deployed on every cluster node when MRS for Hadoop is installed. This script implements the logic that determines which hadoop command should be called.
traceLevel
Specifies the traceLevel that MRS will run with. This parameter controls MRS logging features as well as runtime tracing of ScaleR functions. Levels are inclusive (i.e., level 3:INFO includes levels 2:WARN and 1:ERROR log messages). The options are:
0:DISABLED - Tracing/Logging disabled.
1:ERROR - ERROR coded trace points are logged to MRS log files.
2:WARN - WARN and ERROR coded trace points are logged to MRS log files.
3:INFO - INFO, WARN, and ERROR coded trace points are logged to MRS log files.
4:DEBUG - All trace points are logged to MRS log files.
5:RESERVED - If set, will log at DEBUG granularity.
6:RESERVED - If set, will log at DEBUG granularity.
7:TRACE - ScaleR functions Runtime Tracing is activated and MRS log level is set to DEBUG granularity.
...
additional arguments to be passed through.
default
default value that is returned if the requested option is not found.
Details
A full set of RevoScaleR options is set on load. Use rxGetOption to obtain the value of a single option.
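A minimal sketch of single-option lookup follows. It assumes RevoScaleR is installed (the guard skips the calls otherwise), and the option name "noSuchOption" is hypothetical, used only to exercise the default argument.

```r
# Guard so the sketch is harmless where RevoScaleR is not installed.
hasRevo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (hasRevo) {
  library(RevoScaleR)

  # Look up one option by name.
  rxGetOption("reportProgress")

  # "noSuchOption" is a hypothetical name; because it does not exist,
  # the value supplied via `default` is returned instead of NULL.
  rxGetOption("noSuchOption", default = NA)
}
```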
The dataPath argument is a character vector of well-formed directory paths, in either UNC ("\\host\dir") or DOS ("C:\dir") format.
When specifying the paths, you must double the number of backslashes since R requires that backslashes be escaped. Updating the previous examples gives "\\\\host\\dir" for UNC-type paths and "C:\\dir" for DOS-type paths. Alternatively, you could specify a DOS-type path with single forward slashes such as the output from system.file (e.g. "C:/Revolution/R-Enterprise-Node-7.4/R-3.1.3/library/RevoScaleR/SampleData").
For Windows operating systems, the arguments that define a path must be in long format and not DOS 8.3 (short) format, e.g., "C:\\Program Files\\RRO\\R-3.1.3\\bin\\x64" or "C:/Program Files/RRO/R-3.1.3/bin/x64" are correct formats while "C:/PROGRA~1/RRO/R-31~1.3/bin/x64" is not.
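The escaping rule can be checked in base R without RevoScaleR. The sketch below reuses the example paths from this section. The variable names are illustrative, and the final rxOptions call is an assumed usage, left commented out because it requires the package.

```r
# Each "\\" in R source code is stored as one literal backslash.
uncPath <- "\\\\host\\dir"   # the UNC path \\host\dir
dosPath <- "C:\\dir"         # the DOS path C:\dir
fwdPath <- "C:/dir"          # forward slashes need no escaping

# writeLines() prints the strings as stored, without R's escape notation.
writeLines(c(uncPath, dosPath, fwdPath))

# nchar() counts stored characters, confirming the doubled backslashes
# collapse to single ones: \\host\dir has 10 characters, C:\dir has 6.
stopifnot(nchar(uncPath) == 10, nchar(dosPath) == 6)

# Assumed usage (requires RevoScaleR, so it is not run here):
# rxOptions(dataPath = c(uncPath, dosPath))
```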
rxIsExpressEdition is not currently functional.
Value
For rxOptions, a list containing the original rxOptions is returned. If there is no argument specified, the list is returned explicitly, otherwise the list is returned as an invisible object. For rxGetOption, the current value of the requested option is returned.
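Because rxOptions returns the original settings, one possible pattern is to capture that list, change an option temporarily, and restore it afterwards. This is a sketch under the assumption that RevoScaleR is installed and that the returned list can be indexed by option name. The variable names are illustrative.

```r
# Guard so the sketch is harmless where RevoScaleR is not installed.
hasRevo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (hasRevo) {
  library(RevoScaleR)

  # The list of original options is returned invisibly when an
  # argument is supplied, so assign it to keep a copy.
  oldOpts <- rxOptions(reportProgress = 0)

  # ... run analyses here without progress output ...

  # Restore the previous value of the one option that was changed.
  rxOptions(reportProgress = oldOpts$reportProgress)
}
```

Restoring a single named option, as opposed to resetting everything with rxOptions(TRUE), leaves any other customized options untouched.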
Author(s)
Microsoft Corporation Microsoft Technical Support
See Also
RevoScaleR, RxLocalSeq, RxLocalParallel, RxForeachDoPar, RxHadoopMR, RxSpark, RxInSqlServer.
Examples
# Get the location of the sample data sets for RevoScaleR
dataDir <- rxGetOption("sampleDataDir")
# See the current settings for options
rxOptions()
## Not run:
rxOptions(reportProgress = 0) # by default, don't report progress
rxOptions()$reportProgress # show value
rxOptions(TRUE) # reset all options
rxOptions()$reportProgress # 2
# Set up to run analyses on a Spark cluster
myCluster <- RxSpark(nameNode = "my-name-service-server", port = 8020)
rxOptions( computeContext = myCluster )
## End(Not run)