rxOptions: Global Options for RevoScaleR

07/12/2022

Description

Functions to set and retrieve the options used by RevoScaleR computations. Options need to be set only once; they then apply to all subsequent computations in the session.

Usage

  rxOptions(initialize = FALSE,
            libDir,
            linkDllName = ifelse(.Platform$OS.type == "windows", "RxLink.dll", "libRxLink.so.2"),
            cintSysDir = setCintSysDir(),
            includeDir = setRxIncludeDir(),
            unitTestDir = system.file("unitTests", package = "RevoScaleR"),
            unitTestDataDir = system.file("unitTestData", package = "RevoScaleR"),
            sampleDataDir = system.file("SampleData", package = "RevoScaleR"),
            demoScriptsDir = system.file("demoScripts", package = "RevoScaleR"),
            blocksPerRead = 1,
            reportProgress = 2,
            rowDisplayMax = -1,
            memStatsReset = 0,
            memStatsDiff = 0,
            numCoresToUse = 4,
            numDigits = options()$digits, 
            showTransformFn = FALSE,
            defaultDecimalColType = "float32",
            defaultMissingColType = "float32",
            computeContext = RxLocalSeq(),
            dataPath = ".",
            outDataPath = ".",
            transformPackages = c("RevoScaleR","utils","stats","methods"),
            xdfCompressionLevel = 1,
            fileSystem = "native",
            useDoSMP = NULL,
            useSparseCube = FALSE,
            rngBufferSize = 1,
            dropMain = TRUE,          
            coefLabelStyle = "Revo",
            numTasks = 1,
            hdfsHost = Sys.getenv("REVOHADOOPHOST"),
            hdfsPort = as.integer(Sys.getenv("REVOHADOOPPORT")),
            unixRPath = "/usr/bin/Revo64-8",
            mrsHadoopPath = "/usr/bin/mrs-hadoop",
            spark.executorCores = 2,
            spark.executorMem = "4g",
            spark.executorOverheadMem = "4g",
            spark.numExecutors = 65535,
            traceLevel = 0,
            ...)

  rxGetOption(opt, default = NULL)
  rxIsExpressEdition()

Arguments

`initialize`

logical value. If TRUE, rxOptions resets all RevoScaleR options to their default values.

`libDir`

character string specifying path to RevoScaleR's lib directory. For 32-bit versions, this defaults to the libs directory; for 64-bit versions, this defaults to the libs/x64 directory.

`linkDllName`

character string specifying the name of RevoScaleR's DLL (Windows) or shared object (Linux).

`cintSysDir`

character string specifying path to RevoScaleR's C/C++ interpreter (CINT) directory.

`includeDir`

character string specifying path to RevoScaleR's include directory.

`unitTestDir`

character string specifying path to RevoScaleR's RUnit-based test directory.

`unitTestDataDir`

character string specifying path to RevoScaleR's RUnit-based test data directory.

`sampleDataDir`

character string specifying path to RevoScaleR's sample data directory.

`demoScriptsDir`

character string specifying path to RevoScaleR's demo script directory.

`blocksPerRead`

default value for the blocksPerRead argument of many RevoScaleR functions: the number of blocks to read within each read chunk.

`reportProgress`

default value for the reportProgress argument of many RevoScaleR functions. Options are:

0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
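As a sketch of how this global default interacts with a per-call override, the snippet below sets reportProgress once and then overrides it for a single rxSummary call. It assumes RevoScaleR is installed and that the AirlineDemoSmall.xdf sample file, with an ArrDelay variable, is present in sampleDataDir; the requireNamespace() guard makes it a no-op elsewhere.

```r
# Guard so the sketch runs (as a no-op) where RevoScaleR is not installed.
has_revo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revo) {
  library(RevoScaleR)
  # Global default: print and update only the number of processed rows.
  rxOptions(reportProgress = 1)

  # Assumption: this sample file ships in RevoScaleR's sample data directory.
  airline <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")

  # Uses the global default (level 1).
  rxSummary(~ ArrDelay, data = airline)

  # Per-call override: rows processed and all timings (level 3).
  rxSummary(~ ArrDelay, data = airline, reportProgress = 3)
}
```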

`rowDisplayMax`

scalar integer specifying the maximum number of rows to display when using the verbose argument in RevoScaleR functions. The default of -1 displays all available rows.

`memStatsReset`

boolean integer. If 1, the memory status is reset.

`memStatsDiff`

boolean integer. If 1, the change in memory status is shown.

`numCoresToUse`

scalar integer specifying the number of cores to use. If set to -1, or to a value higher than the number of available cores, all available cores are used. Increasing the number of cores also increases the amount of memory required by RevoScaleR analysis functions.
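Because more cores also means more memory, one possible approach is to pick an explicit value rather than using every core. The sketch below leaves one core free, which is a judgment call rather than a RevoScaleR rule; it uses base R's parallel::detectCores() (which may return NA on some platforms) and only calls rxOptions when RevoScaleR is installed.

```r
# parallel ships with base R; detectCores() may return NA on some platforms.
n_detected <- parallel::detectCores()
if (is.na(n_detected)) n_detected <- 1L

# Leave one core free for the rest of the system (assumption, not a requirement).
n_cores <- max(1L, as.integer(n_detected) - 1L)

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  RevoScaleR::rxOptions(numCoresToUse = n_cores)
  print(RevoScaleR::rxGetOption("numCoresToUse"))
}
```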

`numDigits`

controls the number of digits to use when converting numeric data to or from strings, such as when printing numeric values or importing numeric data as strings. The default is the current value of options()$digits, which defaults to 7. Beyond fifteen digits, however, results are likely to be unreliable.
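The fifteen-digit caveat reflects the precision of 64-bit doubles, which hold roughly 15 to 17 significant decimal digits. The base R snippet below (it does not use RevoScaleR) illustrates the effect with sprintf(): at 15 digits the value converts to the expected string, while at 17 digits the binary rounding error becomes visible.

```r
x <- 0.1 + 0.2

# 15 significant digits: the value converts to the string you expect.
s15 <- sprintf("%.15g", x)  # "0.3"

# 17 significant digits: the binary floating-point error becomes visible.
s17 <- sprintf("%.17g", x)  # "0.30000000000000004"

# Base R's digits option, which numDigits takes as its default.
getOption("digits")
```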

`showTransformFn`

logical value. If TRUE, the transform function is shown.

`defaultDecimalColType`

Used to specify a column's data type when only decimal values (possibly mixed with missing (NA) values) are encountered upon first read of the data and the column's type information is not specified via colInfo or colClasses. Supported types are "float32" and "numeric", for 32-bit floating point and 64-bit floating point values, respectively.
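To see what the "float32" default costs in precision, the base R sketch below emulates 32-bit storage. Base R has no single-precision type, so the hypothetical helper to_float32() round-trips values through a 4-byte binary encoding with writeBin() and readBin(); it is an illustration, not how RevoScaleR stores data internally.

```r
# Hypothetical helper: emulate 32-bit float storage via a 4-byte round trip.
to_float32 <- function(x) {
  readBin(writeBin(as.double(x), raw(), size = 4),
          what = "double", n = length(x), size = 4)
}

f <- to_float32(c(0.1, 16777217))
sprintf("%.10f", f[1])  # "0.1000000015": 0.1 is not stored exactly
f[2]                    # 16777216: integers above 2^24 lose precision
```

If this rounding matters for your data, setting rxOptions(defaultDecimalColType = "numeric"), or specifying colInfo or colClasses for the affected columns, keeps 64-bit precision.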

`defaultMissingColType`

Used to specify a given column's data type when only missing values (NAs) or blanks are encountered upon first read of the data and the column's type information is not specified via colInfo or colClasses. Supported types are "float32", "numeric", and "character" for 32-bit floating point, 64-bit floating point and string values, respectively.

`computeContext`

an RxComputeContext object representing the computational environment.

RxLocalSeq: compute locally, using sequential processing with rxExec High Performance Computing.
RxLocalParallel: compute locally, using the 'parallel' package for processing with rxExec High Performance Computing.
RxForeachDoPar: use the currently registered parallel backend for 'foreach' for processing with rxExec High Performance Computing.
RxHadoopMR: use a Hadoop cluster for both High Performance Analytics and rxExec High Performance Computing.
RxSpark: use a Spark cluster for both High Performance Analytics and rxExec High Performance Computing.
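A minimal sketch of switching between the local compute contexts listed above, assuming RevoScaleR is installed. It sets the context through rxOptions and, alternatively, through rxSetComputeContext() and rxGetComputeContext(), RevoScaleR functions not described on this page.

```r
has_revo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revo) {
  library(RevoScaleR)

  # Set the compute context through the global options...
  rxOptions(computeContext = RxLocalParallel())
  print(rxGetOption("computeContext"))

  # ...or through the dedicated setter, then read it back.
  rxSetComputeContext(RxLocalSeq())
  print(rxGetComputeContext())
}
```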

`dataPath`

character vector containing paths to search for local data sources. The default is to search just the current working directory. This will be ignored if dataPath is specified in the active compute context. See the Details section for more information regarding the path format.

`outDataPath`

character vector containing paths for writing new output data files. New data files will be written to the first path that exists. The default is to write to the current working directory. This will be ignored if outDataPath is specified in the active compute context.

`transformPackages`

character vector defining the default set of R packages to be made available and preloaded for use in variable transformation functions.

`xdfCompressionLevel`

integer in the range of -1 to 9. The higher the value, the greater the compression, resulting in smaller files but longer creation times. If xdfCompressionLevel is set to 0, there is no compression and files are compatible with the 6.0 release of Revolution R Enterprise. If set to -1, a default level of compression is used.

`fileSystem`

character string or RxFileSystem object indicating type of file system; "native" or RxNativeFileSystem object can be used for the local operating system, or an RxHdfsFileSystem object for the Hadoop file system.

`useDoSMP`

NULL. Deprecated. Use an RxLocalParallel compute context instead.

`opt`

character string specifying the RevoScaleR option to obtain. A NULL is returned if the option does not exist.

`useSparseCube`

logical value. If TRUE, a sparse cube is used.

`rngBufferSize`

a positive integer scalar specifying the buffer size for the Parallel Random Number Generators (RNGs) in MKL.

`dropMain`

logical value. If TRUE, main-effect terms are dropped before their interactions.

`coefLabelStyle`

character string specifying the coefficient label style. The default is "Revo". If "R", R-compatible labels are created.

`numTasks`

integer value. The default value of numTasks used by RxInSqlServer.

`hdfsHost`

character string specifying the host name of your Hadoop nameNode. Defaults to Sys.getenv("REVOHADOOPHOST"), or "default" if no REVOHADOOPHOST environment variable is set.

`hdfsPort`

integer scalar specifying the port number of your Hadoop nameNode, or a character string that can be coerced to numeric. Defaults to as.integer(Sys.getenv("REVOHADOOPPORT")), or 0 if no REVOHADOOPPORT environment variable is set.

`unixRPath`

The path to the R executable on a Unix/Linux node. By default it points to a path corresponding to this client's version.

`mrsHadoopPath`

Path to the entry-point script for Hadoop MapReduce, which is deployed on every cluster node when MRS for Hadoop is installed. This script determines which hadoop command should be called.

`traceLevel`

Specifies the trace level that MRS runs with. This parameter controls MRS logging as well as runtime tracing of ScaleR functions. Levels are inclusive (for example, level 3:INFO also logs level 2:WARN and 1:ERROR messages). The options are:

0: DISABLED - Tracing/logging is disabled.
1: ERROR - ERROR-coded trace points are logged to MRS log files.
2: WARN - WARN and ERROR coded trace points are logged to MRS log files.
3: INFO - INFO, WARN, and ERROR coded trace points are logged to MRS log files.
4: DEBUG - All trace points are logged to MRS log files.
5: RESERVED - If set, logs at DEBUG granularity.
6: RESERVED - If set, logs at DEBUG granularity.
7: TRACE - Runtime tracing of ScaleR functions is activated and the MRS log level is set to DEBUG granularity.
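The inclusive rule can be made concrete with a small lookup. The helper below is hypothetical and not part of RevoScaleR; it only encodes the table above, treating levels 5 to 7 as DEBUG granularity for logging. In RevoScaleR itself, the level would be set with rxOptions(traceLevel = 3), for example.

```r
# Hypothetical helper (not part of RevoScaleR): which message categories are logged.
trace_categories <- c(ERROR = 1L, WARN = 2L, INFO = 3L, DEBUG = 4L)

logged_categories <- function(traceLevel) {
  # Levels 5-7 log at DEBUG granularity, so they behave like 4 for logging.
  effective <- min(as.integer(traceLevel), 4L)
  names(trace_categories)[trace_categories <= effective]
}

logged_categories(0)  # character(0): tracing disabled
logged_categories(3)  # "ERROR" "WARN" "INFO"
logged_categories(7)  # "ERROR" "WARN" "INFO" "DEBUG"
```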

`...`

additional arguments to be passed through.

`default`

value returned if the requested option is not found.

Details

A full set of RevoScaleR options is set on load. Use rxGetOption to obtain the value of a single option.
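A short sketch of reading single options, assuming RevoScaleR is installed. The option name "noSuchOption" is a deliberately nonexistent, hypothetical name used to show the default argument taking effect.

```r
has_revo <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revo) {
  library(RevoScaleR)
  # Read one option rather than the full list returned by rxOptions().
  progress <- rxGetOption("reportProgress")
  print(progress)

  # Hypothetical unknown option name: the supplied default is returned instead.
  missing_opt <- rxGetOption("noSuchOption", default = "fallback")
  print(missing_opt)
}
```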

The dataPath argument is a character vector of well-formed directory paths, in either UNC ("\\host\dir") or DOS ("C:\dir") format. Because R requires backslashes to be escaped, you must double every backslash when specifying the paths: the examples above become "\\\\host\\dir" for UNC-type paths and "C:\\dir" for DOS-type paths. Alternatively, you can specify a DOS-type path with single forward slashes, such as the output from system.file (e.g. "C:/Revolution/R-Enterprise-Node-7.4/R-3.1.3/library/RevoScaleR/SampleData"). On Windows, arguments that define a path must be in long format, not DOS 8.3 (short) format: for example, "C:\\Program Files\\RRO\\R-3.1.3\\bin\\x64" and "C:/Program Files/RRO/R-3.1.3/bin/x64" are correct, while "C:/PROGRA~1/RRO/R-31~1.3/bin/x64" is not.
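The escaping rules can be checked in plain R. This sketch builds the example paths as string literals, converts a backslash path to forward slashes, and collects them into a character vector of the kind dataPath accepts; the host and directory names are the placeholders from the text, and the rxOptions call is left as a comment because it requires RevoScaleR.

```r
# Backslashes must be doubled inside R string literals.
unc_path <- "\\\\host\\dir"   # the string \\host\dir
dos_path <- "C:\\dir"         # the string C:\dir
cat(unc_path, dos_path, sep = "\n")

# Forward slashes avoid escaping; file.path() joins with "/".
fwd_path <- file.path("C:", "dir")                  # "C:/dir"
converted <- gsub("\\", "/", dos_path, fixed = TRUE) # "C:/dir"

# A character vector of search paths, as accepted by dataPath.
search_paths <- c(unc_path, fwd_path)
# e.g. RevoScaleR::rxOptions(dataPath = search_paths)  (requires RevoScaleR)
```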

rxIsExpressEdition is not currently functional.

Value

For rxOptions, a list containing the original rxOptions is returned. If no arguments are specified, the list is returned visibly; otherwise it is returned invisibly. For rxGetOption, the current value of the requested option is returned.
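Because rxOptions returns the original values, a temporary change can be undone reliably. The helper with_rx_options() below is a hypothetical sketch, not part of RevoScaleR: it sets options, evaluates the code argument (lazily, after the options are set), and restores the changed options on exit, even if the code fails. The setter argument defaults to RevoScaleR::rxOptions; base R's options() follows the same return-the-old-values convention, so it can stand in when RevoScaleR is not available.

```r
# Hypothetical helper (not part of RevoScaleR): set options, run code, then restore.
with_rx_options <- function(code, ..., setter = RevoScaleR::rxOptions) {
  changed <- names(list(...))
  old <- setter(...)   # returns the original option values invisibly
  on.exit(do.call(setter, old[changed]), add = TRUE)
  force(code)          # 'code' is evaluated lazily, after the new options are set
}

# Usage with RevoScaleR (not run here):
# with_rx_options(rxSummary(~ ArrDelay, data = airline), reportProgress = 0)
```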

Author(s)

Microsoft Corporation, Microsoft Technical Support

Examples


 # Get the location of the sample data sets for RevoScaleR
 dataDir <- rxGetOption("sampleDataDir")
 # See the current settings for options
 rxOptions() 
 ## Not run:

rxOptions(reportProgress = 0) # don't report progress by default
rxOptions()$reportProgress # show value: 0
rxOptions(TRUE) # reset all options to their defaults
rxOptions()$reportProgress # 2, the default

# Set up to run analyses on a Spark cluster
myCluster <- RxSpark(nameNode = "my-name-service-server", port = 8020)

rxOptions(computeContext = myCluster)
## End(Not run)
