Enforcing YARN queue usage
Important
This content is being retired and may not be updated in the future. The support for Machine Learning Server will end on July 1, 2022. For more information, see What's happening to Machine Learning Server?
R Server tasks running on Spark or MapReduce can be managed through use of YARN job queues. To direct a job to a specific queue, the end user must include the queue name in the MapReduce or Spark Compute Context.
MapReduce
Use the "hadoopSwitches" option to direct jobs to a specific YARN queue.
RxHadoopMR(.., hadoopSwitches='-Dmapreduce.job.queuename=mrsjobs')
Spark
Use the "extraSparkConfig" option to direct jobs to a specific YARN queue.
RxSpark(.., extraSparkConfig='--conf spark.yarn.queue=mrsjobs')
RxSparkConnect(..,
extraSparkConfig='--conf spark.yarn.queue=mrsjobs')
Overrides
Use of a specific queue can be enforced by the Hadoop system administrator by providing an installation override to the RxSpark()
, RxSparkConnect()
, and RxHadoopMR()
compute context functions. A benefit is that you no longer have to explicitly specify the queue.
This procedure involves creating a custom R package that contains the function overrides, installing that package on the nodes in use by end users, and adding the package to the default search path on these nodes. The following code block provides an example. If you use this code as a template, remember to change the ‘mrsjobs’ YARN queue name to the queue name that's valid for your system.
Create a new R package, such as “abcMods” if your company abbreviation is ‘abc’, by installing the “devtools” package and running the following command, or use the
package.skeleton()
function in base R. To learn more about creating R packages, see Writing R Extensions on the CRAN website, or online version of R Packages from Hadley Wickham.> library(devtools) > create('/dev/abcMods',rstudio=FALSE)
This creates the essential package files in the requested directory, in this case ‘/dev/abcMods’. Edit each of the following to fill in the relevant info.
DESCRIPTION – text file containing the description of the R package:
Package: abcMods Date: 2016-09-01 Title: Company ABC Modified Functions Depends: RevoScaleR Description: R Server functions modified for use by Company ABC License: file LICENSE
NAMESPACE – text file containing the list of overridden functions to be exported:
export("RxHadoopMR", "RxSpark", “RxSparkConnect”)
LICENSE – create a text file named ‘LICENSE’ in the package directory with a single line for the license associated with the R package:
This package is for internal Company ABC use only -- not for redistribution.
In the package’s R directory add one or more
*.R
files with the code for the functions to be overridden. The following sample code provides for overridingRxHadoopMR
,RxSpark
, andRxSparkConnect
that you might save to a file called "ccOverrides.r" in that directory. TheRxSparkConnect
function is only available in V9 and later releases.# sample code to enforce use of YARN queues for RxHadoopMR, RxSpark, # and RxSparkConnect RxHadoopMR <- function(...) { dotargs <- list(...) hswitch <- dotargs$hadoopSwitches # remove any queue info that may already present y <- hswitch if (isTRUE(grep('queue',hswitch) > 0)) { hswitch <- gsub(' *= *','=',hswitch,perl=TRUE) hswitch <- gsub(' +',' ',hswitch) x <- unlist(strsplit(hswitch," ")) i <- grep('queue',x) y <- paste(x[- c(i-1,i)],collapse=' ') } # add in the required queue info dotargs$hadoopSwitches <- paste(y,'-Dmapreduce.job.queuename=mrsjobs') do.call( RevoScaleR::RxHadoopMR, dotargs ) } RxSpark <- function(...) { dotargs <- list(...) hswitch <- dotargs$extraSparkConfig # remove any queue info that may already present y <- hswitch if (isTRUE(grep('queue',hswitch) > 0)) { hswitch <- gsub(' *= *','=',hswitch,perl=TRUE) hswitch <- gsub(' +',' ',hswitch) x <- unlist(strsplit(hswitch," ")) i <- grep('queue',x) y <- paste(x[- c(i-1,i)],collapse=' ') } # add in the required queue info dotargs$extraSparkConfig <- paste(y,'--conf spark.yarn.queue=mrsjobs') do.call( RevoScaleR::RxSpark, dotargs ) } } RxSparkConnect <- function(...) { dotargs <- list(...) hswitch <- dotargs$extraSparkConfig # remove any queue info that may already present y <- hswitch if (isTRUE(grep('queue',hswitch) > 0)) { hswitch <- gsub(' *= *','=',hswitch,perl=TRUE) hswitch <- gsub(' +',' ',hswitch) x <- unlist(strsplit(hswitch," ")) i <- grep('queue',x) y <- paste(x[- c(i-1,i)],collapse=' ') } # add in the required queue info dotargs$extraSparkConfig <- paste(y,'--conf spark.yarn.queue=mrsjobs') do.call( RevoScaleR::RxSparkConnect, dotargs ) }
After editing the previous components of the package, run the following Linux commands to build the package from the directory containing the abcMods directory:
R CMD build abcMods R CMD INSTALL abcmods_0.1-0 (if needed run this using sudo)
If you need to build the package for users on Windows, then the equivalent commands would be as follows where ‘rpath’ is defined to point to the version of R Server to which you’d like to add the library.
set rpath="C:\Program Files\Microsoft\R Client\R_SERVER\bin\x64\R.exe" %rpath% CMD build abcMods %rpath% CMD INSTALL abcMods_0.1-0.tar.gz
To test it, start R, load the library, and make a call to
RxHadoopMR()
:> library(abcMods) > RxHadoopMR(hadoopSwitches="-Dmapreduce.job.queuename=XYZ")
You should see the result come back with the queue name set to your override value (for example,
Dmapreduce.job.queuename=mrsjobs)
.To automate the loading of the package so that users don’t need to specify "library(abcMods)", edit Rprofile.site and modify the line specifying the default packages to include abcMods as the last item:
options(defaultPackages=c(getOption("defaultPackages"), "rpart", "lattice", "RevoScaleR", "RevoMods", "RevoUtils", "RevoUtilsMath", "abcMods"))
Once everything tests out to your satisfaction, install the package on all the edge nodes that your users are logging in to. To do this, copy "Rprofile.site" and R’s library/abcMods directory to each of these nodes, or install the package from the abcmods_0.1-0 tar file on each node and manually edit the "Rprofile.site" file on each node.