rxGetJobResults: Obtain Distributed Computing Job Status and Results

Description

Obtain distributed computing results and processing status.

Usage

  rxGetJobResults(jobInfo, consoleOutput = NULL, autoCleanup = NULL)
  rxGetJobStatus(jobInfo)

Arguments

jobInfo

a jobInfo object as returned by rxExec or a RevoScaleRanalysis function, for example, the rxgLastPendingJob object, if available.

consoleOutput

NULL or logical value. If TRUE, the console output from all of the processes is printed to the user console. If FALSE, no console output is displayed. Output can be retrieved with the function rxGetJobOutput for a non-waiting job. If not NULL, this flag overrides the value set in the compute context when the job was submitted. If NULL, the setting in the compute context will be used.

autoCleanup

NULL or logical value. If TRUE, the default behavior is to clean up any artifacts created by the distributed computing job. If FALSE, then the artifacts are not deleted, and the results may be acquired using rxGetJobResults, and the console output via rxGetJobOutput until the rxCleanupJobs is used to delete the artifacts. If not NULL, this flag overwrites the value set in the compute context when the job was submitted. If you routinely set autoCleanup=FALSE, you will eventually fill your hard disk with compute artifacts.If you set autoCleanup=TRUEand experience performance degradation on a Windows XP client, consider setting autoCleanup=FALSE.

Details

The possible job status strings returned by rxGetJobStatus are:

"undetermined"

information no longer retained by job scheduler. May still retrieve results and output when available.

"finished"

job has successfully completed.

"failed"

job has failed.

"canceled"

job has been canceled and the cancel process is completed.

"missing"

job is no longer available on the cluster.

"running"

job is either running or completing it's run (finishing, flushing buffers, etc.), or it is in the process of being canceled.

"queued"

job is in one of the following states: it is being configured, has been submitted but has not started, is being validated, or is in the job queue.

On LSF clusters, job information by default is held for only one hour (although this is configurable using the LSF parameter CLEAN_PERIOD); jobs older than the CLEAN_PERIOD setting will have status "undetermined".

Value

rxGetJobResults: either the results of the run (prepended with console output if the consoleOutput argument is set to TRUE) or a message saying that the results are not available because the job has not finished, has failed, or was deleted.

rxGetJobStatus: a character string denoting the job status.

See Also

RxSpark, RxHadoopMR, RxInSqlServer, rxGetJobs, rxCleanupJobs, rxGetJobStatus, rxExec.

Examples


 ## Not run:

# set up the cluster object
myCluster <- RxSpark(nameNode = "my-name-service-server", port = 8020, wait = FALSE)
rxOptions(computeContext=myCluster)

func <- function(name)
{
   print("hi")
   cat(name)
   name
}

jobInfo <- rxExec(func, "dog", elemType = "cores", timesToRun = 5)
rxGetJobStatus(jobInfo)
rxGetJobResults(jobInfo)
## End(Not run)