rxPredict.mlModel: Score using a Microsoft R Machine Learning model

Article
05/23/2023

Reports per-instance scoring results in a data frame or RevoScaleR data source, using a trained Microsoft R Machine Learning model together with a RevoScaleR data source.

Usage

 ## S3 method for class 'mlModel'
rxPredict  (modelObject, data, outData = NULL,
    writeModelVars = FALSE, extraVarsToWrite = NULL, suffix = NULL,
    overwrite = FALSE, dataThreads = NULL,
    blocksPerRead = rxGetOption("blocksPerRead"),
    reportProgress = rxGetOption("reportProgress"), verbose = 1,
    computeContext = rxGetOption("computeContext"), ...)

Arguments

`modelObject`

A model information object returned from a MicrosoftML model, for example the object returned by rxFastTrees or rxLogisticRegression.

`data`

A RevoScaleR data source object, a data frame, or the path to an .xdf file.

`outData`

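An output text or xdf file name, or an RxDataSource with write capabilities, in which to store the predictions. If NULL, a data frame is returned. The default value is NULL.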

`writeModelVars`

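If TRUE, the variables in the model are written to the output data set in addition to the scoring variables. If variables from the input data set are transformed in the model, the transformed variables are also included. The default value is FALSE.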

`extraVarsToWrite`

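NULL, or a character vector of additional variable names from the input data to include in outData. If writeModelVars is TRUE, the model variables are included as well. The default value is NULL.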

`suffix`

A character string specifying the suffix to append to the created scoring variables, or NULL for no suffix. The default value is NULL.

`overwrite`

If TRUE, an existing outData is overwritten; if FALSE, an existing outData is not overwritten. The default value is FALSE.

`dataThreads`

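An integer specifying the desired degree of parallelism in the data pipeline. If NULL, the number of threads used is determined internally. The default value is NULL.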

`blocksPerRead`

Specifies the number of blocks to read for each chunk of data read from the data source.

`reportProgress`

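An integer value that specifies the level of reporting on the row processing progress:

0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
The default value is 1.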

`verbose`

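An integer value that specifies the amount of output wanted. If 0, no verbose output is printed during calculations. Integer values from 1 to 4 provide increasing amounts of information. The default value is 1.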

`computeContext`

Sets the context in which computations are executed, specified with a valid RxComputeContext. Currently, local and RxInSqlServer compute contexts are supported.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.
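The arguments above can be combined as in the following minimal sketch. It assumes the MicrosoftML package is installed (the scoring step is skipped otherwise) and uses the built-in infert data set. The variable names infert1, model and scores are illustrative only. With outData left at its default of NULL the scores come back as a data frame, and extraVarsToWrite copies the isCase label from the input next to them.

```r
# Prepare a logical label column from the built-in 'infert' data set
infert1 <- infert
infert1$isCase <- (infert1$case == 1)

# Assumption: MicrosoftML is installed; without it only the data preparation runs
if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  library(MicrosoftML)

  # Train a binary classifier on a few numeric predictors
  model <- rxLogisticRegression(isCase ~ age + parity + spontaneous + induced,
                                data = infert1)

  # outData = NULL (the default) returns a data frame instead of writing a file;
  # extraVarsToWrite carries the input column 'isCase' into the result
  scores <- rxPredict(model, data = infert1,
                      extraVarsToWrite = "isCase",
                      reportProgress = 0, verbose = 0)
  print(head(scores))
}
```

To write the scores to disk instead, pass an .xdf file name or a writable RxDataSource as outData and set overwrite = TRUE if the target may already exist.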

Details

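By default, the output reports the following items: for binary classifiers, three scoring variables (PredictedLabel, Score, and Probability); for oneClassSvm and regression models, Score; for multi-class classifiers, PredictedLabel plus one variable for each category, prefixed with Score.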

Value

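A data frame or an RxDataSource object representing the created output data. By default, the output from scoring a binary classifier contains three variables: PredictedLabel, Score, and Probability. The output from rxOneClassSvm and from regression contains one variable: Score. The output from a multi-class classifier contains PredictedLabel plus one variable for each category, prefixed with Score. If a suffix is provided, it is added to the end of these output variable names.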
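One way to see which variables a given model produces is to score the same data with and without a suffix and compare the column names. The following is a minimal sketch, assuming the MicrosoftML package is installed (the scoring step is skipped otherwise). The names reg, plain and suffixed are illustrative only. The sketch prints the names rather than asserting them, because the exact form of the suffixed names is not spelled out above.

```r
# Keep only rows with an observed response, as in the airquality example
DF <- airquality[!is.na(airquality$Ozone), ]
DF$Ozone <- as.numeric(DF$Ozone)

# Assumption: MicrosoftML is installed; without it only the data preparation runs
if (requireNamespace("MicrosoftML", quietly = TRUE)) {
  library(MicrosoftML)

  # A regression model is expected to yield a single Score variable
  reg <- rxFastTrees(Ozone ~ Solar.R + Wind + Temp, data = DF,
                     type = "regression")

  plain    <- rxPredict(reg, data = DF)                   # no suffix
  suffixed <- rxPredict(reg, data = DF, suffix = "Pred")  # suffix appended

  # Inspect the resulting variable names side by side
  print(names(plain))
  print(names(suffixed))
}
```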

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

rxFastTrees, rxFastForest, rxLogisticRegression, rxNeuralNet, rxOneClassSvm.

Examples



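 # Load the packages that provide the modeling and scoring functions
 library(MicrosoftML)
 library(RevoScaleR)

 # Estimate a logistic regression model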
 infert1 <- infert
 infert1$isCase <- (infert1$case == 1)
 myModelInfo <- rxLogisticRegression(formula = isCase ~ age + parity + education + spontaneous + induced,
                        data = infert1)

 # Create an xdf file with per-instance results using rxPredict
 xdfOut <- tempfile(pattern = "scoreOut", fileext = ".xdf")
 scoreDS <- rxPredict(myModelInfo, data = infert1,
     outData = xdfOut, overwrite = TRUE,
     extraVarsToWrite = c("isCase", "Probability"))

 # Summarize results with an ROC curve
 rxRocCurve(actualVarName = "isCase", predVarNames = "Probability", data = scoreDS)

 # Use the built-in data set 'airquality' to create test and train data
 DF <- airquality[!is.na(airquality$Ozone), ]  
 DF$Ozone <- as.numeric(DF$Ozone)
 set.seed(12)
 randomSplit <- rnorm(nrow(DF))
 trainAir <- DF[randomSplit >= 0,]
 testAir <- DF[randomSplit < 0,]
 airFormula <- Ozone ~ Solar.R + Wind + Temp

 # Regression Fast Tree for train data
 fastTreeReg <- rxFastTrees(airFormula, type = "regression", 
     data = trainAir)  

 # Put the score and the model variables in a data frame
 # Add the suffix "Pred" to the new score variable
 fastTreeScoreDF <- rxPredict(fastTreeReg, data = testAir, 
     writeModelVars = TRUE, suffix = "Pred")

 rxGetVarInfo(fastTreeScoreDF)

 # Clean-up
 file.remove(xdfOut)
