selectFeatures:机器学习特征选择转换
特征选择转换使用指定模式从指定变量中选择特征。
用法
selectFeatures(vars, mode, ...)
参数
vars
公式或字符串向量/列表,其指定执行特征选择的变量的名称(如果模式为 minCount())。 例如,~ var1 + var2 + var3
。 如果模式为 mutualInformation(),则为描述依赖变量和独立变量的公式或字符串命名列表。 例如,label ~ ``var1 + var2 + var3
。
mode
指定特征选择模式。 可以为 minCount 或 mutualInformation。
...
要直接传递到 Microsoft 计算引擎的其他参数。
详细信息
特征选择转换使用以下两种模式之一从指定的变量中选择特征:计数或互信息。 有关详细信息,请参阅 minCount 和 mutualInformation。
值
一个 maml
对象,用于定义转换。
另请参阅
示例
trainReviews <- data.frame(review = c(
"This is great",
"I hate it",
"Love it",
"Do not like it",
"Really like it",
"I hate it",
"I like it a lot",
"I kind of hate it",
"I do like it",
"I really hate it",
"It is very good",
"I hate it a bunch",
"I love it a bunch",
"I hate it",
"I like it very much",
"I hate it very much.",
"I really do love it",
"I really do hate it",
"Love it!",
"Hate it!",
"I love it",
"I hate it",
"I love it",
"I hate it",
"I love it"),
like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
)
testReviews <- data.frame(review = c(
"This is great",
"I hate it",
"Love it",
"Really like it",
"I hate it",
"I like it a lot",
"I love it",
"I do like it",
"I really hate it",
"I love it"), stringsAsFactors = FALSE)
# Use a categorical hash transform which generated 128 features.
outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
summary(outModel1)
# Apply a categorical hash transform and a count feature selection transform
# which selects only those hash slots that has value.
outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(
categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
selectFeatures("reviewCatHash", mode = minCount())))
summary(outModel2)
# Apply a categorical hash transform and a mutual information feature selection transform
# which selects only 10 features with largest mutual information with the label.
outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(
categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
selectFeatures(like ~ reviewCatHash, mode = mutualInformation(numFeaturesToKeep = 10))))
summary(outModel3)