minCount:特征选择计数模式
特征选择转换 selectFeatures 中使用的特征选择的计数模式。
用法
minCount(count = 1, ...)
参数
count
基于计数的特征选择的阈值。 当且仅当至少有 count
个示例具有某项特征的非默认值时,才选择该特征。 默认值为 1。
...
要直接传递到 Microsoft 计算引擎的其他参数。
详细信息
在特征选择转换中使用计数模式时,如果具有某项特征的非默认值的示例数大于或等于指定计数,择选择该特征。 计数模式特征选择转换在与分类哈希转换一起应用时非常有用(另请参阅 categoricalHash)。 计数特征选择可以删除示例中没有数据的哈希转换生成的那些特征。
值
定义计数模式的字符串。
作者
Microsoft Corporation Microsoft Technical Support
另请参阅
mutualInformation selectFeatures
示例
trainReviews <- data.frame(review = c(
"This is great",
"I hate it",
"Love it",
"Do not like it",
"Really like it",
"I hate it",
"I like it a lot",
"I kind of hate it",
"I do like it",
"I really hate it",
"It is very good",
"I hate it a bunch",
"I love it a bunch",
"I hate it",
"I like it very much",
"I hate it very much.",
"I really do love it",
"I really do hate it",
"Love it!",
"Hate it!",
"I love it",
"I hate it",
"I love it",
"I hate it",
"I love it"),
like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
)
testReviews <- data.frame(review = c(
"This is great",
"I hate it",
"Love it",
"Really like it",
"I hate it",
"I like it a lot",
"I love it",
"I do like it",
"I really hate it",
"I love it"), stringsAsFactors = FALSE)
# Use a categorical hash transform which generated 128 features.
outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
summary(outModel1)
# Apply a categorical hash transform and a count feature selection transform
# which selects only those hash features that has value.
outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(
categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
selectFeatures("reviewCatHash", mode = minCount())))
summary(outModel2)
# Apply a categorical hash transform and a mutual information feature selection transform
# which selects those features appearing with at least a count of 5.
outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0,
mlTransforms = list(
categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7),
selectFeatures("reviewCatHash", mode = minCount(count = 5))))
summary(outModel3)