Feature Extractors that can be used with mtText.
ngramCount(ngramLength = 1, skipLength = 0, maxNumTerms = 1e+07,
  weighting = "tf")

ngramHash(ngramLength = 1, skipLength = 0, hashBits = 16,
  seed = 314489979, ordered = TRUE, invertHash = 0)
ngramLength: An integer that specifies the maximum number of tokens to take when constructing an n-gram. The default value is 1.
skipLength: An integer that specifies the maximum number of tokens to skip when constructing an n-gram. If the value specified as skip length is k, then n-grams can contain up to k skips (not necessarily consecutive). For example, if k = 2, then the 3-grams extracted from the text "the sky is blue today" are: "the sky is", "the sky blue", "the sky today", "the is blue", "the is today" and "the blue today". The default value is 0. (See the sketch after this argument list for a worked example of these skip-grams.)
maxNumTerms: An integer that specifies the maximum number of categories to include in the dictionary. The default value is 10000000.
weighting: A character string that specifies the weighting criteria:
"tf": to use term frequency.
"idf": to use inverse document frequency.
"tfidf": to use both term frequency and inverse document frequency.
hashBits: An integer value. The number of bits to hash into. Must be between 1 and 30, inclusive.
seed: An integer value. The hashing seed.
ordered: TRUE to include the position of each term in the hash; otherwise, FALSE. The default value is TRUE.
invertHash: An integer specifying the limit on the number of keys that can be used to generate the slot name. 0 means no invert hashing; -1 means no limit. While a zero value gives better performance, a non-zero value is needed to get meaningful coefficient names.
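The skip-gram behavior described for skipLength can be checked by materializing the transformed features directly. The sketch below is not part of the original page; it assumes the MicrosoftML function rxFeaturize is available to apply the transform, and the data frame and output variable names are illustrative only.

# Hedged sketch: materialize 3-grams with up to 2 skips for a single sentence
skipData <- data.frame(text = "the sky is blue today",
                       stringsAsFactors = FALSE)
skipFeatures <- rxFeaturize(data = skipData,
    mlTransforms = list(featurizeText(vars = c(features = "text"),
        wordFeatureExtractor = ngramCount(ngramLength = 3, skipLength = 2))))
# The generated feature slots correspond to the skip-grams listed above,
# e.g. "the sky is", "the sky blue", "the sky today", ...
names(skipFeatures)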
ngramCount allows defining arguments for count-based feature extraction. It accepts the following options: ngramLength, skipLength, maxNumTerms and weighting.
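As a hedged illustration that is not part of the original page, the count-based extractor can combine maxNumTerms and weighting to build a capped, tf-idf weighted bag of bigrams. The rxFeaturize call and the sample data frame below are assumptions used only for demonstration.

# Hedged sketch: count-based bigram features with a capped dictionary and tf-idf weighting
reviews <- data.frame(text = c("great product, great price",
                               "terrible product",
                               "great price"),
                      stringsAsFactors = FALSE)
countFeatures <- rxFeaturize(data = reviews,
    mlTransforms = list(featurizeText(vars = c(textFeatures = "text"),
        wordFeatureExtractor = ngramCount(ngramLength = 2,
                                          maxNumTerms = 1000,
                                          weighting = "tfidf"))))
head(countFeatures)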
ngramHash allows defining arguments for hashing-based feature extraction. It accepts the following options: ngramLength, skipLength, hashBits, seed, ordered and invertHash.
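For comparison, here is a hedged sketch of hashing-based extraction; it is not part of the original page and again assumes rxFeaturize plus an illustrative data frame. With hashBits = 8 the n-grams are hashed into 2^8 = 256 slots regardless of vocabulary size, and a non-zero invertHash requests readable slot names.

# Hedged sketch: hashing-based bigram features in 2^8 slots with invert hashing
hashData <- data.frame(text = c("great product, great price",
                                "terrible product"),
                       stringsAsFactors = FALSE)
hashFeatures <- rxFeaturize(data = hashData,
    mlTransforms = list(featurizeText(vars = c(textFeatures = "text"),
        wordFeatureExtractor = ngramHash(ngramLength = 2,
                                         hashBits = 8,
                                         invertHash = -1))))
dim(hashFeatures)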
A character string defining the transform.
Microsoft Corporation, Microsoft Technical Support
# Illustrative training data: five positive and five negative opinions
myData <- data.frame(opinion = c(
"I love it!",
"I love it!",
"Love it!",
"I love it a lot!",
"Really love it!",
"I hate it",
"I hate it",
"I hate it.",
"Hate it",
"Hate"),
like = rep(c(TRUE, FALSE), each = 5),
stringsAsFactors = FALSE)
# Model 1: logistic regression on hashing-based n-gram features
outModel1 <- rxLogisticRegression(like~opinionCount, data = myData,
mlTransforms = list(featurizeText(vars = c(opinionCount = "opinion"),
wordFeatureExtractor = ngramHash(invertHash = -1, hashBits = 3))))
summary(outModel1)
# Model 2: logistic regression on count-based n-gram features (top 5 terms, term frequency)
outModel2 <- rxLogisticRegression(like~opinionCount, data = myData,
mlTransforms = list(featurizeText(vars = c(opinionCount = "opinion"),
wordFeatureExtractor = ngramCount(maxNumTerms = 5, weighting = "tf"))))
summary(outModel2)
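The fitted models can also be applied to unseen text. The scoring step below is not part of the original example; the new data frame is illustrative, and rxPredict should re-apply the text transforms stored in the model at scoring time.

# Hedged follow-up: score new opinions with the hashing-based model
newData <- data.frame(opinion = c("Love this!", "I really hate it"),
                      stringsAsFactors = FALSE)
rxPredict(outModel1, data = newData)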