TextCatalog.ProduceWordBags Method
Definition
Important
Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, char termSeparator, char freqSeparator, string inputColumnName = default, int maximumNgramsCount = 10000000);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * char * char * string * int -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, termSeparator As Char, freqSeparator As Char, Optional inputColumnName As String = Nothing, Optional maximumNgramsCount As Integer = 10000000) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.
- termSeparator
- Char
Separator used to separate terms/frequency pairs.
- freqSeparator
- Char
Separator used to separate terms from their frequency.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over a vector of text.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
Returns
WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
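The following is a minimal sketch of how this overload might be called, not an official sample. It assumes a Microsoft.ML package version that includes this overload, and it assumes, based on the parameter descriptions above, that each input value holds term/frequency pairs such as "apple:2;banana:1". The TextData class, the column names "Text" and "BagOfWords", and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class PreTabulatedWordBagSketch
{
    // Hypothetical row type; the column name "Text" is an assumption.
    private class TextData
    {
        public string Text { get; set; }
    }

    // Fits the estimator and returns one dense feature vector per input row.
    public static List<float[]> Featurize()
    {
        var mlContext = new MLContext();

        // Assumed input format: term:frequency pairs joined by ';'.
        var samples = new List<TextData>
        {
            new TextData { Text = "apple:2;banana:1" },
            new TextData { Text = "banana:3;cherry:1" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // ';' separates the term/frequency pairs, ':' separates a term from its frequency.
        var pipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            termSeparator: ';',
            freqSeparator: ':',
            inputColumnName: "Text");

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Read the output column back as dense float arrays.
        return transformed
            .GetColumn<VBuffer<float>>("BagOfWords")
            .Select(row => row.DenseValues().ToArray())
            .ToList();
    }

    public static void Main()
    {
        foreach (float[] row in Featurize())
            Console.WriteLine(string.Join(", ", row));
    }
}
```

Running Main prints one line of feature values per input row. The width of each vector depends on the dictionary learned during Fit.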
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.
- inputColumnName
- String
Name of the column to take the data from. This estimator operates over a vector of text.
- ngramLength
- Int32
N-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus.
Returns
WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
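As an illustration of the parameters above, here is a minimal sketch that builds unigram and bigram counts from a single text column and reads back the slot names that label each position of the output vector. It is not an official sample: the TextData class, the column names "Text" and "BagOfWords", and the sample sentences are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;

public static class WordBagSingleColumnSketch
{
    // Hypothetical row type; the column name "Text" is an assumption.
    private class TextData
    {
        public string Text { get; set; }
    }

    // Returns the n-gram label of each slot and one dense vector per input row.
    public static (string[] SlotNames, List<float[]> Rows) Featurize()
    {
        var mlContext = new MLContext();

        var samples = new List<TextData>
        {
            new TextData { Text = "the cat sat on the mat" },
            new TextData { Text = "the dog sat on the log" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // ngramLength: 2 with useAllLengths: true requests unigrams and bigrams.
        var pipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnName: "Text",
            ngramLength: 2,
            useAllLengths: true,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Slot names describe which n-gram each vector position counts.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformed.Schema["BagOfWords"].GetSlotNames(ref slotNames);
        string[] names = slotNames.DenseValues().Select(n => n.ToString()).ToArray();

        List<float[]> rows = transformed
            .GetColumn<VBuffer<float>>("BagOfWords")
            .Select(row => row.DenseValues().ToArray())
            .ToList();

        return (names, rows);
    }

    public static void Main()
    {
        var (names, rows) = Featurize();
        foreach (float[] row in rows)
        {
            // Print only the n-grams that occur in this row.
            var present = row
                .Select((value, index) => (value, index))
                .Where(pair => pair.value != 0)
                .Select(pair => $"{names[pair.index]}={pair.value}");
            Console.WriteLine(string.Join(" ", present));
        }
    }
}
```

Setting useAllLengths to false in this sketch would request only n-grams of exactly ngramLength tokens, as described for that parameter.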
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)
Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnNames. This column's data type will be a known-size vector of Single.
- inputColumnNames
- String[]
Names of the multiple columns to take the data from. This estimator operates over a vector of text.
- ngramLength
- Int32
N-gram length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- maximumNgramsCount
- Int32
Maximum number of n-grams to store in the dictionary.
- weighting
- NgramExtractingEstimator.WeightingCriteria
Statistical measure used to evaluate how important a word is to a document in a corpus.
Returns
WordBagEstimator
Remarks
WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
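The multi-column overload can be sketched in the same way. The example below is an illustration under stated assumptions rather than an official sample: the Article class, the column names "Title", "Body", and "BagOfWords", and the sample text are hypothetical, and how counts from the input columns are combined into the output vector is left to the estimator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms.Text;

public static class WordBagMultiColumnSketch
{
    // Hypothetical row type with two text columns.
    private class Article
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    // Fits the estimator over both columns and returns one dense vector per row.
    public static List<float[]> Featurize()
    {
        var mlContext = new MLContext();

        var samples = new List<Article>
        {
            new Article { Title = "cats and dogs", Body = "the cat chased the dog" },
            new Article { Title = "dogs and logs", Body = "the dog sat on the log" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Both input columns feed a single output column of n-gram counts.
        var pipeline = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" },
            ngramLength: 1,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        IDataView transformed = pipeline.Fit(data).Transform(data);

        return transformed
            .GetColumn<VBuffer<float>>("BagOfWords")
            .Select(row => row.DenseValues().ToArray())
            .ToList();
    }

    public static void Main()
    {
        foreach (float[] row in Featurize())
            Console.WriteLine(string.Join(", ", row));
    }
}
```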