TextCatalog.ProduceWordBags Method

Overloads

ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.

ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, char termSeparator, char freqSeparator, string inputColumnName = default, int maximumNgramsCount = 10000000);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * char * char * string * int -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, termSeparator As Char, freqSeparator As Char, Optional inputColumnName As String = Nothing, Optional maximumNgramsCount As Integer = 10000000) As WordBagEstimator

Parameters

catalog
TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.

termSeparator
Char

Separator used to separate term/frequency pairs.

freqSeparator
Char

Separator used to separate terms from their frequency.

inputColumnName
String

Name of the column to take the data from. This estimator operates over a vector of text.

maximumNgramsCount
Int32

Maximum number of n-grams to store in the dictionary.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
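
The following is a minimal sketch of how this overload might be called, not an official sample. It assumes a Microsoft.ML package version that includes this overload, and it assumes the input text is already tabulated as term/frequency pairs such as "cat:2;dog:1", which is inferred from the termSeparator and freqSeparator descriptions above. The class names, column names, and sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class PreTabulatedWordBagSketch
{
    // Hypothetical input row: each string holds term/frequency pairs.
    public class TermCounts
    {
        public string Counts { get; set; }
    }

    // Hypothetical output row used to read the resulting vector back.
    public class TermCountsWithBag : TermCounts
    {
        public float[] Bag { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Assumed format: pairs split by ';', term and frequency split by ':'.
        var rows = new List<TermCounts>
        {
            new TermCounts { Counts = "cat:2;dog:1" },
            new TermCounts { Counts = "dog:3;bird:1" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // The char arguments select the termSeparator/freqSeparator overload.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "Bag",
            termSeparator: ';',
            freqSeparator: ':',
            inputColumnName: "Counts");

        IDataView transformed = estimator.Fit(data).Transform(data);

        // Print each input string next to its vector of counts.
        foreach (var row in mlContext.Data.CreateEnumerable<TermCountsWithBag>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine($"{row.Counts} -> [{string.Join(", ", row.Bag)}]");
        }
    }
}
```

The order of the slots in the output vector depends on the dictionary built during Fit, so no expected output is shown here.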

ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator+WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator+WeightingCriteria.Tf) As WordBagEstimator

Parameters

catalog
TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.

inputColumnName
String

Name of the column to take the data from. This estimator operates over a vector of text.

ngramLength
Int32

N-gram length.

skipLength
Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths
Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

maximumNgramsCount
Int32

Maximum number of n-grams to store in the dictionary.

weighting
NgramExtractingEstimator.WeightingCriteria

Statistical measure used to evaluate how important a word is to a document in a corpus.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
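
As an illustration of the parameters above, here is a small sketch that fits the estimator on a few sentences and prints the resulting count vectors. The row classes, column names, and sample sentences are hypothetical; only the ProduceWordBags call itself follows the signature documented here.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public static class SingleColumnWordBagSketch
{
    // Hypothetical input row with one raw text column.
    public class TextData
    {
        public string Text { get; set; }
    }

    // Hypothetical output row; BagOfWords receives the n-gram counts.
    public class TransformedTextData : TextData
    {
        public float[] BagOfWords { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var samples = new List<TextData>
        {
            new TextData { Text = "the quick brown fox" },
            new TextData { Text = "the lazy dog and the quick fox" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Count only bigrams (useAllLengths: false) with raw term frequency.
        // The estimator tokenizes the text internally, as noted in Remarks.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnName: "Text",
            ngramLength: 2,
            useAllLengths: false,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        ITransformer transformer = estimator.Fit(data);
        IDataView transformed = transformer.Transform(data);

        foreach (var row in mlContext.Data.CreateEnumerable<TransformedTextData>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine(
                $"{row.BagOfWords.Length} slots: {string.Join(" ", row.BagOfWords)}");
        }
    }
}
```

Setting useAllLengths to true would also keep unigram counts, which makes the output vector longer.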

ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator+WeightingCriteria.Tf);
static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator
<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator+WeightingCriteria.Tf) As WordBagEstimator

Parameters

catalog
TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnNames. This column's data type will be a known-size vector of Single.

inputColumnNames
String[]

Names of the columns to take the data from. This estimator operates over a vector of text.

ngramLength
Int32

N-gram length.

skipLength
Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths
Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

maximumNgramsCount
Int32

Maximum number of n-grams to store in the dictionary.

weighting
NgramExtractingEstimator.WeightingCriteria

Statistical measure used to evaluate how important a word is to a document in a corpus.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
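
The multi-column overload can be sketched the same way. In this hypothetical example, two text columns named Title and Body feed a single output column; the class names, column names, and sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class MultiColumnWordBagSketch
{
    // Hypothetical input row with two text columns.
    public class Article
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    // Hypothetical output row; BagOfWords receives the n-gram counts.
    public class ArticleWithBag : Article
    {
        public float[] BagOfWords { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var articles = new List<Article>
        {
            new Article { Title = "quick fox", Body = "the quick brown fox jumps" },
            new Article { Title = "lazy dog", Body = "the lazy dog sleeps" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(articles);

        // Passing a string[] selects the inputColumnNames overload.
        // Other parameters keep their documented defaults
        // (ngramLength = 2, useAllLengths = true, weighting = Tf).
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" });

        IDataView transformed = estimator.Fit(data).Transform(data);

        foreach (var row in mlContext.Data.CreateEnumerable<ArticleWithBag>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine(
                $"{row.Title}: {row.BagOfWords.Length} slots, " +
                $"[{string.Join(" ", row.BagOfWords)}]");
        }
    }
}
```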