TextCatalog.ProduceWordBags Method

Definition

Namespace:: Microsoft.ML

Assembly:: Microsoft.ML.Transforms.dll

Package:: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Overloads

ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)	Create a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Create a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Create a WordBagEstimator, which maps the multiple columns specified in `inputColumnNames` to a vector of n-gram counts in a new column named `outputColumnName`.

ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)

Source:: TextCatalog.cs

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, char termSeparator, char freqSeparator, string inputColumnName = default, int maximumNgramsCount = 10000000);

static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * char * char * string * int -> Microsoft.ML.Transforms.Text.WordBagEstimator

<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, termSeparator As Char, freqSeparator As Char, Optional inputColumnName As String = Nothing, Optional maximumNgramsCount As Integer = 10000000) As WordBagEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.

termSeparator: Char

Separator used to separate term/frequency pairs.

freqSeparator: Char

Separator used to separate terms from their frequency.

inputColumnName: String

Name of the column to take the data from. This estimator operates over a vector of text.

maximumNgramsCount: Int32

Maximum number of n-grams to store in the dictionary.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
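
The following is a minimal sketch of how this overload might be called, not an official sample. It assumes the input text already holds term/frequency pairs such as `apple:3;pear:1`, a format inferred from the termSeparator and freqSeparator descriptions above. The `TermCounts` class, the `Terms` and `BagOfWords` column names, and the sample strings are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class WordBagsFromTermFrequencySketch
{
    // Hypothetical input row type; the property name "Terms" is an assumption.
    public class TermCounts
    {
        public string Terms { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Assumed input format: pairs separated by ';', with each term
        // separated from its frequency by ':'.
        var samples = new List<TermCounts>
        {
            new TermCounts { Terms = "apple:3;pear:1;plum:2" },
            new TermCounts { Terms = "pear:4;fig:1" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Signature as documented above: output column, term separator,
        // frequency separator, then the optional input column.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            termSeparator: ';',
            freqSeparator: ':',
            inputColumnName: "Terms");

        ITransformer transformer = estimator.Fit(data);
        IDataView transformed = transformer.Transform(data);

        // Inspect the type of the produced column.
        Console.WriteLine(transformed.Schema["BagOfWords"].Type);
    }
}
```

The separators here are arbitrary choices; any pair of distinct characters that does not occur inside the terms themselves should serve the same purpose.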

ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Source:: TextCatalog.cs

Create a WordBagEstimator, which maps the column specified in inputColumnName to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);

static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator

<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.

inputColumnName: String

Name of the column to take the data from. This estimator operates over a vector of text.

ngramLength: Int32

Ngram length.

skipLength: Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths: Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

maximumNgramsCount: Int32

Maximum number of n-grams to store in the dictionary.

weighting: NgramExtractingEstimator.WeightingCriteria

Statistical measure used to evaluate how important a word is to a document in a corpus.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
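
The following is a minimal sketch of this overload applied to a single text column. It is an illustration under stated assumptions rather than an official sample: the `TextData` and `TransformedTextData` classes, the `Text` and `BagOfWords` column names, and the sample sentences are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public static class WordBagsSingleColumnSketch
{
    // Hypothetical input row type; the property name "Text" is an assumption.
    public class TextData
    {
        public string Text { get; set; }
    }

    // Hypothetical output row type that adds the produced feature vector.
    public class TransformedTextData : TextData
    {
        public float[] BagOfWords { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var samples = new List<TextData>
        {
            new TextData { Text = "the quick brown fox jumps over the lazy dog" },
            new TextData { Text = "the lazy dog sleeps" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Keep only bigrams (useAllLengths: false) and weight them by
        // term frequency, which is also the documented default.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnName: "Text",
            ngramLength: 2,
            useAllLengths: false,
            weighting: NgramExtractingEstimator.WeightingCriteria.Tf);

        // Fitting builds the n-gram dictionary; transforming maps each row
        // to a vector with one slot per dictionary entry.
        ITransformer transformer = estimator.Fit(data);
        IDataView transformed = transformer.Transform(data);

        var rows = mlContext.Data.CreateEnumerable<TransformedTextData>(
            transformed, reuseRowObject: false);
        foreach (var row in rows)
        {
            Console.WriteLine($"{row.Text} -> vector of length {row.BagOfWords.Length}");
        }
    }
}
```

Because the raw `Text` column is passed in directly, no separate tokenization step appears in the pipeline; this reflects the remark above that WordBagEstimator tokenizes text internally.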

ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)

Source:: TextCatalog.cs

Create a WordBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of n-gram counts in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordBagEstimator ProduceWordBags(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, int maximumNgramsCount = 10000000, Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria weighting = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf);

static member ProduceWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * bool * int * Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria -> Microsoft.ML.Transforms.Text.WordBagEstimator

<Extension()>
Public Function ProduceWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = True, Optional maximumNgramsCount As Integer = 10000000, Optional weighting As NgramExtractingEstimator.WeightingCriteria = Microsoft.ML.Transforms.Text.NgramExtractingEstimator.WeightingCriteria.Tf) As WordBagEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnNames. This column's data type will be a known-size vector of Single.

inputColumnNames: String[]

Names of the multiple columns to take the data from. This estimator operates over a vector of text.

ngramLength: Int32

Ngram length.

skipLength: Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths: Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

maximumNgramsCount: Int32

Maximum number of n-grams to store in the dictionary.

weighting: NgramExtractingEstimator.WeightingCriteria

Statistical measure used to evaluate how important a word is to a document in a corpus.

Returns

WordBagEstimator

Remarks

WordBagEstimator is different from NgramExtractingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
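
The following is a minimal sketch of the multi-column overload, offered as an illustration rather than an official sample. The `Article` class, the `Title`, `Body`, and `BagOfWords` column names, and the sample rows are hypothetical, and the sketch assumes that plain string columns are accepted as inputs.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class WordBagsMultiColumnSketch
{
    // Hypothetical input row type with two text columns.
    public class Article
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        var samples = new List<Article>
        {
            new Article { Title = "lazy dog", Body = "the lazy dog sleeps all day" },
            new Article { Title = "quick fox", Body = "the quick fox jumps over the dog" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Both input columns feed a single output column of n-gram counts.
        // ngramLength: 1 restricts the bag to single words.
        var estimator = mlContext.Transforms.Text.ProduceWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" },
            ngramLength: 1);

        ITransformer transformer = estimator.Fit(data);
        IDataView transformed = transformer.Transform(data);

        // Inspect the type of the produced column.
        Console.WriteLine(transformed.Schema["BagOfWords"].Type);
    }
}
```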
