TextCatalog.ProduceHashedWordBags Method

Overloads

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of counts of hashed n-grams in a new column named outputColumnName.

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the column specified in inputColumnName to a vector of counts of hashed n-grams in a new column named outputColumnName.

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of counts of hashed n-grams in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordHashBagEstimator ProduceHashedWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int numberOfBits = 16, int ngramLength = 1, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member ProduceHashedWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * int * bool * uint32 * bool * int -> Microsoft.ML.Transforms.Text.WordHashBagEstimator
<Extension()>
Public Function ProduceHashedWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 1, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As WordHashBagEstimator

Parameters

catalog
TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnNames. This column's data type will be a known-size vector of Single.

inputColumnNames
String[]

Names of the columns to take the data from. This estimator operates over a vector of text.

numberOfBits
Int32

Number of bits to hash into. Must be between 1 and 30, inclusive.

ngramLength
Int32

N-gram length. When useAllLengths is true, this is the maximum n-gram length.

skipLength
Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths
Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

seed
UInt32

Hashing seed.

useOrderedHashing
Boolean

Whether the position of each source column should be included in the hash (when there are multiple source columns).

maximumNumberOfInverts
Int32

During hashing, a mapping is constructed between the original values and the produced hash values. Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many initial values to one hash, maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained. 0 does not retain any input values. -1 retains all input values mapping to each hash.

Returns

WordHashBagEstimator

Remarks

WordHashBagEstimator is different from NgramHashingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
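
The following is a minimal sketch of calling this overload with two text columns. The `Article` and `ArticleFeatures` classes, the column names `Title`, `Body`, and `Features`, and the sample sentences are hypothetical and exist only for illustration; the parameter values are arbitrary choices, not recommendations.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

public static class MultiColumnHashedWordBags
{
    // Hypothetical input schema with two text columns.
    private class Article
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    // Hypothetical output schema; Features receives the hashed n-gram counts.
    private class ArticleFeatures : Article
    {
        public float[] Features { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample rows, loaded into an IDataView.
        var samples = new List<Article>
        {
            new Article { Title = "Hashing text", Body = "Hashed word bags count n-grams." },
            new Article { Title = "More text", Body = "Each n-gram is hashed into a slot." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Passing a string[] selects the multi-column overload.
        var pipeline = mlContext.Transforms.Text.ProduceHashedWordBags(
            "Features",
            new[] { "Title", "Body" },
            numberOfBits: 8,
            ngramLength: 2,
            useAllLengths: true);

        // Fit the estimator, then transform a single row through a prediction engine.
        var transformer = pipeline.Fit(data);
        var engine = mlContext.Model
            .CreatePredictionEngine<Article, ArticleFeatures>(transformer);
        ArticleFeatures result = engine.Predict(samples[0]);

        Console.WriteLine($"Feature vector length: {result.Features.Length}");
    }
}
```

Because the estimator tokenizes internally, the input columns here hold raw text rather than pre-tokenized text.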

ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)

Create a WordHashBagEstimator, which maps the column specified in inputColumnName to a vector of counts of hashed n-grams in a new column named outputColumnName.

public static Microsoft.ML.Transforms.Text.WordHashBagEstimator ProduceHashedWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 16, int ngramLength = 1, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member ProduceHashedWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * int * bool * uint32 * bool * int -> Microsoft.ML.Transforms.Text.WordHashBagEstimator
<Extension()>
Public Function ProduceHashedWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 1, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As WordHashBagEstimator

Parameters

catalog
TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a known-size vector of Single.

inputColumnName
String

Name of the column to take the data from. This estimator operates over a vector of text.

numberOfBits
Int32

Number of bits to hash into. Must be between 1 and 30, inclusive.

ngramLength
Int32

N-gram length. When useAllLengths is true, this is the maximum n-gram length.

skipLength
Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths
Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

seed
UInt32

Hashing seed.

useOrderedHashing
Boolean

Whether the position of each source column should be included in the hash (when there are multiple source columns).

maximumNumberOfInverts
Int32

During hashing, a mapping is constructed between the original values and the produced hash values. Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many initial values to one hash, maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained. 0 does not retain any input values. -1 retains all input values mapping to each hash.

Returns

WordHashBagEstimator

Remarks

WordHashBagEstimator is different from NgramHashingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
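
The following is a minimal sketch of the single-column overload that also reads back the slot names retained through maximumNumberOfInverts. The `TextData` class, the column names `Text` and `BagOfWords`, and the sample sentences are hypothetical; the parameter values are arbitrary and chosen only to keep the output small.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SingleColumnHashedWordBags
{
    // Hypothetical input schema with one text column.
    private class TextData
    {
        public string Text { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample rows, loaded into an IDataView.
        var samples = new List<TextData>
        {
            new TextData { Text = "Hashed word bags count n-grams." },
            new TextData { Text = "Each n-gram is hashed into a slot." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Only 3-grams are produced (useAllLengths: false), and at most one
        // original value per hash is kept in the slot names.
        var pipeline = mlContext.Transforms.Text.ProduceHashedWordBags(
            "BagOfWords",
            "Text",
            numberOfBits: 5,
            ngramLength: 3,
            useAllLengths: false,
            maximumNumberOfInverts: 1);

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Read the slot-name annotation of the new column.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformed.Schema["BagOfWords"].GetSlotNames(ref slotNames);

        Console.WriteLine($"Number of slots: {slotNames.Length}");
        foreach (var item in slotNames.Items())
            Console.WriteLine($"slot {item.Key}: {item.Value}");
    }
}
```

With numberOfBits set to 5, the output vector is expected to have 2^5 = 32 slots. Setting maximumNumberOfInverts to 0, the default, retains no input values, so no slot names would be available to read back.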
