TextCatalog.ProduceHashedWordBags Method
Definition
Important
Some information relates to a prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.
Overloads
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32) | Create a WordHashBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of counts of hashed n-grams in a new column named outputColumnName.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32) | Create a WordHashBagEstimator, which maps the column specified in inputColumnName to a vector of counts of hashed n-grams in a new column named outputColumnName.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)
Create a WordHashBagEstimator, which maps the multiple columns specified in inputColumnNames to a vector of counts of hashed n-grams in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordHashBagEstimator ProduceHashedWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames, int numberOfBits = 16, int ngramLength = 1, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member ProduceHashedWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * int * bool * uint32 * bool * int -> Microsoft.ML.Transforms.Text.WordHashBagEstimator
<Extension()>
Public Function ProduceHashedWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, inputColumnNames As String(), Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 1, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As WordHashBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnNames.
This column's data type is a known-size vector of Single.
- inputColumnNames
- String[]
Names of the columns to take the data from. This estimator operates over text or vectors of text.
- numberOfBits
- Int32
Number of bits to hash into; the output vector has 2^numberOfBits slots. Must be between 1 and 30, inclusive.
- ngramLength
- Int32
Length of the n-grams to produce. When useAllLengths is true, this is the maximum length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- seed
- UInt32
Hashing seed.
- useOrderedHashing
- Boolean
Whether the position of each source column should be included in the hash (when there are multiple source columns).
- maximumNumberOfInverts
- Int32
During hashing, mappings are constructed between the original values and the produced hash values.
Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many initial values to one, maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that are retained.
0 retains no input values; -1 retains all input values mapping to each hash.
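To make the interplay of ngramLength, skipLength and useAllLengths concrete, the following C# sketch lists the n-grams those parameters describe for a short token sequence. It is a hypothetical helper (NgramSketch.Enumerate is not an ML.NET API). It performs no hashing. It assumes that skipLength bounds the total number of tokens skipped inside one n-gram, and ML.NET's own extractor may differ in details such as ordering.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, NOT part of ML.NET: enumerates the n-grams described by
// ngramLength / skipLength / useAllLengths, without hashing them.
// Assumption: skipLength limits the TOTAL number of skipped tokens within one n-gram.
public static class NgramSketch
{
    public static List<string> Enumerate(string[] tokens, int ngramLength, int skipLength, bool useAllLengths)
    {
        var result = new List<string>();
        // useAllLengths: true -> lengths 1..ngramLength; false -> only ngramLength.
        int minLength = useAllLengths ? 1 : ngramLength;
        for (int n = minLength; n <= ngramLength; n++)
        {
            for (int start = 0; start < tokens.Length; start++)
            {
                Extend(tokens, new List<string> { tokens[start] }, start, n, skipLength, result);
            }
        }
        return result;
    }

    // Appends one more token after position 'last', skipping 0..skipsLeft tokens,
    // until the current n-gram reaches length n.
    private static void Extend(string[] tokens, List<string> current, int last, int n, int skipsLeft, List<string> result)
    {
        if (current.Count == n)
        {
            result.Add(string.Join(" ", current));
            return;
        }
        for (int skip = 0; skip <= skipsLeft; skip++)
        {
            int next = last + 1 + skip;
            if (next >= tokens.Length) break;
            current.Add(tokens[next]);
            Extend(tokens, current, next, n, skipsLeft - skip, result);
            current.RemoveAt(current.Count - 1);
        }
    }

    public static void Main()
    {
        string[] tokens = { "the", "quick", "brown", "fox" };
        // Bigrams only, allowing up to one skipped token.
        foreach (string ngram in Enumerate(tokens, ngramLength: 2, skipLength: 1, useAllLengths: false))
            Console.WriteLine(ngram);
    }
}
```

With skipLength set to 1, non-contiguous pairs such as "the brown" appear next to the contiguous bigrams. Setting useAllLengths to true would add the unigrams as well.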
Returns
WordHashBagEstimator
Remarks
WordHashBagEstimator is different from NgramHashingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
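The following C# sketch shows one way to call this overload with two text columns. It assumes the Microsoft.ML NuGet package is referenced. The TextData class, its Title and Body columns, the sample rows and the BagOfWords output name are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class MultiColumnExample
{
    // Illustrative input schema (assumption): two free-text columns.
    private class TextData
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    public static int Run()
    {
        var mlContext = new MLContext();
        var samples = new List<TextData>
        {
            new TextData { Title = "Quick fox", Body = "The quick brown fox jumps over the dog." },
            new TextData { Title = "Lazy dog", Body = "The lazy dog sleeps all day." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Unigrams and bigrams (useAllLengths: true) from both columns, hashed into 2^10 slots.
        var pipeline = mlContext.Transforms.Text.ProduceHashedWordBags(
            outputColumnName: "BagOfWords",
            inputColumnNames: new[] { "Title", "Body" },
            numberOfBits: 10,
            ngramLength: 2,
            useAllLengths: true);

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Print the (slot index, count) pairs that have a non-zero count, one row per line.
        var column = transformed.GetColumn<VBuffer<float>>(transformed.Schema["BagOfWords"]);
        foreach (VBuffer<float> row in column)
        {
            foreach (KeyValuePair<int, float> pair in row.Items())
                if (pair.Value != 0)
                    Console.Write($"[{pair.Key}]={pair.Value} ");
            Console.WriteLine();
        }

        // The output column is a known-size vector of Single; return its declared size.
        var vectorType = (VectorDataViewType)transformed.Schema["BagOfWords"].Type;
        return vectorType.Size;
    }

    public static void Main() => Console.WriteLine($"Output vector size: {Run()}");
}
```

Run returns the declared length of the output vector, which is 2^numberOfBits (1024 here). Both input columns are hashed into that single vector.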
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)
Create a WordHashBagEstimator, which maps the column specified in inputColumnName to a vector of counts of hashed n-grams in a new column named outputColumnName.
public static Microsoft.ML.Transforms.Text.WordHashBagEstimator ProduceHashedWordBags (this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 16, int ngramLength = 1, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0);
static member ProduceHashedWordBags : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * int * bool * uint32 * bool * int -> Microsoft.ML.Transforms.Text.WordHashBagEstimator
<Extension()>
Public Function ProduceHashedWordBags (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 1, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0) As WordHashBagEstimator
Parameters
- catalog
- TransformsCatalog.TextTransforms
The transform's catalog.
- outputColumnName
- String
Name of the column resulting from the transformation of inputColumnName.
This column's data type is a known-size vector of Single.
- inputColumnName
- String
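Name of the column to take the data from. This estimator operates over text or vectors of text. If inputColumnName is null (the default), outputColumnName is also used as the source column.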
- numberOfBits
- Int32
Number of bits to hash into; the output vector has 2^numberOfBits slots. Must be between 1 and 30, inclusive.
- ngramLength
- Int32
Length of the n-grams to produce. When useAllLengths is true, this is the maximum length.
- skipLength
- Int32
Maximum number of tokens to skip when constructing an n-gram.
- useAllLengths
- Boolean
Whether to include all n-gram lengths up to ngramLength or only ngramLength.
- seed
- UInt32
Hashing seed.
- useOrderedHashing
- Boolean
Whether the position of each source column should be included in the hash (when there are multiple source columns).
- maximumNumberOfInverts
- Int32
During hashing, mappings are constructed between the original values and the produced hash values.
Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many initial values to one, maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that are retained.
0 retains no input values; -1 retains all input values mapping to each hash.
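The following self-contained C# sketch illustrates what numberOfBits and maximumNumberOfInverts control. N-grams are hashed into 2^numberOfBits slots and counted, and up to maximumNumberOfInverts distinct original values are remembered per slot. This is not ML.NET's implementation. A 32-bit FNV-1a hash, with the seed XORed into the offset basis, is swapped in as a stand-in, so its slot indices will not match ML.NET's. HashBagSketch.Build is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration, NOT ML.NET's implementation: FNV-1a stands in for
// ML.NET's own seeded hash, purely to show numberOfBits and maximumNumberOfInverts.
public static class HashBagSketch
{
    // 32-bit FNV-1a over UTF-8 bytes; mixing the seed into the offset basis is an assumption.
    private static uint Hash(string s, uint seed)
    {
        uint h = 2166136261u ^ seed;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            h ^= b;
            unchecked { h *= 16777619u; }
        }
        return h;
    }

    public static (float[] Counts, List<string>[] Inverts) Build(
        IEnumerable<string> ngrams, int numberOfBits, uint seed, int maximumNumberOfInverts)
    {
        if (numberOfBits < 1 || numberOfBits > 30)
            throw new ArgumentOutOfRangeException(nameof(numberOfBits));

        int slots = 1 << numberOfBits;          // output vector length
        uint mask = (uint)slots - 1;             // keeps the low numberOfBits bits of the hash
        var counts = new float[slots];
        var inverts = new List<string>[slots];
        for (int i = 0; i < slots; i++) inverts[i] = new List<string>();

        foreach (string ngram in ngrams)
        {
            int slot = (int)(Hash(ngram, seed) & mask);
            counts[slot]++;

            // 0 keeps nothing, -1 keeps every distinct value, k > 0 keeps at most k per slot.
            List<string> names = inverts[slot];
            bool roomLeft = maximumNumberOfInverts == -1 || names.Count < maximumNumberOfInverts;
            if (roomLeft && !names.Contains(ngram)) names.Add(ngram);
        }
        return (counts, inverts);
    }

    public static void Main()
    {
        var ngrams = new[] { "the", "cat", "the", "dog" };
        var (counts, inverts) = Build(ngrams, numberOfBits: 3, seed: 314489979u, maximumNumberOfInverts: 1);
        for (int i = 0; i < counts.Length; i++)
            if (counts[i] > 0)
                Console.WriteLine($"slot {i}: count {counts[i]}, kept [{string.Join(", ", inverts[i])}]");
    }
}
```

A smaller numberOfBits makes collisions more likely, and a collision merges the counts of unrelated n-grams. The inverts only record which originals contributed to a slot. They do not separate the counts.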
Returns
WordHashBagEstimator
Remarks
WordHashBagEstimator is different from NgramHashingEstimator in that the former tokenizes text internally and the latter takes tokenized text as input.
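The following C# sketch calls this overload on a single text column. It then reads back the slot names that a non-zero maximumNumberOfInverts makes available. It assumes the Microsoft.ML NuGet package is referenced. The TextData class, the sample sentences and the BagOfWordFeatures column name are illustrative.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SingleColumnExample
{
    // Illustrative input schema (assumption): one free-text column.
    private class TextData
    {
        public string Text { get; set; }
    }

    public static int Run()
    {
        var mlContext = new MLContext();
        var samples = new List<TextData>
        {
            new TextData { Text = "Hashing maps word n-grams to slots of a fixed-size vector." },
            new TextData { Text = "Collisions can merge the counts of different n-grams." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Trigrams only, 2^5 = 32 slots; retain at most one original n-gram per slot.
        var pipeline = mlContext.Transforms.Text.ProduceHashedWordBags(
            "BagOfWordFeatures", "Text",
            numberOfBits: 5, ngramLength: 3, useAllLengths: false, maximumNumberOfInverts: 1);

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // The retained original values are exposed as the column's slot names.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformed.Schema["BagOfWordFeatures"].GetSlotNames(ref slotNames);

        // Print each non-zero slot of each row together with its slot name.
        var column = transformed.GetColumn<VBuffer<float>>(transformed.Schema["BagOfWordFeatures"]);
        foreach (VBuffer<float> row in column)
        {
            foreach (KeyValuePair<int, float> pair in row.Items())
                if (pair.Value != 0)
                    Console.WriteLine($"slot {pair.Key} ({slotNames.GetItemOrDefault(pair.Key)}): {pair.Value}");
            Console.WriteLine();
        }

        // One slot name per vector slot.
        return slotNames.Length;
    }

    public static void Main() => Run();
}
```

Because maximumNumberOfInverts is 1, each slot name holds at most one of the trigrams that hashed to that slot. With the default of 0, no original values are retained.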