TextCatalog.ProduceHashedNgrams Method

Definition

Namespace:: Microsoft.ML

Assembly:: Microsoft.ML.Transforms.dll

Package:: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.1, v3.0.1, v4.0.1, v5.0.0-preview.1.25125.4

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Overloads

ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Creates an NgramHashingEstimator, which hashes the n-grams found in the column specified in `inputColumnName` and writes a vector of their counts to a new column, `outputColumnName`.
ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Creates an NgramHashingEstimator, which hashes the n-grams found in the columns specified in `inputColumnNames` and writes a vector of their counts to a single new column, `outputColumnName`.

ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)

Source:: TextCatalog.cs

Creates an NgramHashingEstimator, which hashes the n-grams found in the column specified in inputColumnName and writes a vector of their counts to a new column, outputColumnName.

public static Microsoft.ML.Transforms.Text.NgramHashingEstimator ProduceHashedNgrams(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 16, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0, bool rehashUnigrams = false);

static member ProduceHashedNgrams : Microsoft.ML.TransformsCatalog.TextTransforms * string * string * int * int * int * bool * uint32 * bool * int * bool -> Microsoft.ML.Transforms.Text.NgramHashingEstimator

<Extension()>
Public Function ProduceHashedNgrams (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0, Optional rehashUnigrams As Boolean = false) As NgramHashingEstimator

Parameters

catalog: TransformsCatalog.TextTransforms

The transform's catalog.

outputColumnName: String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be a vector of Single.

inputColumnName: String

Name of the column to copy the data from. This estimator operates over a vector of key type.

numberOfBits: Int32

Number of bits to hash into. Must be between 1 and 30, inclusive.

ngramLength: Int32

N-gram length, in tokens.

skipLength: Int32

Maximum number of tokens to skip when constructing an n-gram.

useAllLengths: Boolean

Whether to include all n-gram lengths up to ngramLength or only ngramLength.

seed: UInt32

Hashing seed.

useOrderedHashing: Boolean

Whether the position of each source column should be included in the hash (when there are multiple source columns).

maximumNumberOfInverts: Int32

During hashing, a mapping is constructed between the original values and the hash values produced. Text representations of the original values are stored in the slot names of the annotations for the new column. Because hashing can map many input values to one hash, maximumNumberOfInverts specifies the upper bound on the number of distinct input values mapping to a hash that should be retained. 0 does not retain any input values. -1 retains all input values mapping to each hash.

rehashUnigrams: Boolean

Whether to rehash unigrams.
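
The effect of maximumNumberOfInverts can be seen by reading the slot names of the output column. The following is a minimal sketch, not a definitive implementation: the TextData class, the column names "Text", "Tokens", and "NgramFeatures", and the sample sentences are assumptions made for illustration. It sets maximumNumberOfInverts to 1 so that at most one original n-gram is retained per hash slot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class InvertHashSketch
{
    // Hypothetical row type, used only for this illustration.
    private class TextData
    {
        public string Text { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample data.
        var samples = new List<TextData>
        {
            new TextData { Text = "this is an example" },
            new TextData { Text = "this is another example" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // The estimator operates over a vector of key type, so the text is
        // split into tokens and the tokens are mapped to keys first.
        var pipeline = mlContext.Transforms.Text
            .TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
            .Append(mlContext.Transforms.Text.ProduceHashedNgrams(
                "NgramFeatures", "Tokens",
                numberOfBits: 5,
                ngramLength: 2,
                useAllLengths: false,
                maximumNumberOfInverts: 1));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Slot names hold the text of the retained original n-grams.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        transformed.Schema["NgramFeatures"].GetSlotNames(ref slotNames);
        string[] names = slotNames.DenseValues()
            .Select(name => name.ToString())
            .ToArray();

        // Print only the slots that some n-gram hashed into.
        for (int slot = 0; slot < names.Length; slot++)
        {
            if (!string.IsNullOrEmpty(names[slot]))
                Console.WriteLine($"slot {slot}: {names[slot]}");
        }
    }
}
```

With the default value of 0, no input values are retained, so this kind of inspection relies on passing a non-zero maximumNumberOfInverts.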

Returns

NgramHashingEstimator

Remarks

NgramHashingEstimator differs from WordHashBagEstimator in that NgramHashingEstimator takes tokenized text as input, whereas WordHashBagEstimator tokenizes text internally.
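
Because the input must already be tokenized and mapped to keys, a typical pipeline places TokenizeIntoWords and MapValueToKey before ProduceHashedNgrams. The sketch below shows one way to do this under stated assumptions: the TextData class, the column names, and the sample sentences are hypothetical. With numberOfBits set to 5, each output vector should have 2^5 = 32 slots.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class HashedNgramsSketch
{
    // Hypothetical row type, used only for this illustration.
    private class TextData
    {
        public string Text { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample data.
        var samples = new List<TextData>
        {
            new TextData { Text = "the quick brown fox" },
            new TextData { Text = "the lazy dog" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Step 1: split raw text into tokens.
        // Step 2: map each token to a key, in place.
        // Step 3: hash unigrams and bigrams into a fixed-size count vector.
        var pipeline = mlContext.Transforms.Text
            .TokenizeIntoWords("Tokens", "Text")
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Tokens"))
            .Append(mlContext.Transforms.Text.ProduceHashedNgrams(
                "NgramFeatures", "Tokens",
                numberOfBits: 5,
                ngramLength: 2,
                useAllLengths: true));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Print the length and the counts of each row's feature vector.
        foreach (var features in
                 transformed.GetColumn<VBuffer<float>>("NgramFeatures"))
        {
            Console.WriteLine(
                $"{features.Length} slots: " +
                string.Join(" ", features.DenseValues()));
        }
    }
}
```

A small numberOfBits keeps the vector compact but makes collisions between different n-grams more likely; a larger value trades memory for fewer collisions.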

ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)

Source:: TextCatalog.cs

Creates an NgramHashingEstimator, which hashes the n-grams found in the columns specified in inputColumnNames and writes a vector of their counts to a single new column, outputColumnName.

public static Microsoft.ML.Transforms.Text.NgramHashingEstimator ProduceHashedNgrams(this Microsoft.ML.TransformsCatalog.TextTransforms catalog, string outputColumnName, string[] inputColumnNames = default, int numberOfBits = 16, int ngramLength = 2, int skipLength = 0, bool useAllLengths = true, uint seed = 314489979, bool useOrderedHashing = true, int maximumNumberOfInverts = 0, bool rehashUnigrams = false);

static member ProduceHashedNgrams : Microsoft.ML.TransformsCatalog.TextTransforms * string * string[] * int * int * int * bool * uint32 * bool * int * bool -> Microsoft.ML.Transforms.Text.NgramHashingEstimator

<Extension()>
Public Function ProduceHashedNgrams (catalog As TransformsCatalog.TextTransforms, outputColumnName As String, Optional inputColumnNames As String() = Nothing, Optional numberOfBits As Integer = 16, Optional ngramLength As Integer = 2, Optional skipLength As Integer = 0, Optional useAllLengths As Boolean = true, Optional seed As UInteger = 314489979, Optional useOrderedHashing As Boolean = true, Optional maximumNumberOfInverts As Integer = 0, Optional rehashUnigrams As Boolean = false) As NgramHashingEstimator
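
As a hedged sketch of the multi-column overload, the program below hashes n-grams from two tokenized columns into one output column. The Article class, its Title and Body properties, the column names, and the sample data are all assumptions made for illustration. Each input column is tokenized and mapped to keys separately before being passed in the inputColumnNames array.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class MultiColumnHashedNgramsSketch
{
    // Hypothetical row type with two text fields, for illustration only.
    private class Article
    {
        public string Title { get; set; }
        public string Body { get; set; }
    }

    public static void Main()
    {
        var mlContext = new MLContext();

        // Made-up sample data.
        var samples = new List<Article>
        {
            new Article { Title = "fast fox", Body = "the fox runs fast" },
            new Article { Title = "lazy dog", Body = "the dog sleeps all day" }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Tokenize and key-map each source column, then hash both
        // key columns into a single output column.
        var pipeline = mlContext.Transforms.Text
            .TokenizeIntoWords("TitleTokens", "Title")
            .Append(mlContext.Transforms.Text
                .TokenizeIntoWords("BodyTokens", "Body"))
            .Append(mlContext.Transforms.Conversion.MapValueToKey("TitleTokens"))
            .Append(mlContext.Transforms.Conversion.MapValueToKey("BodyTokens"))
            .Append(mlContext.Transforms.Text.ProduceHashedNgrams(
                "NgramFeatures",
                new[] { "TitleTokens", "BodyTokens" },
                numberOfBits: 8,
                useOrderedHashing: true));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Print the length of each row's combined feature vector.
        foreach (var features in
                 transformed.GetColumn<VBuffer<float>>("NgramFeatures"))
        {
            Console.WriteLine($"{features.Length} slots");
        }
    }
}
```

With useOrderedHashing set to true, the position of each source column is included in the hash, so the same n-gram appearing in the title and in the body would be expected to contribute to different slots.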