

OneHotEncodingEstimator Class

Definition

Converts one or more input columns of categorical values into as many output columns of one-hot encoded vectors.

public sealed class OneHotEncodingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.Transforms.OneHotEncodingTransformer>
type OneHotEncodingEstimator = class
    interface IEstimator<OneHotEncodingTransformer>
Public NotInheritable Class OneHotEncodingEstimator
Implements IEstimator(Of OneHotEncodingTransformer)
Inheritance
OneHotEncodingEstimator
Implements
IEstimator<OneHotEncodingTransformer>

Remarks

Estimator Characteristics

Does this estimator need to look at the data to train its parameters? Yes
Input column data type: Vector or scalar of numeric, boolean, text, DateTime or key type.
Output column data type: Scalar or vector of key, or vector of Single type.
Exportable to ONNX: Yes

The OneHotEncodingEstimator builds a dictionary of unique values appearing in the input column. The resulting OneHotEncodingTransformer converts one or more input columns into as many output columns of one-hot encoded vectors.

The OneHotEncodingEstimator is often used to convert categorical data into a form that can be provided to a machine learning algorithm.

The output of this transform is specified by OneHotEncodingEstimator.OutputKind:

  • Indicator produces an indicator vector. Each slot in this vector corresponds to a category in the dictionary, so its length is the size of the built dictionary. If a value is not found in the dictionary, the output is the zero vector.

  • Bag produces one vector such that each slot stores the number of occurrences of the corresponding value in the input vector. Each slot in this vector corresponds to a value in the dictionary, so its length is the size of the built dictionary. Indicator and Bag differ only in how the bit-vectors generated from individual slots in the input column are aggregated: for Indicator they are concatenated, and for Bag they are added. When the source column is a scalar, the Indicator and Bag options are identical.

  • Key produces keys in a KeyDataViewType column. If the input column is a vector, the output contains a vector key type, where each slot of the vector corresponds to the respective slot of the input vector. If a category is not found in the built dictionary, it is assigned the value zero.

  • Binary produces a binary encoded vector to represent the values found in the dictionary that are present in the input column. If a value in the input column is not found in the dictionary, the output is the zero vector.
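The four output kinds described above can be compared with a minimal sketch like the following. It assumes a project that references the Microsoft.ML NuGet package and supports top-level statements; the DataPoint class, the "Education" column, and the sample values are hypothetical and exist only for illustration. The sketch prints the output column type for each kind rather than asserting specific values.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();

// Hypothetical sample data; the "Education" column is illustrative only.
var samples = new List<DataPoint>
{
    new DataPoint { Education = "0-5yrs" },
    new DataPoint { Education = "0-5yrs" },
    new DataPoint { Education = "6-11yrs" },
    new DataPoint { Education = "11-15yrs" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

var kinds = new[]
{
    OneHotEncodingEstimator.OutputKind.Indicator,
    OneHotEncodingEstimator.OutputKind.Bag,
    OneHotEncodingEstimator.OutputKind.Key,
    OneHotEncodingEstimator.OutputKind.Binary,
};

foreach (var kind in kinds)
{
    // Build an estimator for this output kind, fit it, and apply it.
    var estimator = mlContext.Transforms.Categorical.OneHotEncoding(
        "EducationEncoded", "Education", kind);
    IDataView transformed = estimator.Fit(data).Transform(data);

    // Print the data type of the encoded column for this kind.
    Console.WriteLine($"{kind}: {transformed.Schema["EducationEncoded"].Type}");
}

// Hypothetical input row type.
public class DataPoint
{
    public string Education { get; set; } = "";
}
```

Because the input column here is a scalar, Indicator and Bag should report the same column type, which matches the description above.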

The OneHotEncodingTransformer can be applied to one or more columns, in which case it builds and uses a separate dictionary for each column that it is applied to.
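As a rough illustration of encoding several columns at once, the sketch below passes an array of InputOutputColumnPair to the OneHotEncoding catalog method. It assumes the Microsoft.ML NuGet package is referenced; the DataPoint class and the "Education" and "ZipCode" columns are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext();

// Hypothetical sample data with two categorical columns.
var samples = new List<DataPoint>
{
    new DataPoint { Education = "0-5yrs", ZipCode = "98005" },
    new DataPoint { Education = "6-11yrs", ZipCode = "98052" },
    new DataPoint { Education = "6-11yrs", ZipCode = "98005" },
    new DataPoint { Education = "11-15yrs", ZipCode = "98052" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// One estimator, two input/output column pairs.
// A separate dictionary is built for each input column.
var estimator = mlContext.Transforms.Categorical.OneHotEncoding(new[]
{
    new InputOutputColumnPair("EducationEncoded", "Education"),
    new InputOutputColumnPair("ZipCodeEncoded", "ZipCode"),
});

IDataView transformed = estimator.Fit(data).Transform(data);

// Each encoded column is sized by its own dictionary.
Console.WriteLine(transformed.Schema["EducationEncoded"].Type);
Console.WriteLine(transformed.Schema["ZipCodeEncoded"].Type);

// Hypothetical input row type.
public class DataPoint
{
    public string Education { get; set; } = "";
    public string ZipCode { get; set; } = "";
}
```

With this sample data the two encoded vectors would be expected to differ in length, since "Education" has three distinct values and "ZipCode" has two.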

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Trains and returns a OneHotEncodingTransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.
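The following sketch shows one way Fit(IDataView) might be called directly so that the strongly typed OneHotEncodingTransformer is available. To inspect the resulting columns it uses the transformer's GetOutputSchema(DataViewSchema) method, which is related to but not the same as the estimator's GetOutputSchema(SchemaShape) listed above. It assumes the Microsoft.ML NuGet package is referenced; DataPoint and the "Education" column are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();

// Hypothetical sample data.
var samples = new List<DataPoint>
{
    new DataPoint { Education = "0-5yrs" },
    new DataPoint { Education = "6-11yrs" },
    new DataPoint { Education = "11-15yrs" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

OneHotEncodingEstimator estimator =
    mlContext.Transforms.Categorical.OneHotEncoding("EducationEncoded", "Education");

// Fit scans the data, builds the dictionary, and returns the typed transformer.
OneHotEncodingTransformer transformer = estimator.Fit(data);

// Ask the fitted transformer which columns it would produce for this input schema.
DataViewSchema outputSchema = transformer.GetOutputSchema(data.Schema);
foreach (var column in outputSchema)
{
    Console.WriteLine($"{column.Name}: {column.Type}");
}

// Hypothetical input row type.
public class DataPoint
{
    public string Education { get; set; } = "";
}
```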

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.
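A caching checkpoint might be placed between the encoder and a multi-pass trainer roughly as follows. This is a sketch under assumptions: the Microsoft.ML NuGet package is referenced, the SDCA regression trainer stands in for any trainer that makes several passes over the data, and DataPoint with its "Education" and "Label" columns is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

var mlContext = new MLContext();

// Hypothetical sample data: one categorical feature and a numeric label.
var samples = new List<DataPoint>
{
    new DataPoint { Education = "0-5yrs", Label = 1.0f },
    new DataPoint { Education = "0-5yrs", Label = 1.5f },
    new DataPoint { Education = "6-11yrs", Label = 2.0f },
    new DataPoint { Education = "11-15yrs", Label = 3.0f },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

var pipeline = mlContext.Transforms.Categorical
    .OneHotEncoding("EducationEncoded", "Education")
    // Cache the encoded data so the trainer's repeated passes read from memory.
    .AppendCacheCheckpoint(mlContext)
    .Append(mlContext.Regression.Trainers.Sdca(
        labelColumnName: "Label",
        featureColumnName: "EducationEncoded"));

var model = pipeline.Fit(data);
Console.WriteLine("Pipeline trained.");

// Hypothetical input row type.
public class DataPoint
{
    public string Education { get; set; } = "";
    public float Label { get; set; }
}
```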

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, return a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. At the same time, IEstimator<TTransformer> instances are often formed into pipelines with many objects, so a chain of estimators may need to be built via EstimatorChain<TLastTransformer>, with the estimator whose transformer is wanted buried somewhere in that chain. For that scenario, this method can attach a delegate that is called once Fit is called.
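The scenario above can be sketched as follows: the delegate captures the fitted OneHotEncodingTransformer even though the encoder is only one link in a longer chain. The sketch assumes the Microsoft.ML NuGet package is referenced; the second pipeline step (NormalizeMinMax) is an arbitrary choice to make the chain longer, and DataPoint and the "Education" column are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();

// Hypothetical sample data.
var samples = new List<DataPoint>
{
    new DataPoint { Education = "0-5yrs" },
    new DataPoint { Education = "6-11yrs" },
    new DataPoint { Education = "11-15yrs" },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// Holds the typed transformer once the chain has been fit.
OneHotEncodingTransformer? fittedEncoder = null;

var pipeline = mlContext.Transforms.Categorical
    .OneHotEncoding("EducationEncoded", "Education")
    // The delegate runs when this estimator is fit as part of the chain.
    .WithOnFitDelegate(t => fittedEncoder = t)
    .Append(mlContext.Transforms.NormalizeMinMax("EducationEncoded"));

// Fitting the chain returns a general chain transformer,
// but the delegate has captured the encoder specifically.
var model = pipeline.Fit(data);

Console.WriteLine(fittedEncoder != null
    ? "Captured the fitted OneHotEncodingTransformer."
    : "Delegate was not invoked.");

// Hypothetical input row type.
public class DataPoint
{
    public string Education { get; set; } = "";
}
```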

Applies to

See also