MutualInformationFeatureSelectingEstimator Class

Definition

Selects the top k slots across all specified columns ordered by their mutual information with the label column (what you can learn about the label by observing the value of the specified column).

C#
public sealed class MutualInformationFeatureSelectingEstimator : Microsoft.ML.IEstimator<Microsoft.ML.ITransformer>

F#
type MutualInformationFeatureSelectingEstimator = class
    interface IEstimator<ITransformer>

VB
Public NotInheritable Class MutualInformationFeatureSelectingEstimator
Implements IEstimator(Of ITransformer)

Inheritance
Object → MutualInformationFeatureSelectingEstimator

Implements
IEstimator<ITransformer>

Remarks

Estimator Characteristics

Does this estimator need to look at the data to train its parameters? | Yes
Input column data type | Vector or scalar of numeric, text or key data types
Output column data type | Same as the input column
Exportable to ONNX | Yes

Formally, the mutual information can be written as:

$\text{MI}(X,Y) = E_{x,y}[\log(P(x,y)) - \log(P(x)) - \log(P(y))]$ where $x$ and $y$ are observations of random variables $X$ and $Y$.

The expectation E is taken over the joint distribution of X and Y. Here P(x, y) is the joint probability density function of X and Y, and P(x) and P(y) are the marginal probability density functions of X and Y, respectively. In general, higher mutual information between the dependent variable (or label) and an independent variable (or feature) means that the label has a stronger mutual dependence on that feature. The estimator keeps the output feature slots with the largest mutual information with the label.
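For discrete observations, this expectation can be estimated directly from empirical counts. The following Python sketch is a minimal illustration of the formula above, not part of ML.NET:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical MI(X, Y) in nats, estimated from paired observations
    of two discrete random variables."""
    n = len(xs)
    p_x, p_y = Counter(xs), Counter(ys)
    p_xy = Counter(zip(xs, ys))
    # E[log P(x,y) - log P(x) - log P(y)] over the empirical joint
    return sum((c / n) * (math.log(c / n)
                          - math.log(p_x[x] / n)
                          - math.log(p_y[y] / n))
               for (x, y), c in p_xy.items())

# A feature that determines the label completely: MI = log(2) ≈ 0.693 nats
print(mutual_information([0, 1, 0, 1], [True, False, True, False]))
# A feature independent of the label: MI = 0
print(mutual_information([0, 0, 1, 1], [True, False, True, False]))
```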

For example, given the following Features and Label columns, if we ask for the top 2 slots (vector elements) with the highest mutual information with the label column, applying this estimator keeps only the first and third slots, because their values are the most informative about the values in the Label column.

Label | Features
True  | 4,6,0
False | 0,7,5
True  | 4,7,0
False | 0,7,0

This is how the dataset above would look, after fitting the estimator, and transforming the data with the resulting transformer:

Label | Features
True  | 4,0
False | 0,5
True  | 4,0
False | 0,0
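The selection above can be sanity-checked by scoring each slot's empirical mutual information with the label. This Python sketch is an illustration only, not ML.NET's implementation; the library bins numeric values before counting, which can affect how closely ranked slots are ordered:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical MI in nats from paired discrete observations.
    n = len(xs)
    p_x, p_y, p_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

label = [True, False, True, False]
rows = [(4, 6, 0), (0, 7, 5), (4, 7, 0), (0, 7, 0)]
slots = list(zip(*rows))  # slot-wise (column-wise) view of the vector column

scores = [mutual_information(slot, label) for slot in slots]
print(scores)
```

The first slot (4,0,4,0) determines the label exactly, so it scores the maximum possible MI, log(2); under this simple estimate the remaining two slots score equally, and the library's internal binning decides their final ordering.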

Check the See Also section for links to usage examples.

Methods

Fit(IDataView)

Trains and returns an ITransformer.

GetOutputSchema(SchemaShape)

Returns the SchemaShape of the schema which will be produced by the transformer. Used for schema propagation and verification in a pipeline.

Extension Methods

AppendCacheCheckpoint<TTrans>(IEstimator<TTrans>, IHostEnvironment)

Append a 'caching checkpoint' to the estimator chain. This will ensure that the downstream estimators will be trained against cached data. It is helpful to have a caching checkpoint before trainers that take multiple data passes.

WithOnFitDelegate<TTransformer>(IEstimator<TTransformer>, Action<TTransformer>)

Given an estimator, returns a wrapping object that will call a delegate once Fit(IDataView) is called. It is often important for an estimator to return information about what was fit, which is why the Fit(IDataView) method returns a specifically typed object rather than just a general ITransformer. However, IEstimator<TTransformer> instances are often chained into pipelines via EstimatorChain<TLastTransformer>, where the estimator whose transformer we want may be buried somewhere in the chain. For that scenario, this method lets us attach a delegate that will be called once Fit is called.

Applies to

See also