microsoftml.mutualinformation_select(cols: [list, str], label: str,
num_features_to_keep: int = 1000, num_bins: int = 256, **kargs)
Selects the top k features across all specified columns ordered by their mutual information with the label column.
The mutual information of two random variables X and Y is a measure of the mutual dependence between the variables. Formally, the mutual information can be written as:

I(X;Y) = E[log(p(x,y)) - log(p(x)) - log(p(y))]

where the expectation is taken over the joint distribution of X and Y. Here p(x,y) is the joint probability density function of X and Y, and p(x) and p(y) are the marginal probability density functions of X and Y respectively. In general, higher mutual information between the dependent variable (or label) and an independent variable (or feature) means that the label has a stronger mutual dependence on that feature.
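To make the formula concrete, the short sketch below (plain NumPy, not part of microsoftml) evaluates I(X;Y) for a made-up joint probability table of two binary variables; the values in p_xy are purely illustrative.

import numpy as np

# Hypothetical joint probability table p(x, y): rows index X, columns index Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)

# Expectation over the joint distribution; cells with p(x,y) = 0 contribute 0.
mask = p_xy > 0
mi = np.sum(p_xy[mask] * (np.log(p_xy[mask]) - np.log((p_x * p_y)[mask])))
print(mi)  # ~0.193 nats: the variables are dependent, so I(X;Y) > 0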
The mutual information feature selection mode selects features based on the mutual information. It keeps the top num_features_to_keep features with the largest mutual information with the label, as sketched below.
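The following is a rough standalone sketch of that selection rule under stated assumptions, not the package's internal implementation: each numeric column is binned, scored by its estimated mutual information with the label, and the indices of the top num_features_to_keep columns are returned. scikit-learn's mutual_info_score is used here only as a convenient MI estimator.

import numpy as np
from sklearn.metrics import mutual_info_score  # MI between two discrete arrays

def select_top_k(X, y, num_features_to_keep=1, num_bins=256):
    # Score every column by its estimated mutual information with the label
    # and return the indices of the highest-scoring columns.
    scores = []
    for j in range(X.shape[1]):
        # Equal-width binning of the numeric column (the num_bins analogue).
        edges = np.histogram_bin_edges(X[:, j], bins=num_bins)
        binned = np.digitize(X[:, j], edges[1:-1])
        scores.append(mutual_info_score(binned, y))
    return np.argsort(scores)[::-1][:num_features_to_keep]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = np.column_stack([rng.normal(size=500),             # noise, unrelated to y
                     y + 0.1 * rng.normal(size=500)])  # strongly tied to y
print(select_top_k(X, y, num_features_to_keep=1, num_bins=16))  # -> [1]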
cols: Specifies a character string or list of the names of the variables to select.

label: Specifies the name of the label.

num_features_to_keep: If the number of features to keep is specified to be n, the transform picks the n features that have the highest mutual information with the dependent variable. The default value is 1000.

num_bins: Maximum number of bins for numerical values. Powers of 2 are recommended. The default value is 256.

kargs: Additional arguments sent to the compute engine.

Returns: An object defining the transform.
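A hedged end-to-end usage sketch follows. The review data, column names, and the featurize_text settings are illustrative assumptions rather than an official sample; the point is the composition pattern, in which mutualinformation_select is passed through a trainer's ml_transforms list so that only the highest-scoring feature slots reach the learner.

import pandas
from microsoftml import (rx_logistic_regression, featurize_text,
                         mutualinformation_select)

# Hypothetical training data: free-text reviews with a binary label.
train = pandas.DataFrame(data=dict(
    review=["This is great", "I hate it", "Love it", "Do not like it",
            "Really like it", "I hate it", "I like it a lot", "I love it"],
    like=[True, False, True, False, True, False, True, True]))

model = rx_logistic_regression(
    "like ~ features",
    data=train,
    ml_transforms=[
        # Turn the raw text into an n-gram feature vector column "features".
        featurize_text(cols=dict(features="review"), language="English"),
        # Keep only the 10 feature slots with the highest mutual information
        # with the label column "like".
        mutualinformation_select(cols=["features"], label="like",
                                 num_features_to_keep=10)])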