Handler Class
Replace NaN values in a column with imputed values.
Constructor
Handler(replace_with='DefaultValue', impute_by_slot=True, concat=True, columns=None, **params)
Parameters
Name | Description |
---|---|
columns | A dictionary of key-value pairs, where the key is the output column name and the value is the input column name. If the output column names are the same as the input column names, simply specify columns as a list of strings. The << operator can be used to set this value (see Column Operator). For example, Handler(columns={'PL': 'Petal_Length'}). For more details see Columns. |
replace_with | The method used to replace NaN values. If no replace method is specified, 'DefaultValue' is used as the default strategy. |
impute_by_slot | Whether to impute values by slot. |
concat | Whether or not to concatenate an indicator vector column to the value column. |
params | Additional arguments sent to the compute engine. |
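The impute_by_slot description is terse, so the sketch below illustrates one plausible reading in plain Python. It assumes that imputing "by slot" means computing a separate replacement value for each position (slot) of a vector column, while the alternative computes one value for the entire column. The helper names column_means and impute are hypothetical, and mean replacement is used purely for illustration; none of this is nimbusml's implementation.

```python
import math

def column_means(rows, by_slot=True):
    """Hypothetical helper: mean replacement value for each slot.

    rows is a list of equal-length vectors; NaN entries are ignored.
    """
    n_slots = len(rows[0])
    if by_slot:
        # One group of values per slot (position in the vector).
        groups = [[r[i] for r in rows] for i in range(n_slots)]
    else:
        # A single group holding every value, reused for all slots.
        groups = [[v for r in rows for v in r]] * n_slots
    means = []
    for g in groups:
        present = [v for v in g if not math.isnan(v)]
        means.append(sum(present) / len(present) if present else 0.0)
    return means

def impute(rows, by_slot=True):
    """Replace each NaN with the mean chosen for its slot."""
    means = column_means(rows, by_slot)
    return [[means[i] if math.isnan(v) else v for i, v in enumerate(r)]
            for r in rows]

nan = float("nan")
rows = [[1.0, 10.0], [nan, 30.0], [3.0, nan]]
per_slot = impute(rows, by_slot=True)
whole_column = impute(rows, by_slot=False)
print(per_slot)      # [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
print(whole_column)  # [[1.0, 10.0], [11.0, 30.0], [3.0, 11.0]]
```

With per-slot imputation each missing entry takes the mean of its own position (2.0 and 20.0 here); otherwise both take the mean of all present values (11.0).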
Examples
###############################################################################
# Handler
import numpy as np
import pandas as pd
from nimbusml import FileDataStream
from nimbusml.preprocessing.missing_values import Handler
with_nans = pd.DataFrame(
    data=dict(
        Sepal_Length=[2.5, np.nan, 2.1, 1.0],
        Sepal_Width=[.75, .9, .8, .76],
        Petal_Length=[np.nan, 2.5, 2.6, 2.4],
        Petal_Width=[.8, .7, .9, 0.7],
        Species=["setosa", "virginica", "", 'versicolor']))
# write NaNs to file to show how this transform works
tmpfile = 'tmpfile_with_nans.csv'
with_nans.to_csv(tmpfile, index=False)
data = FileDataStream.read_csv(tmpfile, sep=',', numeric_dtype=np.float32)
# transform usage
xf = Handler(columns={'PL': 'Petal_Length'})
# fit and transform
features = xf.fit_transform(data)
# print features
print(features.head())
#    PL.IsMissing.Petal_Length  PL.Petal_Length  Petal_Length  Petal_Width  ...
# 0                        1.0              0.0           NaN          0.8  ...
# 1                        0.0              2.5           2.5          0.7  ...
# 2                        0.0              2.6           2.6          0.9  ...
# 3                        0.0              2.4           2.4          0.7  ...
Remarks
Handler is a combination of Filter and Indicator. It creates two columns: one containing the imputed values, as specified by the replace_with argument, and a second containing indicator values that mark which row entries were imputed. This works for columns of numeric type.
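The two-column behavior described above can be sketched in plain Python. This is only an illustration of the idea, not nimbusml's implementation: the helper name impute_with_indicator is hypothetical, and the sketch assumes the default replacement value for a numeric column is 0, as the example output suggests.

```python
import math

def impute_with_indicator(values, default=0.0):
    """Hypothetical helper: return (imputed, indicator) lists for a numeric column."""
    imputed = []
    indicator = []
    for v in values:
        missing = isinstance(v, float) and math.isnan(v)
        # Replace missing entries with the default; keep the rest unchanged.
        imputed.append(default if missing else v)
        # 1.0 marks a row whose entry was imputed, 0.0 marks an original value.
        indicator.append(1.0 if missing else 0.0)
    return imputed, indicator

petal_length = [float("nan"), 2.5, 2.6, 2.4]
imputed, is_missing = impute_with_indicator(petal_length)
print(imputed)     # [0.0, 2.5, 2.6, 2.4]
print(is_missing)  # [1.0, 0.0, 0.0, 0.0]
```

The two lists correspond to the PL.Petal_Length and PL.IsMissing.Petal_Length columns in the example output.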
Methods
get_params | Get the parameters for this operator. |
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
Name | Description |
---|---|
deep | Default value: False |