Handler Class

Replace NaN values in a column with imputed values.

Constructor

Handler(replace_with='DefaultValue', impute_by_slot=True, concat=True, columns=None, **params)

Parameters

Name	Description
columns	A dictionary of key-value pairs, where the key is the output column name and the value is the input column name. Multiple key-value pairs are allowed. Input column type: numeric. Output column type: Vector Type. If the output column names are the same as the input column names, simply specify `columns` as a list of strings. The << operator can also be used to set this value (see Column Operator). For example: Handler(columns={'out1':'input1', 'out2':'input2'}) or Handler() << {'out1':'input1', 'out2':'input2'}. For more details, see Columns.
replace_with	The method used to replace NaN values. The following choices are available. Def: replace with the default value of the column's type, usually `0`; this is the default strategy when no replace method is specified (`'DefaultValue'` in the constructor signature). Mean: replace NaN values with the mean of the values in that column. Min: replace with the minimum value in the column. Max: replace with the maximum value in the column.
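impute_by_slot	Whether to impute values by slot. For vector-valued columns, True computes a separate replacement value for each slot (vector position); False computes a single replacement value over the entire vector column.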
concat	Whether or not to concatenate an indicator vector column to the value column.
params	Additional arguments sent to the compute engine.
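The replace_with and impute_by_slot options can be illustrated with a small NumPy sketch. This is not nimbusml's implementation: the helpers replacement_value and impute are hypothetical, the strategy labels follow the Def/Mean/Min/Max names from the parameter table (the exact strings nimbusml accepts may differ), and per-slot imputation is modeled by treating each column of a 2-D array as one vector slot.

```python
import numpy as np


def replacement_value(column, strategy="Def"):
    """Hypothetical helper: the value used to fill NaNs in a 1-D numeric array."""
    values = np.asarray(column, dtype=float)
    if strategy == "Def":
        return 0.0  # default value of a numeric type
    if strategy == "Mean":
        return float(np.nanmean(values))  # NaNs are ignored when averaging
    if strategy == "Min":
        return float(np.nanmin(values))
    if strategy == "Max":
        return float(np.nanmax(values))
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute(matrix, strategy="Def", by_slot=True):
    """Hypothetical helper: fill NaNs in a rows x slots array.

    by_slot=True  -> one replacement value per slot (column of the array)
    by_slot=False -> one replacement value computed over all slots together
    """
    data = np.asarray(matrix, dtype=float)
    filled = data.copy()  # leave the input untouched
    missing = np.isnan(data)
    if by_slot:
        for j in range(data.shape[1]):
            filled[missing[:, j], j] = replacement_value(data[:, j], strategy)
    else:
        filled[missing] = replacement_value(data.ravel(), strategy)
    return filled
```

For example, with rows [[nan, 1], [3, nan], [5, 5]] and the Mean strategy, per-slot imputation fills the first slot with 4 and the second with 3, while whole-vector imputation fills both gaps with 3.5, the mean of all four observed values.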

Examples


   ###############################################################################
   # Handler
   import numpy as np
   import pandas as pd
   from nimbusml import FileDataStream
   from nimbusml.preprocessing.missing_values import Handler

   with_nans = pd.DataFrame(
       data=dict(
           Sepal_Length=[2.5, np.nan, 2.1, 1.0],
           Sepal_Width=[.75, .9, .8, .76],
           Petal_Length=[np.nan, 2.5, 2.6, 2.4],
           Petal_Width=[.8, .7, .9, 0.7],
           Species=["setosa", "virginica", "", 'versicolor']))

   # write NaNs to a file to show how this transform works
   tmpfile = 'tmpfile_with_nans.csv'
   with_nans.to_csv(tmpfile, index=False)

   data = FileDataStream.read_csv(tmpfile, sep=',', numeric_dtype=np.float32)

   # transform usage
   xf = Handler(columns={'PL': 'Petal_Length'})

   # fit and transform
   features = xf.fit_transform(data)

   # print features
   print(features.head())

   #   PL.IsMissing.Petal_Length  PL.Petal_Length  Petal_Length  Petal_Width  ...
   # 0                        1.0              0.0           NaN          0.8  ...
   # 1                        0.0              2.5           2.5          0.7  ...
   # 2                        0.0              2.6           2.6          0.9  ...
   # 3                        0.0              2.4           2.4          0.7  ...

Remarks

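Handler is a combination of Filter and Indicator. It creates two columns: one containing the values imputed as specified by the replace_with argument, and one containing indicator values that mark which row entries were imputed. When concat is True, the indicator is concatenated to the value column, so the output appears as two slots per row (PL.IsMissing.Petal_Length and PL.Petal_Length in the example). This works for columns of numeric type.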
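As a rough sketch of what the Remarks describe, the NumPy function below (hypothetical, not part of nimbusml) builds the indicator and the imputed values for one column and concatenates them into two slots per row, in the same order as the example output (IsMissing first, then the value). It assumes the default Def strategy, i.e. a fill value of 0, and only models the concat=True case.

```python
import numpy as np


def handle_missing(column, fill_value=0.0):
    """Hypothetical sketch of Handler with concat=True on a single numeric column."""
    values = np.asarray(column, dtype=np.float32)
    missing = np.isnan(values)
    # Indicator part: 1.0 where the input was NaN, 0.0 otherwise
    is_missing = missing.astype(np.float32)
    # Filter part: replace NaNs with the fill value, keep everything else
    imputed = np.where(missing, np.float32(fill_value), values)
    # concat=True: one vector per row with two slots, [IsMissing, imputed value]
    return np.column_stack([is_missing, imputed])


print(handle_missing([np.nan, 2.5, 2.6, 2.4]))
```

Applied to the Petal_Length values from the example, the two slots hold the same values as the PL.IsMissing.Petal_Length and PL.Petal_Length columns printed there: indicators 1, 0, 0, 0 and values 0.0, 2.5, 2.6, 2.4.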

Methods

get_params

Get the parameters for this operator.

get_params(deep=False)

Parameters

Name	Description
deep	Default value: False
