@SoonJoo@Genting Thanks. Previously, preprocessing was a black box controlled only by the user's preprocess=True/False setting.
The new change deprecates preprocess and introduces a new field, featurization, which accepts 'auto' (automatic featurization, comparable to preprocess=True), 'off' (featurization turned off, comparable to preprocess=False), or a FeaturizationConfig object (to pass in a customized featurization configuration).
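If you have existing settings dictionaries that still carry the old flag, the mapping described above can be sketched as a small helper. This is only an illustration: migrate_preprocess is a hypothetical function written for this answer, not part of the Azure ML SDK, and it assumes your settings live in a plain dict.

```python
def migrate_preprocess(settings):
    """Return a copy of settings with the legacy 'preprocess' flag
    translated to the 'featurization' field.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    migrated = dict(settings)  # leave the caller's dict untouched
    if "preprocess" in migrated:
        legacy = migrated.pop("preprocess")
        # preprocess=True  -> featurization='auto'
        # preprocess=False -> featurization='off'
        # An explicitly set 'featurization' value is kept as-is.
        migrated.setdefault("featurization", "auto" if legacy else "off")
    return migrated


# Example: an old-style settings dict (the keys other than
# 'preprocess' are just placeholders for your own settings).
old_settings = {"preprocess": True, "primary_metric": "accuracy"}
new_settings = migrate_preprocess(old_settings)
```

The resulting dict can then be unpacked into AutoMLConfig the same way automl_settings is in the usage example.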
More information on custom featurization, including how to construct a FeaturizationConfig, is in this documentation.
We also have a notebook with an example available in our Git repo.
Usage example:
from azureml.automl.core.featurization import FeaturizationConfig
from azureml.train.automl import AutoMLConfig

# Mark Column2 and Column5 as categorical columns
featurization_config = FeaturizationConfig()
featurization_config.add_column_purpose('Column2', 'Categorical')
featurization_config.add_column_purpose('Column5', 'Categorical')

# compute_target, automl_settings, and experiment are assumed to be defined earlier
automl_config = AutoMLConfig(task='classification', compute_target=compute_target, featurization=featurization_config, **automl_settings)
remote_run = experiment.submit(automl_config, show_output=False)
For classification and regression, you do have the option to turn off automatic featurization. From the parameter description:
featurization (str or FeaturizationConfig): 'auto' / 'off' / FeaturizationConfig. Indicator for whether the featurization step should be done automatically or not, or whether customized featurization should be used.
…
Note: Timeseries features are handled separately when the task type is set to forecasting, independent of this parameter.