Select Columns Transform
Important
Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.
Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.
- See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
- Learn more about Azure Machine Learning.
ML Studio (classic) documentation is being retired and may not be updated in the future.
Creates a transformation that selects the same subset of columns as in the given dataset.
Category: Data Transformation / Manipulation
Note
Applies to: Machine Learning Studio (classic) only
Similar drag-and-drop modules are available in Azure Machine Learning designer.
This article describes how to use the Select Columns Transform module in Machine Learning Studio (classic). The purpose of the Select Columns Transform module is to ensure that a predictable, consistent set of columns is always used in downstream machine learning operations.
This module is particularly helpful for tasks such as scoring, which require specific columns. Changes in the available columns might break the experiment or change the results.
Use the Select Columns Transform module to create and save a set of columns. Then use the Apply Transformation module to apply that selection to new data.
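Studio (classic) modules are configured on the drag-and-drop canvas rather than in code, so the following is only a conceptual analogy of the save-then-apply pattern in plain Python. The helper names `make_column_selection` and `apply_selection` are hypothetical and are not part of any Studio API; rows are modeled as dictionaries.

```python
def make_column_selection(reference_rows):
    """Capture the column names of a reference dataset (hypothetical helper)."""
    if not reference_rows:
        raise ValueError("reference dataset is empty")
    return list(reference_rows[0].keys())


def apply_selection(columns, rows):
    """Keep only the captured columns, in the captured order."""
    return [{name: row[name] for name in columns} for row in rows]


# Reference dataset that already has the desired columns.
reference = [{"age": 34, "income": 52000}]
selection = make_column_selection(reference)

# New data with an extra column that downstream steps should not see.
new_data = [{"age": 41, "income": 61000, "zip": "98052"}]
selected = apply_selection(selection, new_data)
```

In this sketch the selection is just a list of column names, so it can be stored once and reused on any later dataset that contains those columns.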
How to use Select Columns Transform
This scenario assumes that you intend to use feature selection to generate a dynamic set of columns that will be used for training a model. To ensure that column selections are the same for the scoring process, you use the Select Columns Transform module to capture the column selections and apply them elsewhere in the experiment.
1. Add an input dataset to your experiment in Studio (classic).
2. Add an instance of Filter Based Feature Selection.
3. Connect the modules, and configure the feature selection module to automatically find some number of the best features in the input dataset.
4. Add an instance of Train Model, and use the output of Filter Based Feature Selection as the input for training.

   Important
   Because feature importance is determined from the values in the columns, you cannot know in advance which columns will be available as input to Train Model.
5. Attach an instance of the Select Columns Transform module.

   This generates a column selection as a transformation that can be saved or applied to other datasets. This step ensures that the columns identified by feature selection are saved for reuse by other modules.
6. Add the Score Model module.

   Do not connect the input dataset. Instead, add the Apply Transformation module, and connect the column selection transformation generated by Select Columns Transform.

   Important
   You cannot apply Filter Based Feature Selection to the scoring dataset and expect to get the same results. Because feature selection is based on values, it might choose a different set of columns, which would cause the scoring operation to fail.
7. Run the experiment.
This process of saving and then applying a column selection ensures that the same data schema is available for training and scoring.
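The workflow above can be approximated outside Studio (classic) as a rough Python sketch. It assumes a simple filter criterion, the absolute Pearson correlation with the label, as a stand-in for whatever metric Filter Based Feature Selection is configured to use. The function names, column names, and data are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_features(rows, label, k):
    """Rank feature columns by |correlation| with the label and keep the best k."""
    labels = [row[label] for row in rows]
    features = [name for name in rows[0] if name != label]
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in features
    }
    return sorted(features, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical training data: three candidate features and a label.
train = [
    {"f1": 1.0, "f2": 5.0, "f3": 2.0, "label": 1.0},
    {"f1": 2.0, "f2": 3.0, "f3": 9.0, "label": 2.0},
    {"f1": 3.0, "f2": 8.0, "f3": 4.0, "label": 3.0},
    {"f1": 4.0, "f2": 1.0, "f3": 7.0, "label": 4.0},
]

# Training side: feature selection picks the columns; the selection is captured once.
selected_columns = top_features(train, label="label", k=2)

# Scoring side: reuse the captured selection instead of re-running feature selection.
scoring = [{"f1": 2.5, "f2": 4.0, "f3": 6.0}]
scoring_input = [{name: row[name] for name in selected_columns} for row in scoring]
```

Because the ranking depends on the training values, a different dataset could yield a different `selected_columns`. Reusing the captured list keeps the scoring schema identical to the training schema.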
Examples
For examples of how to use this module, see the Azure AI Gallery:
Select columns transform: A complete walkthrough that uses this module.
Filter features and remove them from scoring inputs: Save this experiment to your workspace to see how the module is used in a complete experimental workflow.
Expected inputs
Name | Type | Description |
---|---|---|
Dataset with desired columns | Data Table | Dataset containing the desired set of columns |
Outputs
Name | Type | Description |
---|---|---|
Columns selection transformation | ITransform interface | Transformation that selects the same subset of columns as in the given dataset. |
Exceptions
Exception | Description |
---|---|
Error 0003 | Exception occurs if one or more inputs are null or empty. |