Data Transformation - Manipulation

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article describes the modules in Machine Learning Studio (classic) that you can use for basic data manipulation.

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Machine Learning Studio (classic) provides separate modules for tasks that are specific to machine learning, such as normalization or feature selection. The modules in this category are intended for more general data manipulation tasks.

Data manipulation tasks

The modules in this category support core data management tasks that you might need to perform in Machine Learning Studio (classic), such as the following:

  • Combine two datasets, either by using joins, or by merging columns or rows.
  • Create new categories to use in grouping data.
  • Modify column headings, change column data types, or flag columns as features or labels.
  • Check for missing values, and then replace them with appropriate values.
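To make these tasks concrete, here is a minimal sketch in plain Python of what each one does to tabular data. It is not how the Studio (classic) modules are implemented. The sample data, the column names, and the helper functions (`inner_join`, `group_categories`, `rename_column`, `fill_missing_with_mean`) are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample data; the column names are invented for illustration.
customers = [
    {"id": 1, "region": "north", "age": 34},
    {"id": 2, "region": "south", "age": None},  # None marks a missing value
    {"id": 3, "region": "east", "age": 52},
]
orders = [
    {"id": 1, "total": 20.0},
    {"id": 3, "total": 35.5},
]


def inner_join(left, right, key):
    """Combine two datasets, keeping rows whose key appears in both."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def group_categories(rows, column, mapping, default="other"):
    """Replace category values with a coarser grouping."""
    return [{**row, column: mapping.get(row[column], default)} for row in rows]


def rename_column(rows, old, new):
    """Change a column heading without touching the values."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


def fill_missing_with_mean(rows, column):
    """Replace missing values (None) with the mean of the observed values."""
    fill = mean(row[column] for row in rows if row[column] is not None)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


joined = inner_join(customers, orders, "id")
grouped = group_categories(customers, "region", {"north": "cold", "south": "warm"})
renamed = rename_column(customers, "age", "age_years")
cleaned = fill_missing_with_mean(customers, "age")
```

Each helper returns a new list and leaves its input unchanged, which loosely mirrors how a module in an experiment produces a new output dataset rather than modifying its input.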

Examples

For examples of how to work with complex data in machine learning experiments, see the samples in the Azure AI Gallery.

Modules in this category

The Data Transformation - Manipulation category includes the following modules:

  • Add Columns: Adds a set of columns from one dataset to another.
  • Add Rows: Appends a set of rows from an input dataset to the end of another dataset.
  • Apply SQL Transformation: Runs a SQLite query on input datasets to transform the data.
  • Clean Missing Data: Specifies how to handle values that are missing from a dataset. This module replaces Missing Values Scrubber, which has been deprecated.
  • Convert to Indicator Values: Converts categorical values in columns to indicator values.
  • Edit Metadata: Edits metadata that's associated with columns in a dataset.
  • Group Categorical Values: Groups data from multiple categories into a new category.
  • Join Data: Joins two datasets.
  • Remove Duplicate Rows: Removes duplicate rows from a dataset.
  • Select Columns in Dataset: Selects the columns to include in, or exclude from, a dataset that is used in an operation.
  • Select Columns Transform: Creates a transformation that selects the same subset of columns as in a specified dataset.
  • SMOTE: Increases the number of low-incidence examples in a dataset by using synthetic minority oversampling.
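As a rough illustration of three of the ideas in this list, the following Python sketch uses only the standard library. It runs a SQLite query that joins two tables and removes duplicate rows, converts a categorical column to indicator values, and generates synthetic minority examples by interpolation. The table names `t1` and `t2`, the column names, the sample data, and the functions `to_indicator_values` and `smote_like` are assumptions made for this example. The interpolation step is a simplified stand-in that uses only the single nearest neighbor, and it is not the SMOTE module's actual implementation.

```python
import math
import random
import sqlite3

# --- SQLite query over two input tables (names and data are assumed) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, color TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(1, "red"), (2, "blue"), (2, "blue"), (3, "red")],  # row 2 is duplicated
)
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, 9.5), (2, 4.0)])

# INNER JOIN combines the tables; DISTINCT drops the duplicate row.
query = """
SELECT DISTINCT t1.id, t1.color, t2.price
FROM t1 INNER JOIN t2 ON t1.id = t2.id
ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
conn.close()


# --- Categorical values to indicator (0/1) columns ---
def to_indicator_values(values, prefix):
    """Return one dict per value, with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [{f"{prefix}-{c}": int(v == c) for c in categories} for v in values]


indicators = to_indicator_values([color for _, color, _ in rows], "color")


# --- Simplified synthetic minority oversampling (nearest neighbor only) ---
def smote_like(minority, n_new, seed=0):
    """Create n_new points, each on the segment between a randomly chosen
    minority sample and its nearest other minority sample."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = min(others, key=lambda p: math.dist(base, p))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0)]
synthetic = smote_like(minority, 3)
```

Because every synthetic point lies between two existing minority samples, the new examples stay within the region that the minority class already occupies instead of duplicating existing rows.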
