Data Transformation

2019-05-06

Important

Support for Machine Learning Studio (classic) will end on 31 August 2024. We recommend you transition to Azure Machine Learning by that date.

Beginning 1 December 2021, you will not be able to create new Machine Learning Studio (classic) resources. Through 31 August 2024, you can continue to use the existing Machine Learning Studio (classic) resources.

See information on moving machine learning projects from ML Studio (classic) to Azure Machine Learning.
Learn more about Azure Machine Learning.

ML Studio (classic) documentation is being retired and may not be updated in the future.

This article lists the modules that are provided in Machine Learning Studio (classic) for data transformation. For machine learning, data transformation entails some very general tasks, such as joining datasets or changing column names. It also includes many tasks that are specific to machine learning, such as normalization, binning and grouping, and inference of missing values.
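
To make the machine-learning-specific tasks more concrete, the following is a minimal plain-Python sketch of missing-value inference, normalization, and binning. It is not the implementation used by any Studio (classic) module; the function names and the sample column are hypothetical, and mean imputation, min-max scaling, and equal-width bins are only one possible choice for each task.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread, so map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric column with one missing value.
ages = [20, None, 40, 60]
filled = impute_mean(ages)
scaled = min_max_normalize(filled)
bins = equal_width_bins(filled, 2)
```

In this sketch the missing entry is filled first, because the scaling and binning helpers assume that every value is numeric.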

Note

Applies to: Machine Learning Studio (classic) only

Similar drag-and-drop modules are available in Azure Machine Learning designer.

Important

Data that you use in Machine Learning Studio (classic) is generally expected to be "tidy" before you import it. Data preparation might include, for example, ensuring that the data uses the correct encoding and checking that the data has a consistent schema.

Modules for data transformation are grouped into the following task-based categories:

Creating filters for digital signal processing: Digital signal filters can be applied to numeric data to support machine learning tasks such as image recognition, voice recognition, and waveform analysis.
Generating and using count-based features: Count-based featurization modules help you develop compact features to use in machine learning.
General data manipulation and preparation: Merging datasets, cleaning missing values, grouping and summarizing data, changing column names and data types, or indicating which column is a label or a feature.
Sampling and splitting datasets: Divide your data into training and test sets, split datasets by percentage or by a filter condition, or perform sampling.
Scaling and reducing data: Prepare numerical data for analysis by applying normalization or by scaling. Bin data into groups, remove or replace outliers, or perform principal component analysis (PCA).
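
As an illustration of the sampling and splitting category above, the following plain-Python sketch divides a dataset by percentage and by a filter condition. It is an assumption-laden example rather than a description of how the Studio (classic) modules work: the helper names, the `seed` parameter, and the sample rows are all hypothetical.

```python
import random


def split_by_fraction(rows, fraction, seed=0):
    """Shuffle a copy of rows and split it into two lists at the given fraction."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def split_by_condition(rows, predicate):
    """Split rows into those that satisfy predicate and those that do not."""
    matched = [r for r in rows if predicate(r)]
    rest = [r for r in rows if not predicate(r)]
    return matched, rest


# Hypothetical dataset of ten rows.
rows = [{"id": i, "score": i * 10} for i in range(10)]

# 70 percent of the rows for training, the remainder for testing.
train, test = split_by_fraction(rows, 0.7, seed=42)

# Rows with a score of at least 50 versus all other rows.
high, low = split_by_condition(rows, lambda r: r["score"] >= 50)
```

Both helpers return two lists that together contain every input row exactly once, so no data is lost or duplicated by the split.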

List of modules

The Data Transformation category includes the module categories that correspond to the task groups described above.
