Algorithm & component reference for Azure Machine Learning designer

Note

Designer supports two types of components: classic prebuilt components and custom components. These two types of components are not compatible.

Classic prebuilt components are mainly for data processing and traditional machine learning tasks such as regression and classification. This type of component continues to be supported, but no new components will be added.

Custom components allow you to provide your own code as a component. They support sharing across workspaces and seamless authoring across the Studio, CLI, and SDK interfaces.

This article applies to classic prebuilt components.

This reference content provides the technical background on each of the classic prebuilt components available in Azure Machine Learning designer.

Each component represents a set of code that can run independently and perform a machine learning task, given the required inputs. A component might contain a particular algorithm, or perform a task that is important in machine learning, such as missing value replacement, or statistical analysis.

For help with choosing algorithms, see

Tip

In any pipeline in the designer, you can get information about a specific component. Hover over the component in the component list and select the Learn more link in the component card, or select the link in the right pane of the component.

Data preparation components

| Functionality | Description | Component |
| --- | --- | --- |
| Data Input and Output | Move data from cloud sources into your pipeline. Write your results or intermediate data to Azure Storage, or SQL Database, while running a pipeline, or use cloud storage to exchange data between pipelines. | Enter Data Manually, Export Data, Import Data |
| Data Transformation | Operations on data that are unique to machine learning, such as normalizing or binning data, dimensionality reduction, and converting data among various file formats. | Add Columns, Add Rows, Apply Math Operation, Apply SQL Transformation, Clean Missing Data, Clip Values, Convert to CSV, Convert to Dataset, Convert to Indicator Values, Edit Metadata, Group Data into Bins, Join Data, Normalize Data, Partition and Sample, Remove Duplicate Rows, SMOTE, Select Columns Transform, Select Columns in Dataset, Split Data |
| Feature Selection | Select a subset of relevant, useful features to use to build an analytical model. | Filter Based Feature Selection, Permutation Feature Importance |
| Statistical Functions | Provide a wide variety of statistical methods related to data science. | Summarize Data |
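
The Data Transformation description above mentions normalizing and binning data. As a rough illustration of what those two operations mean, here is a minimal plain-Python sketch. It is not the implementation used by Normalize Data or Group Data into Bins; the function names and the sample values are assumptions made for this example.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, bin_count):
    """Assign each value to one of bin_count equal-width bins (0-based index)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bin_count
    # The maximum value would land just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bin_count - 1) for v in values]


ages = [18, 25, 40, 62]  # illustrative sample data
print(min_max_normalize(ages))
print(equal_width_bins(ages, 2))  # [0, 0, 1, 1]
```

The designer components offer more methods than the two shown here; the sketch only conveys the shape of the transformation.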

Machine learning algorithms

| Functionality | Description | Component |
| --- | --- | --- |
| Regression | Predict a value. | Boosted Decision Tree Regression, Decision Forest Regression, Fast Forest Quantile Regression, Linear Regression, Neural Network Regression, Poisson Regression |
| Clustering | Group data together. | K-Means Clustering |
| Classification | Predict a class. Choose from binary (two-class) or multiclass algorithms. | Multiclass Boosted Decision Tree, Multiclass Decision Forest, Multiclass Logistic Regression, Multiclass Neural Network, One vs. All Multiclass, One vs. One Multiclass, Two-Class Averaged Perceptron, Two-Class Boosted Decision Tree, Two-Class Decision Forest, Two-Class Logistic Regression, Two-Class Neural Network, Two-Class Support Vector Machine |

Components for building and evaluating models

| Functionality | Description | Component |
| --- | --- | --- |
| Model Training | Run data through the algorithm. | Train Clustering Model, Train Model, Train PyTorch Model, Tune Model Hyperparameters |
| Model Scoring and Evaluation | Measure the accuracy of the trained model. | Apply Transformation, Assign Data to Clusters, Cross Validate Model, Evaluate Model, Score Image Model, Score Model |
| Python Language | Write code and embed it in a component to integrate Python with your pipeline. | Create Python Model, Execute Python Script |
| R Language | Write code and embed it in a component to integrate R with your pipeline. | Execute R Script |
| Text Analytics | Provide specialized computational tools for working with both structured and unstructured text. | Convert Word to Vector, Extract N-Gram Features from Text, Feature Hashing, Preprocess Text, Latent Dirichlet Allocation, Score Vowpal Wabbit Model, Train Vowpal Wabbit Model |
| Computer Vision | Image data preprocessing and image recognition components. | Apply Image Transformation, Convert to Image Directory, Init Image Transformation, Split Image Directory, DenseNet, ResNet |
| Recommendation | Build recommendation models. | Evaluate Recommender, Score SVD Recommender, Score Wide and Deep Recommender, Train SVD Recommender, Train Wide and Deep Recommender |
| Anomaly Detection | Build anomaly detection models. | PCA-Based Anomaly Detection, Train Anomaly Detection Model |

Web service

Learn about the web service components, which are necessary for real-time inference in Azure Machine Learning designer.

Error messages

Learn about the error messages and exception codes that you might encounter using components in Azure Machine Learning designer.

Components environment

All built-in components in the designer will be executed in a fixed environment provided by Microsoft.

Previously, this environment was based on Python 3.6; it has now been upgraded to Python 3.8. The upgrade is transparent: components automatically run in the Python 3.8 environment, and no action is required from you. The environment update might affect component outputs and the deployment of a real-time endpoint from a real-time inference pipeline. See the following sections to learn more.

Component outputs differ from previous results

After the Python version is upgraded from 3.6 to 3.8, the dependencies of built-in components might also be upgraded accordingly. As a result, some component outputs might differ from previous results.
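
When comparing outputs before and after the upgrade, it can help to record which versions a run actually used. The following is a minimal sketch for the Execute Python Script component, assuming its `azureml_main` entry point. The helper `report_versions` and the package list are illustrative assumptions, not part of the designer, and `importlib.metadata` requires Python 3.8 or later.

```python
import platform
from importlib import metadata  # in the standard library from Python 3.8


def report_versions(package_names):
    """Return the Python version and the installed version of each package.

    Packages that are not installed map to None.
    """
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def azureml_main(dataframe1=None, dataframe2=None):
    # Assumed package list; replace it with the packages your script relies on.
    for name, version in report_versions(["pandas", "numpy", "scikit-learn"]).items():
        print(f"{name}: {version}")
    # Pass the input through unchanged.
    return dataframe1,
```

The printed versions appear in the component's output logs, where they can be compared against an earlier run.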

If you are using the Execute Python Script component and have previously installed packages tied to Python 3.6, you may run into errors like:

  • "Could not find a version that satisfies the requirement."
  • "No matching distribution found."

In that case, specify a package version that is compatible with Python 3.8, and run your pipeline again.
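
One way to pin a version inside an Execute Python Script component is to install it at the start of the script. This is a hedged sketch, assuming the component's `azureml_main` entry point and that the compute can reach a package index. The requirement string `example-package==1.0.0` and the helper `build_pip_command` are hypothetical placeholders.

```python
import subprocess
import sys

# Hypothetical placeholder: replace with a package and version that
# publishes a build for Python 3.8.
PINNED_REQUIREMENT = "example-package==1.0.0"


def build_pip_command(requirement):
    """Build a pip install command for the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", requirement]


def azureml_main(dataframe1=None, dataframe2=None):
    # Install the pinned package before importing it. check_call raises
    # CalledProcessError if pip cannot resolve the requirement.
    subprocess.check_call(build_pip_command(PINNED_REQUIREMENT))
    return dataframe1,
```

Invoking pip through `sys.executable` keeps the install in the same Python 3.8 environment that runs the component, rather than whichever `pip` happens to be first on the path.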

Issue deploying a real-time endpoint from a real-time inference pipeline

If you deploy a real-time endpoint directly from a previously completed real-time inference pipeline, the deployment might run into errors.

Recommendation: clone the inference pipeline and submit it again, then deploy it to a real-time endpoint.

Next steps