Algorithm & component reference for Azure Machine Learning designer

Article
03/08/2023

APPLIES TO: Python SDK azure-ai-ml v2 (current)

Note

Designer supports two types of components: classic prebuilt components and custom components. These two types of components are not compatible.

Classic prebuilt components mainly cover data processing and traditional machine learning tasks such as regression and classification. This type of component continues to be supported, but no new components will be added.

Custom components allow you to provide your own code as a component. They support sharing across workspaces and seamless authoring across Studio, CLI, and SDK interfaces.

This article applies to classic prebuilt components.

This reference content provides the technical background on each of the classic prebuilt components available in Azure Machine Learning designer.

Each component represents a set of code that can run independently and perform a machine learning task, given the required inputs. A component might contain a particular algorithm, or perform a task that is important in machine learning, such as missing value replacement, or statistical analysis.
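
As a rough illustration of the kind of self-contained task a component performs, the following minimal sketch replaces missing values with the mean of the values that are present. It is not the designer's implementation of any component; the function name `replace_missing_with_mean` and the use of `None` to mark a missing value are assumptions made for this example.

```python
from statistics import mean


def replace_missing_with_mean(values):
    """Return a copy of values with None entries replaced by the mean of the rest."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean: every value is missing")
    fill = mean(present)
    return [fill if v is None else v for v in values]


# Given the required input (a column with gaps), the task runs independently.
print(replace_missing_with_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```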

For help with choosing algorithms, see

Tip

In any pipeline in the designer, you can get information about a specific component. Hover over the component in the component list and select the Learn more link in its card, or select the link in the right pane of the component.

Data preparation components

Functionality	Description	Components
Data Input and Output	Move data from cloud sources into your pipeline. Write your results or intermediate data to Azure Storage or SQL Database while running a pipeline, or use cloud storage to exchange data between pipelines.	Enter Data Manually, Export Data, Import Data
Data Transformation	Operations on data that are unique to machine learning, such as normalizing or binning data, dimensionality reduction, and converting data among various file formats.	Add Columns, Add Rows, Apply Math Operation, Apply SQL Transformation, Clean Missing Data, Clip Values, Convert to CSV, Convert to Dataset, Convert to Indicator Values, Edit Metadata, Group Data into Bins, Join Data, Normalize Data, Partition and Sample, Remove Duplicate Rows, SMOTE, Select Columns Transform, Select Columns in Dataset, Split Data
Feature Selection	Select a subset of relevant, useful features to use to build an analytical model.	Filter Based Feature Selection, Permutation Feature Importance
Statistical Functions	Provide a wide variety of statistical methods related to data science.	Summarize Data

Machine learning algorithms

Functionality	Description	Components
Regression	Predict a value.	Boosted Decision Tree Regression, Decision Forest Regression, Fast Forest Quantile Regression, Linear Regression, Neural Network Regression, Poisson Regression
Clustering	Group data together.	K-Means Clustering
Classification	Predict a class. Choose from binary (two-class) or multiclass algorithms.	Multiclass Boosted Decision Tree, Multiclass Decision Forest, Multiclass Logistic Regression, Multiclass Neural Network, One vs. All Multiclass, One vs. One Multiclass, Two-Class Averaged Perceptron, Two-Class Boosted Decision Tree, Two-Class Decision Forest, Two-Class Logistic Regression, Two-Class Neural Network, Two-Class Support Vector Machine

Components for building and evaluating models

Functionality	Description	Components
Model Training	Run data through the algorithm.	Train Clustering Model, Train Model, Train PyTorch Model, Tune Model Hyperparameters
Model Scoring and Evaluation	Measure the accuracy of the trained model.	Apply Transformation, Assign Data to Clusters, Cross Validate Model, Evaluate Model, Score Image Model, Score Model
Python Language	Write code and embed it in a component to integrate Python with your pipeline.	Create Python Model, Execute Python Script
R Language	Write code and embed it in a component to integrate R with your pipeline.	Execute R Script
Text Analytics	Provide specialized computational tools for working with both structured and unstructured text.	Convert Word to Vector, Extract N-Gram Features from Text, Feature Hashing, Preprocess Text, Latent Dirichlet Allocation, Score Vowpal Wabbit Model, Train Vowpal Wabbit Model
Computer Vision	Components for image data preprocessing and image recognition.	Apply Image Transformation, Convert to Image Directory, Init Image Transformation, Split Image Directory, DenseNet, ResNet
Recommendation	Build recommendation models.	Evaluate Recommender, Score SVD Recommender, Score Wide and Deep Recommender, Train SVD Recommender, Train Wide and Deep Recommender
Anomaly Detection	Build anomaly detection models.	PCA-Based Anomaly Detection, Train Anomaly Detection Model

Web service

Learn about the web service components, which are necessary for real-time inference in Azure Machine Learning designer.

Error messages

Learn about the error messages and exception codes that you might encounter using components in Azure Machine Learning designer.

Components environment

All built-in components in the designer run in a fixed environment provided by Microsoft.

Previously this environment was based on Python 3.6. It has now been upgraded to Python 3.8. The upgrade is transparent: components automatically run in the Python 3.8 environment, and no action is required from the user. The environment update may affect component outputs and the deployment of real-time endpoints from real-time inference pipelines. See the following sections to learn more.

Component outputs differ from previous results

After the Python version is upgraded from 3.6 to 3.8, the dependencies of built-in components may also be upgraded accordingly. As a result, some component outputs may differ from previous results.
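
One way to judge whether such differences are small numerical drift or a real change is a tolerance-based comparison of exported results. The following is a minimal sketch under stated assumptions: the function name `compare_outputs`, the tolerance values, and the sample scores are all hypothetical and are not part of the designer.

```python
import math


def compare_outputs(previous, current, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where two equal-length numeric sequences
    differ by more than the given tolerances."""
    if len(previous) != len(current):
        raise ValueError("outputs have different lengths")
    return [
        i
        for i, (old, new) in enumerate(zip(previous, current))
        if not math.isclose(old, new, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Hypothetical scores exported from a run before and after the upgrade.
old_scores = [0.912345678, 0.500000000, 0.250000000]
new_scores = [0.912345679, 0.500000000, 0.260000000]
print(compare_outputs(old_scores, new_scores))  # [2]
```

Only the third value is reported, because the first pair differs by less than the relative tolerance. Suitable tolerances depend on the component and the data.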

If you use the Execute Python Script component and previously installed packages at versions tied to Python 3.6, you may run into errors such as:

"Could not find a version that satisfies the requirement."
"No matching distribution found."

In that case, specify a package version that is compatible with Python 3.8, and run your pipeline again.

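As an illustration of pinning a package version from inside an Execute Python Script component, here is a minimal sketch. The `azureml_main` entry point with two optional dataframe inputs follows the convention described for that component; verify it against the component reference. The helper names, the `PINNED_PACKAGES` dictionary, and the package name and version in the comment are hypothetical.

```python
import subprocess
import sys

# Hypothetical pins: map each package to a version that publishes a
# distribution for the Python version of the component environment.
PINNED_PACKAGES = {
    # "example-package": "1.2.3",
}


def pinned_requirement(package, version):
    """Build a pip requirement string such as 'example-package==1.2.3'."""
    return f"{package}=={version}"


def install_pinned(package, version):
    """Install one pinned package into the interpreter running this script."""
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", pinned_requirement(package, version)]
    )


def azureml_main(dataframe1=None, dataframe2=None):
    # Report the interpreter version so the pins can be checked against it.
    print("Running on Python", sys.version_info[:2])
    for package, version in PINNED_PACKAGES.items():
        install_pinned(package, version)
    # Pass the first input through unchanged, as a one-element tuple.
    return dataframe1,
```

Calling pip through `sys.executable -m pip` installs into the same interpreter that runs the script, which avoids picking up a different pip on the path.
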
Issue deploying a real-time endpoint from a real-time inference pipeline

If you deploy a real-time endpoint directly from a previously completed real-time inference pipeline, the deployment may run into errors.

Recommendation: clone the inference pipeline and submit it again, then deploy it to a real-time endpoint.

Next steps

Tutorial: Build a model in designer to predict auto prices