Automated machine learning (AutoML) automates the process of applying machine learning to data. Given a dataset, you can run AutoML to iterate over different data transformations, machine learning algorithms, and hyperparameters to select the best model.
Note
This article refers to the ML.NET AutoML API, which is currently in preview. Material may be subject to change.
How does AutoML work?
In general, the workflow to train machine learning models is as follows:
Define a problem
Collect data
Preprocess data
Train a model
Evaluate the model
Preprocessing, training, and evaluation form an experimental, iterative process that takes multiple trials before you achieve satisfactory results. Because these tasks tend to be repetitive, AutoML can automate them. Beyond automation, AutoML applies optimization techniques during training and evaluation to find and select algorithms and hyperparameters.
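The iterative part of this workflow can be illustrated with a minimal, language-neutral sketch. The Python below is not ML.NET code and does not reflect how AutoML is implemented. It assumes a toy dataset and a hypothetical k-nearest-neighbors model, runs one trial per candidate setting, and keeps the candidate with the lowest validation error.

```python
import random

def knn_predict(train, x, k):
    """Predict y for x as the mean y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_absolute_error(train, valid, k):
    """Evaluate a k-NN model built from `train` against the held-out `valid` rows."""
    errors = [abs(knn_predict(train, x, k) - y) for x, y in valid]
    return sum(errors) / len(errors)

def run_trials(train, valid, candidate_ks):
    """Run one trial per candidate and return (best_k, best_error, all_results)."""
    results = [(k, mean_absolute_error(train, valid, k)) for k in candidate_ks]
    best_k, best_error = min(results, key=lambda r: r[1])
    return best_k, best_error, results

# Toy data (an assumption for illustration): y = 2x plus a little seeded noise.
rng = random.Random(0)
data = [(x, 2 * x + rng.uniform(-1, 1)) for x in range(40)]
rng.shuffle(data)
train, valid = data[:30], data[30:]

best_k, best_error, results = run_trials(train, valid, candidate_ks=[1, 3, 5, 9])
print(best_k, round(best_error, 3))
```

Each entry in `results` corresponds to one train-and-evaluate pass. Doing this by hand for many algorithms and settings is the repetitive work that AutoML takes over.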
When should I use AutoML?
Whether you're just getting started with machine learning or you're an experienced user, AutoML provides solutions for automating the model development process.
Beginners - If you're new to machine learning, AutoML simplifies the model development process by providing a set of defaults that reduces the number of decisions you have to make when training your model. In doing so, you can focus on your data and the problem you're trying to solve and let AutoML do the rest.
Experienced users - If you have some experience with machine learning, you can customize, configure, and extend the defaults provided by AutoML based on your needs while still leveraging its automation capabilities.
AutoML in ML.NET
AutoML in ML.NET is built from the following components:
Featurizer - Convenience API to automate data preprocessing.
Trial - A single hyperparameter optimization run.
Experiment - A collection of AutoML trials. ML.NET provides a high-level API for creating experiments that sets defaults for the individual Sweepable Pipeline, Search Space, and Tuner components.
Search Space - The range of available options to choose hyperparameters from.
Tuner - The algorithms used to optimize hyperparameters. ML.NET supports the following tuners:
Eci Cost Frugal Tuner - Implementation of Cost Frugal Tuner for hierarchical search spaces. Default tuner used by AutoML.
SMAC - Tuner that uses random forests to apply Bayesian optimization.
Grid Search - Tuner that works best for small search spaces.
Random Search
Sweepable Estimator - An ML.NET estimator that contains a search space.
Sweepable Pipeline - An ML.NET pipeline that contains one or more Sweepable Estimators.
Trial Runner - AutoML component that uses sweepable pipelines and trial settings to generate trial results from model training and evaluation.
Beginners should start with the defaults provided by the high-level experiment API. More experienced users who want customization options can work directly with the sweepable estimator, sweepable pipeline, search space, trial runner, and tuner components.
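To show how a search space, tuner, trial runner, and experiment relate to one another, here is a conceptual sketch in Python. It is not the ML.NET API. The hyperparameter names, the `run_trial` loss function, and the other helper names are hypothetical stand-ins chosen for illustration. Only grid search and random search are shown, because they are simple enough to write in a few lines.

```python
import itertools
import random

# Search space: the options each hyperparameter may take (hypothetical names).
SEARCH_SPACE = {
    "learning_rate": [0.01, 0.1, 0.5],
    "num_leaves": [4, 16, 64],
}

def grid_search(space):
    """Tuner: propose every combination in the search space exactly once."""
    names = list(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))

def random_search(space, num_trials, seed=0):
    """Tuner: propose independently sampled combinations."""
    rng = random.Random(seed)
    for _ in range(num_trials):
        yield {name: rng.choice(options) for name, options in space.items()}

def run_trial(params):
    """Trial runner stand-in: return a loss for one set of hyperparameters.

    A real runner would train and evaluate a model. This made-up loss is
    smallest at learning_rate=0.1, num_leaves=16.
    """
    return abs(params["learning_rate"] - 0.1) + abs(params["num_leaves"] - 16) / 100

def run_experiment(tuner):
    """Experiment: run one trial per proposal and keep the lowest loss."""
    trials = [(params, run_trial(params)) for params in tuner]
    return min(trials, key=lambda t: t[1])

best_params, best_loss = run_experiment(grid_search(SEARCH_SPACE))
print(best_params, best_loss)
```

Grid search here enumerates all 3 × 3 = 9 combinations, and that count multiplies with every added hyperparameter, which is why it suits small search spaces. Passing `random_search(SEARCH_SPACE, num_trials=5)` to `run_experiment` instead caps the number of trials regardless of the size of the space.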