What is AutoML?

Databricks AutoML helps you automatically apply machine learning to a dataset. You provide the dataset and identify the prediction target, while AutoML prepares the dataset for model training. AutoML then performs and records a set of trials that creates, tunes, and evaluates multiple models. After model evaluation, AutoML displays the results and provides a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code. AutoML also calculates summary statistics on your dataset and saves this information in a notebook that you can review later.

You can use Databricks AutoML for regression, classification, and forecasting problems. For details, see How Azure Databricks AutoML works.
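
As a rough illustration of the workflow described above, the following sketch starts an AutoML experiment from Python. It assumes the `databricks.automl` Python API (`classify`, `regress`, `forecast`) that ships with Databricks Runtime ML. The wrapper `run_automl`, its parameter names, and the 30-minute timeout are hypothetical and are used only for this example.

```python
def run_automl(dataset, target_col, problem_type="classification",
               time_col=None, timeout_minutes=30):
    """Start an AutoML experiment for one of the three supported problem types.

    `run_automl` is a hypothetical helper, not part of the AutoML API.
    """
    supported = ("classification", "regression", "forecasting")
    if problem_type not in supported:
        raise ValueError(f"problem_type must be one of {supported}")
    if problem_type == "forecasting" and time_col is None:
        raise ValueError("forecasting requires time_col")

    # Imported lazily: the package is assumed to be available only on
    # clusters running Databricks Runtime ML.
    from databricks import automl

    if problem_type == "classification":
        return automl.classify(dataset=dataset, target_col=target_col,
                               timeout_minutes=timeout_minutes)
    if problem_type == "regression":
        return automl.regress(dataset=dataset, target_col=target_col,
                              timeout_minutes=timeout_minutes)
    return automl.forecast(dataset=dataset, target_col=target_col,
                           time_col=time_col,
                           timeout_minutes=timeout_minutes)
```

On a suitable cluster, a call such as `run_automl(df, "label")` would return a summary object for the experiment, where `df` and `"label"` stand in for your own dataset and prediction target.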

Requirements

  • Databricks Runtime 9.1 ML or above. The general availability (GA) version of AutoML requires Databricks Runtime 10.4 LTS ML or above.
    • For time series forecasting, Databricks Runtime 10.0 ML or above.
    • With Databricks Runtime 9.1 LTS ML and above, AutoML depends on the databricks-automl-runtime package, which contains components that are useful outside of AutoML, and also helps simplify the notebooks generated by AutoML training. databricks-automl-runtime is available on PyPI.
  • Do not install any libraries on the cluster other than those preinstalled in Databricks Runtime for Machine Learning.
    • Any modification (removal, upgrade, or downgrade) of an existing library version causes run failures due to incompatibility.
  • On a high concurrency cluster, AutoML is not compatible with table access control or credential passthrough.
  • To use Unity Catalog with AutoML, the cluster access mode must be Single user, and you must be the designated single user of the cluster.
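
The version requirements above can be checked with a small helper. This is a minimal sketch: the feature names and functions are hypothetical, and it assumes the cluster exposes its runtime version in the `DATABRICKS_RUNTIME_VERSION` environment variable in a form such as `10.4`. It does not verify that the runtime is an ML runtime, nor does it check the library, access control, or Unity Catalog requirements.

```python
import os

# Minimum Databricks Runtime ML versions, taken from the requirements list.
# The feature keys are hypothetical labels chosen for this example.
MIN_VERSIONS = {
    "automl": (9, 1),        # AutoML
    "forecasting": (10, 0),  # time series forecasting
    "automl_ga": (10, 4),    # general availability (GA) version
}


def parse_runtime_version(version):
    """Turn '10.4' or '10.4.x-cpu-ml-scala2.12' into the tuple (10, 4)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def supports(feature, version):
    """Return True if the given runtime version meets the feature's minimum."""
    return parse_runtime_version(version) >= MIN_VERSIONS[feature]


# Assumption: this variable is set on Databricks clusters; elsewhere it is unset.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION")
if runtime:
    for feature in MIN_VERSIONS:
        print(feature, supports(feature, runtime))
```

Comparing `(major, minor)` tuples rather than raw strings avoids ordering mistakes such as treating "9.1" as newer than "10.0".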

Next steps