This page provides examples of how you can use the scikit-learn package to train machine learning models in Azure Databricks. scikit-learn is one of the most popular Python libraries for single-node machine learning and is included in Databricks Runtime and Databricks Runtime ML. See Databricks Runtime release notes for the scikit-learn library version included with your cluster’s runtime.
This notebook provides a quick overview of machine learning model training on Azure Databricks. It uses the scikit-learn package to train a simple classification model. It also illustrates the use of MLflow to track the model development process, and Optuna to automate hyperparameter tuning.
If your workspace is enabled for Unity Catalog, use this version of the notebook:
End-to-end example using scikit-learn on Azure Databricks
This notebook uses scikit-learn to illustrate a complete end-to-end example of loading data, model training, distributed hyperparameter tuning, and model inference. It also illustrates model lifecycle management using MLflow Model Registry to log and register your model.
If your workspace is enabled for Unity Catalog, use this version of the notebook:
Use scikit-learn with MLflow integration on Databricks (Unity Catalog)