Model training examples

This section includes examples showing how to train machine learning models on Azure Databricks using many popular open-source libraries.

You can also use AutoML, which automatically prepares a dataset for model training and performs a set of trials using open-source libraries such as scikit-learn and XGBoost. AutoML creates a Python notebook with the source code for each trial run so you can review, reproduce, and modify the code.

For an example notebook that shows how to train a machine learning model that uses data in Unity Catalog and write predictions back to Unity Catalog, see Train and register machine learning models with Unity Catalog.

Machine learning examples

Package | Notebook(s) | Features
scikit-learn | Machine learning tutorial | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow
scikit-learn | End-to-end example | Classification model, MLflow, automated hyperparameter tuning with Hyperopt and MLflow, XGBoost, Model Registry, Model Serving
MLlib | MLlib examples | Binary classification, decision trees, GBT regression, Structured Streaming, custom transformer
xgboost | XGBoost examples | Python, PySpark, and Scala, single-node workloads and distributed training

Hyperparameter tuning examples

For general information about hyperparameter tuning in Azure Databricks, see Hyperparameter tuning.

Package | Notebook | Features
Hyperopt | Distributed hyperopt | Distributed hyperopt, scikit-learn, MLflow
Hyperopt | Compare models | Use distributed hyperopt to search hyperparameter space for different model types simultaneously
Hyperopt | Distributed training algorithms and hyperopt | Hyperopt, MLlib
Hyperopt | Hyperopt best practices | Best practices for datasets of different sizes
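
The idea of searching the hyperparameter space for different model types simultaneously can be illustrated with a minimal sketch. It is not taken from the linked notebooks and does not use Hyperopt: it substitutes plain random search, written with only the Python standard library, so that it runs anywhere. The toy dataset, the two model types (a one-dimensional ridge regression and a k-nearest-neighbour regressor), and every function name below are assumptions made for illustration. The point it shows is the conditional search space: each trial first draws a model type, then draws only the hyperparameters that belong to that model.

```python
import random


def make_data(n, rng):
    """Toy 1-D regression data: y = 2x + Gaussian noise (illustrative only)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.1)) for x in xs]


def fit_ridge(train, alpha):
    """1-D ridge regression without intercept; returns a predict function."""
    w = sum(x * y for x, y in train) / (sum(x * x for x, _ in train) + alpha)
    return lambda x: w * x


def fit_knn(train, k):
    """k-nearest-neighbour regression; returns a predict function."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def sample_config(rng):
    """Draw a model type first, then only that model's hyperparameters."""
    if rng.random() < 0.5:
        # alpha is sampled log-uniformly between 1e-3 and 10
        return {"model": "ridge", "alpha": 10 ** rng.uniform(-3, 1)}
    return {"model": "knn", "k": rng.randint(1, 15)}


def validation_loss(config, train, valid):
    """Fit the model named in config and return its validation MSE."""
    if config["model"] == "ridge":
        predict = fit_ridge(train, config["alpha"])
    else:
        predict = fit_knn(train, config["k"])
    return sum((predict(x) - y) ** 2 for x, y in valid) / len(valid)


def random_search(n_trials=30, seed=0):
    """Evaluate n_trials random configurations and keep the best one."""
    rng = random.Random(seed)
    train, valid = make_data(80, rng), make_data(40, rng)
    trials = []
    for _ in range(n_trials):
        config = sample_config(rng)
        trials.append((validation_loss(config, train, valid), config))
    best_loss, best_config = min(trials, key=lambda t: t[0])
    return best_loss, best_config, trials


best_loss, best_config, trials = random_search()
print(best_config["model"], round(best_loss, 4))
```

Because every trial is independent of the others, this kind of search lends itself to parallel execution. The notebooks listed above rely on Hyperopt for both the search algorithm and the distribution of trials, so treat this sketch only as a picture of the search-space structure.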