Read MLflow experiments

The mlflow-experiment data source provides a Spark DataFrameReader API for loading MLflow experiment run data into a DataFrame. Azure Databricks users commonly use it to analyze training run results, compare metrics across experiments, and build dashboards on top of experiment history. For more information, see Organize training runs with MLflow experiments.

Prerequisites

Reading MLflow experiment run data requires Databricks Runtime 6.0 ML or above.

Usage

The following examples show how to load and filter MLflow experiment data using the Spark DataFrame API.

Load data from the notebook experiment

To load data from the current notebook's experiment, call load() with no arguments.

Python

df = spark.read.format("mlflow-experiment").load()
display(df)

Scala

val df = spark.read.format("mlflow-experiment").load()
display(df)

Load data using experiment IDs

To load data from one or more workspace experiments, pass the experiment IDs as a comma-separated string to load().

Python

df = spark.read.format("mlflow-experiment").load("3270527066281272")
display(df)

Scala

val df = spark.read.format("mlflow-experiment").load("3270527066281272,953590262154175")
display(df)
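When the experiment IDs are held in a list rather than written as a literal, the comma-separated argument can be built with a small helper. The following is a minimal sketch: `experiment_ids_to_path` and `load_experiments` are hypothetical names used for illustration, not part of the data source, and `spark` is assumed to be the session that a Databricks notebook provides.

```python
def experiment_ids_to_path(experiment_ids):
    """Join experiment IDs into the comma-separated string that load() expects."""
    ids = [str(i).strip() for i in experiment_ids]
    if not ids or any(not i for i in ids):
        raise ValueError("at least one non-empty experiment ID is required")
    return ",".join(ids)


def load_experiments(spark, experiment_ids):
    """Load runs for several experiments; `spark` is an active SparkSession."""
    path = experiment_ids_to_path(experiment_ids)
    return spark.read.format("mlflow-experiment").load(path)


# IDs may be given as strings or integers; both are converted to text.
print(experiment_ids_to_path(["3270527066281272", 953590262154175]))
```

In a notebook, `load_experiments(spark, ["3270527066281272", "953590262154175"])` would then return a single DataFrame covering both experiments.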

Load data using an experiment name

To load data by experiment name, resolve the name to an ID using the MLflow client, then pass the ID to load().

Python

import mlflow

# Resolve the experiment name to its ID.
expId = mlflow.get_experiment_by_name("/Shared/diabetes_experiment/").experiment_id
df = spark.read.format("mlflow-experiment").load(expId)
display(df)

Scala

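import org.mlflow.tracking.MlflowClient

// Assumes the MLflow Java client is available on the cluster.
val client = new MlflowClient()
val expId = client.getExperimentByName("/Shared/diabetes_experiment/").get.getExperimentId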
val df = spark.read.format("mlflow-experiment").load(expId)
display(df)
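In Python, `mlflow.get_experiment_by_name` returns `None` when no experiment matches the name, so reading `.experiment_id` directly fails with an `AttributeError`. The sketch below wraps the lookup so a missing experiment produces a clearer error. `resolve_experiment_id` is a hypothetical helper, and the lookup function is passed in as a parameter so the example does not depend on a tracking server.

```python
def resolve_experiment_id(name, lookup):
    """Return the experiment ID for `name`, or raise if the lookup finds nothing.

    `lookup` is any callable shaped like mlflow.get_experiment_by_name:
    it returns an object with an `experiment_id` attribute, or None.
    """
    experiment = lookup(name)
    if experiment is None:
        raise ValueError(f"No MLflow experiment named {name!r}")
    return experiment.experiment_id


# In a notebook (assumes mlflow is installed and `spark` is defined):
#   import mlflow
#   exp_id = resolve_experiment_id(
#       "/Shared/diabetes_experiment/", mlflow.get_experiment_by_name
#   )
#   df = spark.read.format("mlflow-experiment").load(exp_id)
```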

Filter data based on metrics and parameters

After loading experiment data, use standard DataFrame filter expressions to query across metrics and parameters.

Python

df = spark.read.format("mlflow-experiment").load("3270527066281272")
# Parameter values are stored as strings, so cast them before comparing numerically.
filtered_df = df.filter("metrics.loss < 0.01 AND CAST(params.learning_rate AS DOUBLE) > 0.001")
display(filtered_df)

Scala

val df = spark.read.format("mlflow-experiment").load("3270527066281272")
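// Parameter values are stored as strings, so cast them before comparing numerically.
val filtered_df = df.filter("metrics.loss < 1.85 AND CAST(params.num_epochs AS DOUBLE) > 30")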
display(filtered_df)
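Because `params` values are strings (see the output schema), comparing a parameter against a quoted literal is a character-by-character string comparison, which can give surprising results for numbers of different lengths. The sketch below shows the difference in plain Python and builds a predicate that casts the parameter first. `numeric_param_filter` is a hypothetical helper, and it assumes the parameter name is a simple identifier.

```python
def numeric_param_filter(param, op, threshold):
    """Build a Spark SQL predicate that compares a string param numerically."""
    if op not in {"<", "<=", ">", ">=", "=", "!="}:
        raise ValueError(f"unsupported operator: {op}")
    return f"CAST(params.{param} AS DOUBLE) {op} {float(threshold)}"


# Strings compare character by character, so "100" sorts before "30".
print("100" > "30")                # False
print(float("100") > float("30"))  # True

# The resulting string can be passed to df.filter(...).
print(numeric_param_filter("num_epochs", ">", 30))
```

Metrics need no cast because `metrics` values are already doubles.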

Output schema

The schema returned by the mlflow-experiment data source is fixed regardless of the experiment loaded:

root
|-- run_id: string
|-- experiment_id: string
|-- metrics: map
|    |-- key: string
|    |-- value: double
|-- params: map
|    |-- key: string
|    |-- value: string
|-- tags: map
|    |-- key: string
|    |-- value: string
|-- start_time: timestamp
|-- end_time: timestamp
|-- status: string
|-- artifact_uri: string
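Since metrics, params, and tags are map columns, dashboards and joins are often easier to build after pulling selected keys out into top-level columns. The following is a minimal sketch of one way to do that with `selectExpr`. The helper names and the `metric_`/`param_` column prefixes are illustrative assumptions, and keys are assumed not to contain single quotes or backticks.

```python
def flatten_exprs(metric_keys, param_keys):
    """Build selectExpr expressions that lift map entries into named columns."""
    exprs = ["run_id", "experiment_id", "status"]
    exprs += [f"metrics['{k}'] AS `metric_{k}`" for k in metric_keys]
    exprs += [f"params['{k}'] AS `param_{k}`" for k in param_keys]
    return exprs


def flatten_runs(df, metric_keys, param_keys):
    """Return a DataFrame with one column per requested metric and param."""
    return df.selectExpr(*flatten_exprs(metric_keys, param_keys))


# Show the expressions that would be passed to selectExpr.
for expr in flatten_exprs(["loss"], ["learning_rate"]):
    print(expr)
```

A key that a run did not log yields a null in the corresponding column, because map lookups return null for missing keys under Spark's default (non-ANSI) settings.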
