The mlflow-experiment data source provides a Spark DataFrameReader API for loading MLflow experiment run data into a DataFrame. Azure Databricks users commonly use it to analyze training run results, compare metrics across experiments, and build dashboards on top of experiment history. For more information, see Organize training runs with MLflow experiments.
Prerequisites
Reading MLflow experiment run data requires Databricks Runtime 6.0 ML or above.
Usage
The following examples show how to load and filter MLflow experiment data using the Spark DataFrame API.
Load data from the notebook experiment
To load data from the current notebook's experiment, call load() with no arguments.
Python
df = spark.read.format("mlflow-experiment").load()
display(df)
Scala
val df = spark.read.format("mlflow-experiment").load()
display(df)
Load data using experiment IDs
To load data from one or more workspace experiments, pass the experiment IDs as a comma-separated string to load().
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")
display(df)
Scala
val df = spark.read.format("mlflow-experiment").load("3270527066281272,953590262154175")
display(df)
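When the experiment IDs are held in a list rather than typed by hand, the comma-separated argument can be built programmatically. The following is a minimal sketch; `experiment_ids_arg` is a hypothetical helper name, not part of the data source or MLflow.

```python
def experiment_ids_arg(experiment_ids):
    """Join experiment IDs into the comma-separated string passed to load()."""
    # Accept ints or strings and drop surrounding whitespace.
    ids = [str(e).strip() for e in experiment_ids]
    if not ids or any(not i for i in ids):
        raise ValueError("expected at least one non-empty experiment ID")
    return ",".join(ids)


path = experiment_ids_arg(["3270527066281272", 953590262154175])

# On a Databricks cluster, where `spark` is predefined, the result is used as:
# df = spark.read.format("mlflow-experiment").load(path)
```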
Load data using an experiment name
To load data by experiment name, resolve the name to an ID using the MLflow client, then pass the ID to load().
Python
import mlflow

expId = mlflow.get_experiment_by_name("/Shared/diabetes_experiment/").experiment_id
df = spark.read.format("mlflow-experiment").load(expId)
display(df)
Scala
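import org.mlflow.tracking.MlflowClient
// Assumes the MLflow tracking URI is already configured for this workspace.
val mlflow = new MlflowClient()
val expId = mlflow.getExperimentByName("/Shared/diabetes_experiment/").get.getExperimentId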
val df = spark.read.format("mlflow-experiment").load(expId)
display(df)
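In Python, `mlflow.get_experiment_by_name` returns `None` when no experiment matches the name, so accessing `.experiment_id` directly fails with an `AttributeError`. The sketch below wraps the lookup with an explicit check. The helper name `resolve_experiment_id` and its `lookup` parameter are illustrative assumptions; `lookup` exists only so the function can be exercised without an MLflow server.

```python
def resolve_experiment_id(name, lookup=None):
    """Return the experiment ID for `name`, or raise if no experiment matches."""
    if lookup is None:
        # Imported lazily so the helper can be tested without MLflow installed.
        import mlflow
        lookup = mlflow.get_experiment_by_name
    experiment = lookup(name)
    if experiment is None:
        raise LookupError(f"No MLflow experiment named {name!r}")
    return experiment.experiment_id


# On a Databricks cluster, where `spark` is predefined:
# exp_id = resolve_experiment_id("/Shared/diabetes_experiment/")
# df = spark.read.format("mlflow-experiment").load(exp_id)
```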
Filter data based on metrics and parameters
After loading experiment data, use standard DataFrame filter expressions to query across metrics and parameters. Parameter values are stored as strings, so cast them to a numeric type before comparing them with numbers; otherwise the comparison is lexicographic (for example, '100' > '30' is false).
Python
df = spark.read.format("mlflow-experiment").load("3270527066281272")
filtered_df = df.filter("metrics.loss < 0.01 AND CAST(params.learning_rate AS DOUBLE) > 0.001")
display(filtered_df)
Scala
val df = spark.read.format("mlflow-experiment").load("3270527066281272")
val filtered_df = df.filter("metrics.loss < 1.85 AND CAST(params.num_epochs AS INT) > 30")
display(filtered_df)
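The `params` column maps string keys to string values, so numeric filters on parameters depend on an explicit cast. When a filter involves several parameters, the cast can be factored into a small helper. This is a sketch only: `param_cast` is a hypothetical name, and it uses Spark SQL's bracket syntax for map lookup.

```python
def param_cast(name, sql_type="DOUBLE"):
    """Build a Spark SQL fragment that reads a run parameter as a typed value."""
    if "'" in name:
        # Keep the sketch simple: refuse names that would need quote escaping.
        raise ValueError("parameter name must not contain a single quote")
    return f"CAST(params['{name}'] AS {sql_type})"


expr = f"metrics.loss < 1.85 AND {param_cast('num_epochs', 'INT')} > 30"

# Plain string comparison orders values character by character,
# which is why an uncast comparison can silently drop matching runs.
lexicographic = "100" > "30"   # False
numeric = 100 > 30             # True

# On a Databricks cluster, with `df` loaded from the mlflow-experiment source:
# filtered_df = df.filter(expr)
```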
Output schema
The schema returned by the mlflow-experiment data source is fixed regardless of the experiment loaded:
root
|-- run_id: string
|-- experiment_id: string
|-- metrics: map
|    |-- key: string
|    |-- value: double
|-- params: map
|    |-- key: string
|    |-- value: string
|-- tags: map
|    |-- key: string
|    |-- value: string
|-- start_time: timestamp
|-- end_time: timestamp
|-- status: string
|-- artifact_uri: string
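Because `metrics`, `params`, and `tags` are map columns, a collected run arrives in Python with those fields as dictionaries. The sketch below flattens one such record into dotted keys, which can be convenient for ad hoc comparison of a few runs. `flatten_run` is a hypothetical helper, and the sample record uses made-up values shaped like the schema above.

```python
MAP_FIELDS = ("metrics", "params", "tags")


def flatten_run(run):
    """Flatten a run record so that each map entry becomes a `field.key` item."""
    # Copy the scalar fields (run_id, status, and so on) unchanged.
    flat = {k: v for k, v in run.items() if k not in MAP_FIELDS}
    for field in MAP_FIELDS:
        # A missing or null map contributes no entries.
        for key, value in (run.get(field) or {}).items():
            flat[f"{field}.{key}"] = value
    return flat


# Hypothetical record; on a cluster a similar dict could come from row.asDict().
example = {
    "run_id": "abc123",
    "experiment_id": "3270527066281272",
    "metrics": {"loss": 0.42},
    "params": {"num_epochs": "40"},
    "tags": {},
    "status": "FINISHED",
}
flat = flatten_run(example)
```

Note that parameter values stay strings after flattening, matching the `params` value type in the schema.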
Additional resources
- Read OpenSharing shared tables using Apache Spark DataFrames: If your data is shared via Delta Sharing rather than stored in MLflow, use the deltasharing format to read shared tables with the same DataFrameReader API.