Track Azure Databricks ML experiments with MLflow and Azure Machine Learning

02/24/2023

MLflow is an open-source library for managing the life cycle of your machine learning experiments. You can use MLflow to integrate Azure Databricks with Azure Machine Learning so that you get the best of both products.

In this article, you will learn:

The libraries required to use MLflow with Azure Databricks and Azure Machine Learning.
How to track Azure Databricks runs with MLflow in Azure Machine Learning.
How to log models with MLflow to get them registered in Azure Machine Learning.
How to deploy and consume models registered in Azure Machine Learning.

Prerequisites

Install the azureml-mlflow package, which handles the connectivity with Azure Machine Learning, including authentication.
An Azure Databricks workspace and cluster.
Create an Azure Machine Learning Workspace.
- See which access permissions you need to perform your MLflow operations with your workspace.

Example notebooks

The notebook Training models in Azure Databricks and deploying them on Azure Machine Learning demonstrates how to train models in Azure Databricks and deploy them in Azure Machine Learning. It also covers how to handle cases where you want to track the experiments and models with the MLflow instance in Azure Databricks and use Azure Machine Learning for deployment.

Install libraries

To install libraries on your cluster, navigate to the Libraries tab and select Install New.

In the Package field, type azureml-mlflow and then select Install. Repeat this step as necessary to install any additional packages that your experiment needs on the cluster.
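
After the installation finishes, a quick check from a notebook cell can confirm that the packages are visible to the Python environment attached to the cluster. The following is a minimal sketch that relies only on the standard library. The helper name installed_version is hypothetical; it is not part of MLflow or Azure Machine Learning.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report on the packages this article relies on.
for package in ("mlflow", "azureml-mlflow"):
    version = installed_version(package)
    print(f"{package}: {version or 'not installed'}")
```

If a package is reported as not installed, repeat the installation step on the cluster before you continue.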

Track Azure Databricks runs with MLflow

Azure Databricks can be configured to track experiments using MLflow in two ways:

Track in both Azure Databricks workspace and Azure Machine Learning workspace (dual-tracking)
Track exclusively on Azure Machine Learning

By default, dual-tracking is configured for you when you link your Azure Databricks workspace.

Dual-tracking on Azure Databricks and Azure Machine Learning

Linking your ADB workspace to your Azure Machine Learning workspace enables you to track your experiment data in the Azure Machine Learning workspace and the Azure Databricks workspace at the same time. This is referred to as dual-tracking.

Warning

Dual-tracking in a private link-enabled Azure Machine Learning workspace is not supported at the moment. Configure exclusive tracking with your Azure Machine Learning workspace instead.

Warning

Dual-tracking is not supported in Microsoft Azure operated by 21Vianet at the moment. Configure exclusive tracking with your Azure Machine Learning workspace instead.

To link your ADB workspace to a new or existing Azure Machine Learning workspace:

Sign in to Azure portal.
Navigate to your ADB workspace's Overview page.
Select the Link Azure Machine Learning workspace button on the bottom right.

After you link your Azure Databricks workspace with your Azure Machine Learning workspace, MLflow tracking data is automatically recorded in both of the following places:

The linked Azure Machine Learning workspace.
Your original ADB workspace.

You can then use MLflow in Azure Databricks in the same way as you're used to. The following example sets the experiment name as it is usually done in Azure Databricks and starts logging some parameters:

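import mlflow

# In Azure Databricks, the experiment name is the workspace path of the experiment.
experiment_name = "/Users/{user_name}/{experiment_folder}/{experiment_name}"
mlflow.set_experiment(experiment_name)

# Everything logged inside the run is tracked in both workspaces.
with mlflow.start_run():
    mlflow.log_param("epochs", 20)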

Note

Unlike tracking, model registries don't support registering models in Azure Machine Learning and Azure Databricks at the same time. You have to use one or the other. Read the section Registering models in the registry with MLflow for more details.

Tracking exclusively on Azure Machine Learning workspace

If you prefer to manage your tracked experiments in a centralized location, you can set MLflow tracking to only track in your Azure Machine Learning workspace. This configuration has the advantage of enabling an easier path to deployment using Azure Machine Learning deployment options.

Warning

For a private link-enabled Azure Machine Learning workspace, you have to deploy Azure Databricks in your own network (VNet injection) to ensure proper connectivity.

You have to configure the MLflow tracking URI to point exclusively to Azure Machine Learning, as demonstrated in the following example:

Configure tracking URI

Get the tracking URI for your workspace:
- Azure CLI
- Python
- Studio
- Manually
APPLIES TO: Azure CLI ml extension v2 (current)
1. Log in and configure your workspace:
```
az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location> 
```
2. You can get the tracking URI using the az ml workspace command:
```
az ml workspace show --query mlflow_tracking_uri
```
APPLIES TO: Python SDK azure-ai-ml v2 (current)

You can get the Azure Machine Learning MLflow tracking URI using the Azure Machine Learning SDK v2 for Python. Ensure you have the library azure-ai-ml installed in the compute you are using. The following sample gets the unique MLflow tracking URI associated with your workspace.
1. Log in to your workspace using the MLClient. The easiest way to do that is by using the workspace config file:
```
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
```
  Tip
  
  To download the workspace configuration file:
  
  Navigate to Azure Machine Learning studio.
  
  Click the upper-right corner of the page, then select Download config file.
  
  Save the file config.json in the directory where you are working.
2. Alternatively, you can use the subscription ID, resource group name and workspace name to get it:
```
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

#Enter details of your Azure Machine Learning workspace
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace_name = '<WORKSPACE_NAME>'

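# Pass the workspace name too, so that ml_client.workspace_name is set
ml_client = MLClient(credential=DefaultAzureCredential(),
                        subscription_id=subscription_id, 
                        resource_group_name=resource_group,
                        workspace_name=workspace_name)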
```
  Important
  
  DefaultAzureCredential will try to pull the credentials from the available context. If you want to specify credentials in a different way, for instance using the web browser in an interactive way, you can use InteractiveBrowserCredential or any other method available in azure.identity package.
3. Get the Azure Machine Learning Tracking URI:
```
mlflow_tracking_uri = ml_client.workspaces.get(ml_client.workspace_name).mlflow_tracking_uri
```
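
MLClient.from_config reads the workspace details from the config.json file mentioned in the earlier tip. As a hedged illustration of what that file is expected to contain, the following sketch loads it with the standard library and checks for the three values used throughout this article (subscription_id, resource_group, and workspace_name). The helper name load_workspace_config is hypothetical; it is not part of azure-ai-ml.

```python
import json
from pathlib import Path

# Keys expected in the workspace configuration file.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Load a workspace config file and verify that the expected keys are set."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config file is missing: {', '.join(missing)}")
    return config
```

The returned dictionary can supply the subscription_id, resource_group_name, and workspace_name arguments of MLClient when you prefer to construct the client explicitly.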
Use the Azure Machine Learning portal to get the tracking URI:
1. Open the Azure Machine Learning studio portal and log in using your credentials.
2. In the upper right corner, click on the name of your workspace to show the Directory + Subscription + Workspace blade.
3. Click on View all properties in Azure Portal.
4. In the Essentials section, you will find the property MLflow tracking URI.
The Azure Machine Learning Tracking URI can be constructed using the subscription ID, region of where the resource is deployed, resource group name and workspace name. The following code sample shows how:

Warning

If you are working in a private link-enabled workspace, the MLflow endpoint will also use a private link to communicate with Azure Machine Learning. As a consequence, the tracking URI will look different from the one shown here. In those cases, you need to get the tracking URI using the Azure Machine Learning SDK or CLI v2.
```
region = "<LOCATION>"
subscription_id = '<SUBSCRIPTION_ID>'
resource_group = '<RESOURCE_GROUP>'
workspace_name = '<AML_WORKSPACE_NAME>'

mlflow_tracking_uri = f"azureml://{region}.api.azureml.ms/mlflow/v1.0/subscriptions/{subscription_id}/resourceGroups/{resource_group}/providers/Microsoft.MachineLearningServices/workspaces/{workspace_name}"
```
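
Because a manually constructed URI is easy to get wrong, it can help to verify its parts before using it. The following is a minimal sketch that splits a tracking URI of the form shown above back into its components. The helper name parse_tracking_uri is hypothetical, and the pattern assumes the public (non-private-link) layout only, so a private link URI is expected to be rejected.

```python
import re

# Layout assumed from the manual construction shown in this article.
_TRACKING_URI_PATTERN = re.compile(
    r"^azureml://(?P<host>[^/]+)/mlflow/v1\.0"
    r"/subscriptions/(?P<subscription_id>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.MachineLearningServices"
    r"/workspaces/(?P<workspace_name>[^/]+)/?$"
)


def parse_tracking_uri(tracking_uri):
    """Return the host, subscription, resource group, and workspace of a URI."""
    match = _TRACKING_URI_PATTERN.match(tracking_uri)
    if match is None:
        raise ValueError(f"not a recognized Azure ML tracking URI: {tracking_uri}")
    return match.groupdict()
```

For example, you can compare parse_tracking_uri(mlflow_tracking_uri)["workspace_name"] with the workspace you intend to track in, to catch a placeholder that was left unfilled.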
Configuring the tracking URI:
- Using MLflow SDK
- Using environment variables
The method set_tracking_uri() then points MLflow to that tracking URI.
```
import mlflow

mlflow.set_tracking_uri(mlflow_tracking_uri)
```
You can set the MLflow environment variable MLFLOW_TRACKING_URI in your compute so that any interaction with MLflow in that compute points to Azure Machine Learning by default.
```
export MLFLOW_TRACKING_URI=$(az ml workspace show --query mlflow_tracking_uri | sed 's/"//g')
```
Tip

When working on shared environments, like an Azure Databricks cluster, an Azure Synapse Analytics cluster, or similar, it is useful to set the environment variable MLFLOW_TRACKING_URI at the cluster level. Doing so automatically configures the MLflow tracking URI to point to Azure Machine Learning for all the sessions running in the cluster, rather than on a per-session basis.

Once the environment variable is configured, any experiment running in that cluster will be tracked in Azure Machine Learning.
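
To confirm from a session that the cluster-level variable is in place, you can inspect the environment before you start logging. The following sketch assumes tracking URIs of the azureml:// form shown in this article. The helper name tracking_uri_from_env is hypothetical.

```python
import os


def tracking_uri_from_env(env=None):
    """Return MLFLOW_TRACKING_URI if it targets Azure ML, or None if unset."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI", "").strip()
    if not uri:
        return None
    if not uri.startswith("azureml://"):
        raise ValueError(f"MLFLOW_TRACKING_URI does not use azureml://: {uri}")
    return uri


print(tracking_uri_from_env() or "MLFLOW_TRACKING_URI is not set")
```

A result of None means the variable isn't set for the session, in which case you can fall back to calling mlflow.set_tracking_uri() explicitly.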

Configure authentication

Once tracking is configured, you also need to configure how to authenticate to the associated workspace. By default, the Azure Machine Learning plugin for MLflow performs interactive authentication by opening the default browser to prompt for credentials. Refer to Configure MLflow for Azure Machine Learning: Configure authentication for additional ways to configure authentication for MLflow in Azure Machine Learning workspaces.

For interactive jobs where there's a user connected to the session, you can rely on Interactive Authentication and hence no further action is required.

Warning

Interactive browser authentication will block code execution when prompting for credentials. It is not a suitable option for authentication in unattended environments like training jobs. We recommend that you configure a different authentication mode.

For those scenarios where unattended execution is required, you'll have to configure a service principal to communicate with Azure Machine Learning.

MLflow SDK (Python):

import os

os.environ["AZURE_TENANT_ID"] = "<AZURE_TENANT_ID>"
os.environ["AZURE_CLIENT_ID"] = "<AZURE_CLIENT_ID>"
os.environ["AZURE_CLIENT_SECRET"] = "<AZURE_CLIENT_SECRET>"

Using environment variables (shell):

export AZURE_TENANT_ID="<AZURE_TENANT_ID>"
export AZURE_CLIENT_ID="<AZURE_CLIENT_ID>"
export AZURE_CLIENT_SECRET="<AZURE_CLIENT_SECRET>"

Tip

When working on shared environments, it is advisable to configure these environment variables at the compute level. As a best practice, manage them as secrets in an instance of Azure Key Vault whenever possible. For instance, in Azure Databricks you can use secrets in environment variables as follows in the cluster configuration: AZURE_CLIENT_SECRET={{secrets/<scope-name>/<secret-name>}}. See Reference a secret in an environment variable for how to do it in Azure Databricks or refer to similar documentation in your platform.
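
Before you start an unattended job, it can be useful to fail fast when one of the service principal variables is absent. The following is a minimal sketch that only checks for the three variables named above. The helper name missing_service_principal_variables is hypothetical, and the check doesn't validate the credentials themselves.

```python
import os

# The three variables used for service principal authentication in this article.
REQUIRED_VARIABLES = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_variables(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


missing = missing_service_principal_variables()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

In a job script, you might raise an error instead of printing, so that the run stops before any call to MLflow triggers interactive authentication.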

Experiment names in Azure Machine Learning

When MLflow is configured to track experiments exclusively in an Azure Machine Learning workspace, experiment names have to follow the convention used by Azure Machine Learning. In Azure Databricks, experiments are named with the path where the experiment is saved, like /Users/alice@contoso.com/iris-classifier. In Azure Machine Learning, however, you provide the experiment name directly, so the same experiment would be named iris-classifier:

mlflow.set_experiment(experiment_name="iris-classifier")
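
If you have notebooks that already use Databricks-style experiment paths, one possible way to derive an Azure Machine Learning name is to keep only the last path segment, as in the iris-classifier example. The following sketch illustrates that idea. The helper name to_azureml_experiment_name is hypothetical, and the approach assumes that the last segment alone is a name you want to use.

```python
import posixpath


def to_azureml_experiment_name(databricks_experiment_path):
    """Return the last segment of a Databricks experiment path."""
    name = posixpath.basename(databricks_experiment_path.rstrip("/"))
    if not name:
        raise ValueError("experiment path has no name segment")
    return name


print(to_azureml_experiment_name("/Users/alice@contoso.com/iris-classifier"))
# prints: iris-classifier
```

Keep in mind that two experiments saved under different folders with the same final segment would map to the same name with this approach.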

Tracking parameters, metrics and artifacts

You can then use MLflow in Azure Databricks in the same way as you're used to. For details, see Log & view metrics and log files.

Logging models with MLflow

After your model is trained, you can log it to the tracking server with the mlflow.<model_flavor>.log_model() method, where <model_flavor> refers to the framework associated with the model. Learn what model flavors are supported. In the following example, a model created with the Spark library MLlib is logged:

mlflow.spark.log_model(model, artifact_path="model")

Note that the flavor spark is used because of the training framework, not because the model is trained in a Spark cluster. For example, you can train a TensorFlow model with Spark, in which case the flavor to use is tensorflow.

Models are logged inside the run being tracked. That means that models are available either in both Azure Databricks and Azure Machine Learning (the default) or exclusively in Azure Machine Learning, if you configured the tracking URI to point to it.

Important

Notice that the parameter registered_model_name has not been specified here. Read the section Registering models in the registry with MLflow for more details about the implications of that parameter and how the registry works.
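
Because a logged model lives inside its run, you can refer to it with MLflow's runs:/ URI scheme, which combines the run ID with the artifact path used in log_model. The following sketch builds such a URI. The helper name run_model_uri is hypothetical, and the default artifact path "model" is taken from the example above.

```python
def run_model_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI that points at a model logged inside a run."""
    if not run_id:
        raise ValueError("run_id must not be empty")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# The run ID here is a placeholder, not a real run.
print(run_model_uri("<run_id>"))
```

The resulting string can be passed to loaders such as mlflow.pyfunc.load_model to retrieve the model from the tracking server that MLflow is currently pointing to.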

Registering models in the registry with MLflow

Unlike tracking, model registries can't operate in Azure Databricks and Azure Machine Learning at the same time. You have to use one or the other. By default, the Azure Databricks workspace is used as the model registry. If you chose to set MLflow tracking to only track in your Azure Machine Learning workspace, the model registry is the Azure Machine Learning workspace instead.

Then, considering you're using the default configuration, the following line will log a model inside the corresponding runs of both Azure Databricks and Azure Machine Learning, but it will register it only on Azure Databricks:

mlflow.spark.log_model(model, artifact_path="model",
                       registered_model_name="model_name")

If a registered model with the name doesn’t exist, the method registers a new model, creates version 1, and returns a ModelVersion MLflow object.
If a registered model with the name already exists, the method creates a new model version and returns the version object.

Using Azure Machine Learning Registry with MLflow

If you want to use the Azure Machine Learning model registry instead of Azure Databricks, we recommend that you set MLflow tracking to only track in your Azure Machine Learning workspace. This removes any ambiguity about where models are registered and keeps the configuration simpler.

However, if you want to continue using the dual-tracking capabilities but register models in Azure Machine Learning, you can instruct MLflow to use Azure Machine Learning for model registries by configuring the MLflow model registry URI. This URI has exactly the same format and value as the MLflow tracking URI.

mlflow.set_registry_uri(azureml_mlflow_uri)

Note

The value of azureml_mlflow_uri was obtained in the same way as demonstrated in Set MLflow Tracking to only track in your Azure Machine Learning workspace.

For a complete example of this scenario, check the example Training models in Azure Databricks and deploying them on Azure Machine Learning.

Deploying and consuming models registered in Azure Machine Learning

Models registered in Azure Machine Learning Service using MLflow can be consumed as:

An Azure Machine Learning endpoint (real-time and batch): This deployment allows you to use Azure Machine Learning deployment capabilities for both real-time and batch inference in Azure Container Instances (ACI), Azure Kubernetes Service (AKS), or our Managed Inference Endpoints.
MLflow model objects or Pandas UDFs, which can be used in Azure Databricks notebooks in streaming or batch pipelines.

Deploy models to Azure Machine Learning endpoints

You can use the azureml-mlflow plugin to deploy a model to your Azure Machine Learning workspace. Check the How to deploy MLflow models page for complete details about how to deploy models to the different targets.

Important

Models need to be registered in the Azure Machine Learning registry in order to deploy them. If your models happen to be registered in the MLflow instance inside Azure Databricks, you will have to register them again in Azure Machine Learning. If this is your case, check the example Training models in Azure Databricks and deploying them on Azure Machine Learning.

Deploy models to ADB for batch scoring using UDFs

You can choose Azure Databricks clusters for batch scoring. By using MLflow, you can resolve any model from the registry you are connected to. You will typically use one of the following two methods:

If your model was trained and built with Spark libraries (like MLlib), use mlflow.pyfunc.spark_udf to load a model and use it as a Spark Pandas UDF to score new data.
If your model wasn't trained or built with Spark libraries, use either mlflow.pyfunc.load_model or mlflow.<flavor>.load_model to load the model in the cluster driver. Notice that with this approach, any parallelization or work distribution you want to happen in the cluster needs to be orchestrated by you. Also, notice that MLflow doesn't install any library your model requires to run. Those libraries need to be installed in the cluster before running it.
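
For the second method, where the model runs in the cluster driver, one simple way to orchestrate the work yourself is to score the data in fixed-size batches. The following is a minimal sketch of that idea, not a prescribed approach. The helper names iter_batches and score_in_batches are hypothetical, and predict stands for any callable that returns one prediction per input row, such as the predict method of a loaded model.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows with at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_in_batches(predict, rows, batch_size=1000):
    """Apply predict to each batch and collect the predictions in order."""
    predictions = []
    for batch in iter_batches(rows, batch_size):
        predictions.extend(predict(batch))
    return predictions


# Stand-in for a model: doubles every input value.
print(score_in_batches(lambda batch: [2 * x for x in batch], [1, 2, 3, 4, 5], batch_size=2))
# prints: [2, 4, 6, 8, 10]
```

This sketch runs sequentially on the driver. Distributing the batches across workers is a separate design decision that depends on your data size and cluster setup.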

The following example shows how to load a model named uci-heart-classifier from the registry and use it as a Spark Pandas UDF to score new data.

import mlflow

model_name = "uci-heart-classifier"
model_uri = f"models:/{model_name}/latest"

# Create a Spark UDF for the MLflow model.
# `spark` is the SparkSession that Azure Databricks notebooks provide.
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, model_uri)

Tip

Check Loading models from registry for more ways to reference models from the registry.
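
As one illustration of referencing models from the registry, the models:/ URI used above can point at a specific version number instead of latest. The following sketch builds either form. The helper name registry_model_uri is hypothetical.

```python
def registry_model_uri(model_name, version="latest"):
    """Build a models:/ URI for a registered model version or 'latest'."""
    if not model_name:
        raise ValueError("model_name must not be empty")
    return f"models:/{model_name}/{version}"


print(registry_model_uri("uci-heart-classifier"))
# prints: models:/uci-heart-classifier/latest
print(registry_model_uri("uci-heart-classifier", 2))
# prints: models:/uci-heart-classifier/2
```

Pinning a version number makes batch scoring reproducible, whereas latest picks up new versions as they are registered.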

Once the model is loaded, you can use it to score new data:

# Load the scoring data into a Spark DataFrame
score_df = spark.table("<table_name>").where("<required_conditions>")

# Make predictions by passing the model's input columns to the UDF
preds = score_df.withColumn(
    "target_column_name",
    pyfunc_udf("Input_column1", "Input_column2", "Input_column3"),  # … further input columns
)

display(preds)

Clean up resources

If you wish to keep your Azure Databricks workspace, but no longer need the Azure Machine Learning workspace, you can delete the Azure Machine Learning workspace. This action results in unlinking your Azure Databricks workspace and the Azure Machine Learning workspace.

If you don't plan to use the logged metrics and artifacts in your workspace, the ability to delete them individually is unavailable at this time. Instead, delete the resource group that contains the storage account and workspace, so you don't incur any charges:

In the Azure portal, select Resource groups on the far left.
From the list, select the resource group you created.
Select Delete resource group.
Enter the resource group name. Then select Delete.

Next steps

Deploy MLflow models as an Azure web service.
Manage your models.
Track experiment jobs with MLflow and Azure Machine Learning.
Learn more about Azure Databricks and MLflow.
