Track ML experiments and models with MLflow
In this article, learn how to enable MLflow Tracking to connect Azure Machine Learning as the backend of your MLflow experiments.
MLflow is an open-source library for managing the lifecycle of your machine learning experiments. MLflow Tracking is a component of MLflow that logs and tracks your training job metrics and model artifacts, regardless of where your experiment runs: locally on your computer, on a remote compute target, on a virtual machine, or on an Azure Databricks cluster.
See MLflow and Azure Machine Learning for all supported MLflow and Azure Machine Learning functionality including MLflow Project support (preview) and model deployment.
Tip
If you want to track experiments running on Azure Databricks or Azure Synapse Analytics, see the dedicated articles Track Azure Databricks ML experiments with MLflow and Azure Machine Learning or Track Azure Synapse Analytics ML experiments with MLflow and Azure Machine Learning.
Note
The information in this document is primarily for data scientists and developers who want to monitor the model training process. If you are an administrator interested in monitoring resource usage and events from Azure Machine Learning, such as quotas, completed training jobs, or completed model deployments, see Monitoring Azure Machine Learning.
Prerequisites
- Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow, azureml-mlflow:
  pip install mlflow azureml-mlflow
Tip
You can use the package mlflow-skinny, which is a lightweight MLflow package without SQL storage, server, UI, or data science dependencies. It is recommended for users who primarily need the tracking and logging capabilities and don't need the full suite of MLflow features, including deployments.
- If you're doing remote tracking (tracking experiments running outside Azure Machine Learning), configure MLflow to point to your Azure Machine Learning workspace's tracking URI, as explained in Configure MLflow for Azure Machine Learning.
- (Optional) Install and set up Azure ML CLI (v2) and make sure you install the ml extension.
- (Optional) Install and set up Azure ML SDK (v2) for Python.
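For remote tracking, the workspace tracking URI mentioned in the prerequisites is a single string that MLflow can also pick up from the MLFLOW_TRACKING_URI environment variable. The sketch below assembles such a URI from workspace details. The helper name azureml_tracking_uri, the URI layout, and all the values are assumptions for illustration; where possible, copy the exact URI from your workspace instead of building it by hand.

```python
import os


def azureml_tracking_uri(region, subscription_id, resource_group, workspace):
    """Assemble a tracking URI in the assumed azureml:// layout."""
    return (
        f"azureml://{region}.api.azureml.ms/mlflow/v1.0"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.MachineLearningServices"
        f"/workspaces/{workspace}"
    )


# Placeholder values (hypothetical); replace them with your workspace details.
tracking_uri = azureml_tracking_uri(
    region="eastus",
    subscription_id="00000000-0000-0000-0000-000000000000",
    resource_group="my-resource-group",
    workspace="my-workspace",
)

# MLflow reads this variable when no tracking URI has been set in code.
os.environ["MLFLOW_TRACKING_URI"] = tracking_uri
print(tracking_uri)
```

Setting the environment variable keeps the URI out of the training script, so the same script can run unchanged inside an Azure Machine Learning job, where the URI is configured automatically.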
Connect to your workspace
First, connect to the Azure Machine Learning workspace where you want to track your experiments.
If you're working on Azure Machine Learning compute, tracking is already configured for you. Your default credentials are also used when working with MLflow.
Set experiment name
All MLflow runs are logged to the active experiment. By default, runs are logged to an experiment named Default that is automatically created for you. You can configure the experiment where tracking happens.
Tip
When submitting jobs using Azure ML CLI v2, you can set the experiment name using the property experiment_name in the YAML definition of the job. You don't have to configure it in your training script. See YAML: display name, experiment name, description, and tags for details.
To configure the experiment you want to work on, use the MLflow command mlflow.set_experiment().
import mlflow

# Runs started after this call are logged to the named experiment
experiment_name = 'experiment_with_mlflow'
mlflow.set_experiment(experiment_name)
Start training job
After you set the MLflow experiment name, you can start your training job with start_run(). Then use log_metric() to log your training job metrics through the MLflow logging API.
import os
from random import random

# Everything logged inside the with block belongs to this run
with mlflow.start_run() as mlflow_run:
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    # Create a small text file and upload it as a run artifact
    os.system("echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")
For details about how to log metrics, parameters, and artifacts in a run using MLflow, see How to log and view metrics.
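The example above logs a single metric value. Training loops usually log one value per step so that the run shows a curve. The sketch below illustrates this with the step argument of log_metric(). The helper log_loss_curve and the RecordingTracker stand-in are hypothetical names introduced here so the snippet runs without a workspace; they are not part of MLflow.

```python
def log_loss_curve(tracker, losses, metric_name="train_loss"):
    """Log one metric value per step so the run records a curve."""
    for step, value in enumerate(losses):
        tracker.log_metric(metric_name, value, step=step)
    return len(losses)


class RecordingTracker:
    """Stand-in for the mlflow module that records calls locally."""

    def __init__(self):
        self.calls = []

    def log_metric(self, key, value, step=None):
        self.calls.append((key, value, step))


# Dry run with made-up loss values; nothing is sent to a tracking server.
tracker = RecordingTracker()
logged = log_loss_curve(tracker, [0.9, 0.5, 0.3])
print(logged, tracker.calls)
```

In a training script, you would import mlflow and call log_loss_curve(mlflow, losses) inside a with mlflow.start_run(): block, so the values land in that run.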
Track jobs running on Azure Machine Learning
APPLIES TO:
Azure CLI ml extension v2 (current)
Remote runs (jobs) let you train your models in a more robust and repeatable way. They can also use more powerful compute, such as Machine Learning Compute clusters. See What are compute targets in Azure Machine Learning? to learn about different compute options.
When submitting runs using jobs, Azure Machine Learning automatically configures MLflow to work with the workspace the job is running in. This means that there is no need to configure the MLflow tracking URI. On top of that, experiments are automatically named based on the details of the job.
Important
When submitting training jobs to Azure Machine Learning, you don't have to configure the MLflow tracking URI on your training logic as it is already configured for you.
Creating a training routine
First, create a src subdirectory, and in it create a file named hello_world.py that contains your training code. All your training code goes into the src subdirectory.
The training code is taken from this MLflow example in the Azure Machine Learning example repo.
Copy this code into the file:
# imports
import os
import mlflow
from random import random

# define functions
def main():
    # Log a parameter, a metric, and a file artifact to the active run
    mlflow.log_param("hello_param", "world")
    mlflow.log_metric("hello_metric", random())
    os.system("echo 'hello world' > helloworld.txt")
    mlflow.log_artifact("helloworld.txt")

# run functions
if __name__ == "__main__":
    # run main function
    main()
Note
Note that this sample doesn't contain calls to mlflow.start_run or mlflow.set_experiment. Azure Machine Learning handles both automatically.
Submitting the job
Use the Azure Machine Learning CLI (v2) to submit a remote run. When you use the CLI, the MLflow tracking URI and experiment name are set automatically, which directs the logging from MLflow to your workspace. Learn more about logging Azure Machine Learning experiments with MLflow.
Create a YAML file named job.yml with your job definition. Create this file outside the src directory. Copy this code into the file:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python hello_world.py
code: src
environment: azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest
compute: azureml:cpu-cluster
Open your terminal and use the following to submit the job.
az ml job create -f job.yml --web
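A job fails at run time if the script named in command is not inside the directory given in code. The sketch below is one possible local check to run before submitting. The helper missing_scripts is a hypothetical name introduced here, and the throwaway directory only mimics the src folder described above.

```python
import shlex
import tempfile
from pathlib import Path


def missing_scripts(command, code_dir):
    """Return the .py files named in `command` that are absent from `code_dir`."""
    scripts = [tok for tok in shlex.split(command) if tok.endswith(".py")]
    return [s for s in scripts if not (Path(code_dir) / s).is_file()]


# Demonstration against a temporary directory that stands in for src.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "hello_world.py").write_text("print('hello world')\n")

    ok = missing_scripts("python hello_world.py", src)
    bad = missing_scripts("python hello-mlflow.py", src)

print(ok, bad)
```

An empty list means every script referenced by the command was found. The check only looks at tokens ending in .py, so it does not cover modules started with python -m.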
View metrics and artifacts in your workspace
The metrics and artifacts from MLflow logging are tracked in your workspace. To view them at any time, go to your workspace in Azure Machine Learning studio and find the experiment by name. Alternatively, run the following code.
Retrieve run metrics using MLflow get_run().
from mlflow.tracking import MlflowClient

# Use MLflow to retrieve the job that was just completed
client = MlflowClient()
run_id = mlflow_run.info.run_id
finished_mlflow_run = client.get_run(run_id)

# Metrics, tags, and parameters are returned as dictionaries
metrics = finished_mlflow_run.data.metrics
tags = finished_mlflow_run.data.tags
params = finished_mlflow_run.data.params
print(metrics, tags, params)
To view the artifacts of a run, you can use MlflowClient.list_artifacts():
client.list_artifacts(run_id)
To download an artifact to the current directory, you can use MlflowClient.download_artifacts():
client.download_artifacts(run_id, "helloworld.txt", ".")
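list_artifacts() returns only the entries at one level, so files inside a logged folder (for example, a model folder) need a second call with the folder path. The sketch below walks the whole tree. The generator walk_artifacts and the StubClient are hypothetical names introduced here; the stub only imitates the list_artifacts(run_id, path) call and the path and is_dir attributes of the returned entries, so the snippet runs without a workspace.

```python
from collections import namedtuple


def walk_artifacts(client, run_id, path=None):
    """Yield the path of every file artifact in a run, descending into folders."""
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            yield from walk_artifacts(client, run_id, info.path)
        else:
            yield info.path


# Stub entries carrying the two attributes the walker relies on.
FileInfo = namedtuple("FileInfo", ["path", "is_dir"])


class StubClient:
    """Imitates list_artifacts for a made-up run with one nested folder."""

    tree = {
        None: [FileInfo("helloworld.txt", False), FileInfo("model", True)],
        "model": [
            FileInfo("model/MLmodel", False),
            FileInfo("model/conda.yaml", False),
        ],
    }

    def list_artifacts(self, run_id, path=None):
        return self.tree.get(path, [])


files = list(walk_artifacts(StubClient(), "hypothetical-run-id"))
print(files)
```

With a real run, you would pass the MlflowClient instance and run ID from the code above instead of the stub.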
For more details about how to retrieve information from experiments and runs in Azure Machine Learning using MLflow, see Manage experiments and runs with MLflow.
Manage models
Register and track your models with the Azure Machine Learning model registry, which supports the MLflow model registry. Azure Machine Learning models are aligned with the MLflow model schema making it easy to export and import these models across different workflows. The MLflow-related metadata, such as run ID, is also tracked with the registered model for traceability. Users can submit training jobs, register, and deploy models produced from MLflow runs.
If you want to deploy and register your production ready model in one step, see Deploy and register MLflow models.
To register and view a model from a job, use the following steps:
Once a job is complete, call the register_model() method.
# The model folder produced by the job is registered. This includes the MLmodel file, model.pkl, and conda.yaml.
model_path = "model"
model_uri = 'runs:/{}/{}'.format(run_id, model_path)
mlflow.register_model(model_uri, "registered_model_name")
View the registered model in your workspace with Azure Machine Learning studio.
In the following example, the registered model, my-model, has MLflow tracking metadata tagged.
Select the Artifacts tab to see all the model files that align with the MLflow model schema (conda.yaml, MLmodel, model.pkl).
Select MLmodel to see the MLmodel file generated by the job.
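The registration step above passes a model URI of the form runs:/<run_id>/<artifact_path>. The sketch below builds that URI and tidies stray slashes in the path. The helper run_model_uri and the run ID shown are hypothetical and used only for illustration.

```python
def run_model_uri(run_id, artifact_path="model"):
    """Build a runs:/ URI pointing at a model folder logged in a run."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip leading and trailing slashes so each segment has one separator.
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


model_uri = run_model_uri("hypothetical-run-id", "/model/")
print(model_uri)
```

The resulting string can be passed as the first argument of mlflow.register_model(), together with the name you want for the registered model.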
Example files
Using MLflow (Jupyter Notebooks)
Limitations
Some methods available in the MLflow API may not be available when connected to Azure Machine Learning. For details about supported and unsupported operations please read Support matrix for querying runs and experiments.