Downloading AzureML experiment metrics logged with MLflow

Question

matsuo_basho 30

I'm using this tutorial to query the results of a job where I've logged metrics with MLFlow.

When I try the following, I get a None object returned:

mlflow.get_experiment_by_name(<exp_name>)

mlflow.search_experiments()

When I try this, I get a run not found error:

mlflow.get_run(<run_name>)

So mlflow is clearly not seeing my jobs. But they are there in the studio. I have an active AzureML connection using the AzureML VSCode plug-in. Help me understand what the issue is and how I can download the metrics I've logged with MLFlow.

dupammi 8,615 Reputation points Microsoft External Staff

2023-12-04T08:53:07.76+00:00

Hi @matsuo_basho ,

Thank you for using Microsoft Q&A.

To view jobs/runs information in the Azure Machine Learning studio, you can navigate to the Jobs tab and select the job of interest to enter the details view. Then, select the Output+Logs tab to view the logged metrics and click on "Download All" to download.

For more information kindly follow the steps mentioned in this document

Regarding your query about the Python code throwing an error, kindly check that you have the necessary permissions and access rights to view the metrics.

I suggest you go through this Python code; it might work.

I hope this helps!
matsuo_basho 30 Reputation points

2023-12-05T00:26:43.6233333+00:00

I want to be able to download the metrics programmatically. The Python code you linked is the one I provided in my original post. I run the commands listed there and don't get an error, but also no output. Is there something I need to set up before running mlflow.search_experiments in order to connect to my AZ workspace?

I'm usually working through the AzureML CLI and I'm connected to my Azure environment there.
dupammi 8,615 Reputation points Microsoft External Staff

2023-12-05T06:27:17.06+00:00

Hi @matsuo_basho ,

Thank you for your response.

Based on your query, it seems like you are trying to download the metrics programmatically using the MLflow Python / CLI. To connect to your Azure Machine Learning workspace, you need to set the MLflow tracking URI to the workspace's tracking URI. You can find the tracking URI in the Azure Machine Learning studio under the Overview tab of your workspace.

For more information on using MLflow with the Azure CLI, refer to this

I hope this helps!
matsuo_basho 30 Reputation points

2023-12-05T23:56:05.1066667+00:00

@dupammi thanks for that guidance, very helpful!

So I'm able to obtain the stats for a particular run_id. I'm looking for a way to obtain what I see below in the AzureML Studio:

In contrast, the output I downloaded programmatically has just 1 result per metric:

Is there any way to obtain the metrics for each epoch using Python syntax?
Yochai Lehman 0 Reputation points Microsoft Employee

2024-07-17T16:37:47.1033333+00:00

@matsuo_basho
According to the documentation, you need to use MlflowClient.get_metric_history() to get the full metric history:

https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?view=azureml-api-2&tabs=interactive#view-information-about-jobs-or-runs-with-mlflow

Answer 1

dupammi 8,615 Microsoft External Staff

Hi @matsuo_basho ,

I'm glad that the guidance was helpful.

Regarding your next query, you can obtain the metrics for each epoch using the MLflow Python API. To do this, log the metrics for each epoch with the mlflow.log_metric() function in your training script: use a for loop to iterate over the values you are interested in and call log_metric inside the loop.

Here is a quick sample you can refer to; please adjust it to your scenario:

import mlflow

num_epochs = 10  # placeholder; set to your own number of epochs

# train_one_epoch() and validate() stand in for your own training and
# evaluation functions; each is assumed to return (loss, accuracy).

# Start an MLflow run
with mlflow.start_run():
    for epoch in range(num_epochs):
        # Train and validate the model for one epoch
        train_loss, train_acc = train_one_epoch(...)
        val_loss, val_acc = validate(...)

        # Log this epoch's metrics; step=epoch keeps one point per epoch
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("train_acc", train_acc, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)
        mlflow.log_metric("val_acc", val_acc, step=epoch)

Once you've logged the metrics for each epoch, you can retrieve them using the methods discussed in my previous responses.

Hope this helps.

If this answers your query, do click Accept Answer and Yes for was this answer helpful.

matsuo_basho 30

@dupammi I should have been more precise in my question. To get the output that I included in the studio, I am running a Transformers model using this type of syntax:

with mlflow.start_run():
    mlflow.transformers.autolog(log_models=False)
    trainer = Trainer(model,
                      training_args,
                      train_dataset=tokenized_dataset['train'],
                      eval_dataset=tokenized_dataset['test'],
                      data_collator=data_collator,
                      compute_metrics=compute_metrics_partial,
                      tokenizer=tokenizer)
    trainer.train()

So the logging happens automatically after every epoch. It appears that there is currently no way to also get the metrics output programmatically, even though it is logged in the studio?

dupammi 8,615 Reputation points Microsoft External Staff

2023-12-07T03:40:54.46+00:00

Hi @matsuo_basho ,

Based on your code, it looks like you are using the mlflow.transformers.autolog() method to automatically log metrics during model training. However, you are correct that autolog itself only records metrics; it does not provide a function for reading them back from the MLflow run.

To programmatically retrieve the metrics, you can still use the mlflow.autolog() method to automatically log metrics, parameters, and artifacts for your code. This is not specific to the transformers API but can be used with any ML library.

For more information, please refer to the official Azure documentation on Logging custom models and Logging models with a custom signature, environment or samples.

Please modify the sample code as per your needs.

I hope you understand. Thank you!
dupammi 8,615 Reputation points Microsoft External Staff

2023-12-07T16:06:57.5666667+00:00

Hi @matsuo_basho ,

Following up to see if the above response was helpful.
dupammi 8,615 Reputation points Microsoft External Staff

2023-12-07T23:44:40.8966667+00:00

Hi @matsuo_basho ,

Following up to see if the above answer was helpful. If this answers your query, do click Accept Answer and Yes for was this answer helpful.

matsuo_basho 30

@dupammi thanks for these links.

So it seems that there's a way to adapt the Transformers training function to my use-case of replicating the auto logging of metrics for each epoch, from the link you provided:

from mlflow.pyfunc import PythonModel, PythonModelContext

class ModelWrapper(PythonModel):
    def __init__(self, model):
        self._model = model

    def predict(self, context: PythonModelContext, data):
        # You don't have to keep the semantic meaning of `predict`. You can use here model.recommend(), model.forecast(), etc
        return self._model.predict_proba(data)

    # You can even add extra functions if you need to. Since the model is serialized,
    # all of them will be available when you load your model back.
    def predict_batch(self, data):
        pass

import mlflow
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from mlflow.models import infer_signature

mlflow.xgboost.autolog(log_models=False)

model = XGBClassifier(use_label_encoder=False, eval_metric="logloss")
model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
y_probs = model.predict_proba(X_test)

accuracy = accuracy_score(y_test, y_probs.argmax(axis=1))
mlflow.log_metric("accuracy", accuracy)

signature = infer_signature(X_test, y_probs)
mlflow.pyfunc.log_model("classifier", 
                        python_model=ModelWrapper(model),
                        signature=signature)

However, I'm not quite sure how. The thing is that the transformers setup currently logs with each epoch like in the code I included a couple of posts up.
