Every month or so I have to retrain a PyTorch model on data obtained by processing tables that sit in Azure Data Lake Storage Gen1.
So far, I have the following building blocks:
1. A Databricks notebook that does the ETL job of transforming the ADLS Gen1 tables into train/validation files, which are written to Blob Storage
2. Python scripts that I execute locally to submit an experiment to an AzureML workspace and train the PyTorch model using a ScriptRunConfig plus a training script, as in https://learn.microsoft.com/en-us/azure/machine-learning/how-to-train-pytorch, mounting blob storage to get the training data (a rough sketch of that submission script follows this list)
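For reference, the local submission script looks roughly like this; the environment name, compute target name, and datastore path are simplified placeholders:

```python
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig, Dataset

# Workspace config downloaded from the AzureML portal (config.json)
ws = Workspace.from_config()

# Curated PyTorch environment; the exact name/version here is a placeholder
env = Environment.get(ws, name="AzureML-PyTorch-1.9-CUDA11.1-GPU")

# Training data written to blob storage by the Databricks ETL notebook
datastore = ws.get_default_datastore()
dataset = Dataset.File.from_files(path=(datastore, "train_valid/**"))

src = ScriptRunConfig(
    source_directory="./src",
    script="train.py",                       # my training script
    arguments=["--data", dataset.as_mount()],
    compute_target="gpu-cluster",            # placeholder compute target name
    environment=env,
)

run = Experiment(ws, "pytorch-retrain").submit(src)
run.wait_for_completion(show_output=True)
```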
How can I schedule steps 1 and 2 to run in sequence as a pipeline? Azure Data Factory seems a possible way to go, but what should I use as activities in ADF?
I see a few alternatives:
1. Step 1 surely stays a Databricks notebook (an ADF Databricks Notebook activity).
2a. For step 2, a Databricks Python activity calling the azureml-sdk classes (?)
Alternatives to 2a could be:
2b. an Azure Batch custom activity calling the azureml-sdk classes, which seems overkill to me
2c. a Machine Learning Execute Pipeline activity that runs an AzureML pipeline (https://learn.microsoft.com/en-us/azure/machine-learning/concept-ml-pipelines), though I'm not sure how to set this up; see the sketch after this list
2d. a Databricks Python activity that trains the PyTorch model directly on Databricks with MLflow tracking (https://learn.microsoft.com/en-us/azure/databricks/applications/mlflow/tracking-ex-pytorch) instead of calling the azureml-sdk classes
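To make 2c concrete, my understanding is that I would first publish an AzureML pipeline wrapping the training script and then point ADF's Machine Learning Execute Pipeline activity at the published pipeline ID. A rough sketch of the publishing side (pipeline, step, environment, and compute names are placeholders):

```python
from azureml.core import Workspace, Environment
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

run_config = RunConfiguration()
run_config.environment = Environment.get(ws, name="AzureML-PyTorch-1.9-CUDA11.1-GPU")

# Wrap the existing training script in a pipeline step
train_step = PythonScriptStep(
    name="train-pytorch-model",
    source_directory="./src",
    script_name="train.py",
    compute_target="gpu-cluster",   # placeholder compute target name
    runconfig=run_config,
    allow_reuse=False,
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
published = pipeline.publish(
    name="pytorch-retrain-pipeline",
    description="Monthly retraining of the PyTorch model",
)

# This ID is what the ADF Machine Learning Execute Pipeline activity would reference
print(published.id)
```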
Can someone point me to the current best practice for this?