Set up MLOps with GitHub

02/07/2025

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

Azure Machine Learning allows you to integrate with GitHub Actions to automate the machine learning lifecycle. Some of the operations you can automate are:

Deployment of Azure Machine Learning infrastructure
Data preparation (extract, transform, load operations)
Training machine learning models with on-demand scale-out and scale-up
Deployment of machine learning models as public or private web services
Monitoring deployed machine learning models (such as for performance analysis)

In this article, you learn about using Azure Machine Learning to set up an end-to-end MLOps pipeline that runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving a different function, which can be registered with the workspace, versioned, and reused with various inputs and outputs. You use the recommended Azure architecture for MLOps and the Azure MLOps (v2) solution accelerator to quickly set up an MLOps project in Azure Machine Learning.

Tip

We recommend that you understand the recommended Azure architectures for MLOps before you implement a solution, so that you can pick the best architecture for your machine learning project.

Prerequisites

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Machine Learning.
A Machine Learning workspace.
Git running on your local machine.
GitHub as the source control repository.

Note

Git version 2.27 or newer is required. For more information on installing Git, see https://git-scm.com/downloads and select your operating system.

Important

The CLI commands in this article were tested using Bash. If you use a different shell, you might encounter errors.

Set up authentication with Azure and GitHub Actions

Before you can set up an MLOps project with Machine Learning, you need to set up authentication for GitHub Actions.

Create service principal

Create one Prod service principal for this demo. You can add more depending on how many environments you want to work in (Dev, Prod, or both). You can create service principals using one of the following methods:

Create from Azure Cloud Shell
Create from Azure portal

Launch the Azure Cloud Shell.

Tip

The first time you launch the Cloud Shell, you're prompted to create a storage account for the Cloud Shell.
If prompted, choose Bash as the environment used in the Cloud Shell. You can also change environments in the drop-down on the top navigation bar.

Copy the following bash commands to your computer and update the projectName, subscriptionId, and environment variables with the values for your project. The commands create the service principal and grant it the Contributor role on the subscription provided, which GitHub Actions needs to use resources in that subscription.

projectName="<your project name>"
roleName="Contributor"
subscriptionId="<subscription Id>"
environment="<Prod>" # First letter should be capitalized
servicePrincipalName="Azure-ARM-${environment}-${projectName}"
# Show the subscription ID and service principal name that will be used
echo "Using subscription ID $subscriptionId"
echo "Creating SP for RBAC with name $servicePrincipalName, with role $roleName and in scopes /subscriptions/$subscriptionId"
# Create the service principal and print its credentials as JSON
az ad sp create-for-rbac --name "$servicePrincipalName" --role "$roleName" --scopes "/subscriptions/$subscriptionId" --json-auth
echo "Please ensure that the information created here is properly saved for future use."

Note

The parameter --json-auth of the az ad sp create-for-rbac command is available in Azure CLI versions >= 2.51.0. Versions earlier than this use --sdk-auth.
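
The version rule in this note can be checked before you run the script. The following Python sketch is illustrative only: it assumes you pass in the version string that your Azure CLI reports, and the helper name auth_flag_for is hypothetical, not part of any Azure tooling.

```python
def parse_version(version):
    """Turn a dotted version string such as '2.51.0' into a comparable 3-part tuple."""
    parts = tuple(int(part) for part in version.strip().split("."))
    # Pad short versions such as '2.51' so that they compare as (2, 51, 0).
    return parts + (0,) * (3 - len(parts))


def auth_flag_for(cli_version):
    """Return the create-for-rbac output flag that matches the Azure CLI version."""
    # --json-auth is available in Azure CLI 2.51.0 and later; earlier versions use --sdk-auth.
    if parse_version(cli_version) >= (2, 51, 0):
        return "--json-auth"
    return "--sdk-auth"


if __name__ == "__main__":
    for version in ("2.50.0", "2.51.0", "2.61.0"):
        print(version, auth_flag_for(version))
```

If the function returns --sdk-auth, replace --json-auth in the az ad sp create-for-rbac command with that flag.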

Copy your edited commands into the Azure Cloud Shell and run them (use Ctrl + Shift + V to paste).

After you run these commands, information about the service principal is displayed. Save this information to a safe location; you use it later in these steps.


{
  "clientId": "<service principal client id>",
  "clientSecret": "<service principal client secret>",
  "subscriptionId": "<Azure subscription id>",
  "tenantId": "<Azure tenant id>",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/",
  "activeDirectoryGraphResourceId": "https://graph.windows.net/",
  "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
  "galleryEndpointUrl": "https://gallery.azure.com/",
  "managementEndpointUrl": "https://management.core.windows.net/"
}

Copy all of this output, braces included. You paste it into a GitHub secret when you set up the GitHub repo.
Close the Cloud Shell once the service principals are created.

Set up GitHub repo

Fork the MLOps v2 Demo Template Repo in your GitHub organization
Go to https://github.com/Azure/mlops-v2-gha-demo/fork to fork the MLOps v2 demo repo into your GitHub org. This repo has reusable MLOps code that can be used across multiple projects.
From your GitHub project, select Settings.
Then select Secrets, and then Actions.
Select New repository secret. Name this secret AZURE_CREDENTIALS and paste the service principal output as the content of the secret. Select Add secret.
Add each of the following GitHub secrets using the corresponding values from the service principal output as the content of the secret:
- ARM_CLIENT_ID
- ARM_CLIENT_SECRET
- ARM_SUBSCRIPTION_ID
- ARM_TENANT_ID
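
As a rough aid for filling in these secrets, the following Python sketch splits the saved service principal output into the five secret values. The key-to-secret mapping (for example, clientId to ARM_CLIENT_ID) is an assumption inferred from the names, and secrets_from_sp_output is a hypothetical helper, not part of the solution accelerator.

```python
import json

# ASSUMED mapping from GitHub secret names to keys in the service principal output.
SECRET_KEYS = {
    "ARM_CLIENT_ID": "clientId",
    "ARM_CLIENT_SECRET": "clientSecret",
    "ARM_SUBSCRIPTION_ID": "subscriptionId",
    "ARM_TENANT_ID": "tenantId",
}


def secrets_from_sp_output(sp_json):
    """Return a dict of GitHub secret names to values, built from the saved JSON output."""
    sp = json.loads(sp_json)
    missing = [key for key in SECRET_KEYS.values() if not sp.get(key)]
    if missing:
        raise ValueError("service principal output is missing: " + ", ".join(missing))
    secrets = {name: sp[key] for name, key in SECRET_KEYS.items()}
    # AZURE_CREDENTIALS holds the whole JSON document, braces included.
    secrets["AZURE_CREDENTIALS"] = sp_json.strip()
    return secrets
```

The function only reshapes the values; you still enter them in the repository settings yourself. Avoid printing or logging the returned dictionary, because it contains the client secret.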

Note

This completes the prerequisites. You can now deploy the solution accelerator.

Deploy machine learning project infrastructure with GitHub Actions

This step deploys the training pipeline to the Machine Learning workspace created in the previous steps.

Tip

Make sure you understand the Architectural Patterns of the solution accelerator before you check out the MLOps v2 repo and deploy the infrastructure. The examples in this article use the classical ML project type.

Configure Machine Learning environment parameters

Go to your repository and select the config-infra-prod.yml file in the root. Change the following parameters to your liking, and then commit the changes.

This config file uses the namespace and postfix values in the names of the artifacts to ensure uniqueness. The following text shows the default values and settings in the file:

   namespace: mlopslite #Note: A namespace with many characters will cause storage account creation to fail due to storage account names having a limit of 24 characters.
   postfix: ao04
   location: westus
    
   environment: prod
   enable_aml_computecluster: true
   enable_aml_secure_workspace: true
   enable_monitoring: false
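
The 24-character limit mentioned in the namespace comment can be checked before you commit the config. The following Python sketch assumes a naming pattern of a short prefix followed by the namespace, postfix, and environment values. That pattern and the helper names are hypothetical, so check the Terraform files in your fork for the pattern that is actually used. The length and character rules are the general Azure rules for storage account names.

```python
import re


def storage_account_name(namespace, postfix, environment, prefix="st"):
    """Compose a candidate name using an ASSUMED pattern: prefix + namespace + postfix + environment."""
    return f"{prefix}{namespace}{postfix}{environment}"


def name_problems(name):
    """Return a list of problems with the name; an empty list means the name is valid."""
    problems = []
    # Storage account names must be between 3 and 24 characters long.
    if not 3 <= len(name) <= 24:
        problems.append(f"length {len(name)} is outside the 3-24 character range")
    # Only lowercase letters and digits are allowed.
    if not re.fullmatch(r"[a-z0-9]*", name):
        problems.append("only lowercase letters and digits are allowed")
    return problems


if __name__ == "__main__":
    candidate = storage_account_name("mlopslite", "ao04", "prod")
    print(candidate, name_problems(candidate))
```

A non-empty result for your own values suggests shortening the namespace before you run the infrastructure workflow.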

Note

If you're running a Deep Learning workload such as CV or NLP, ensure your GPU compute is available in your deployment zone. The enable_monitoring flag in these files defaults to False. Enabling this flag adds more elements to the deployment to support Azure Machine Learning monitoring based on https://github.com/microsoft/AzureML-Observability. This flag enables an ADX cluster and increases the deployment time and cost of the MLOps solution.

Deploy Machine Learning infrastructure

In your GitHub project repository (example: taxi-fare-regression), select Actions.

The predefined GitHub workflows associated with your project are displayed. For a classical machine learning project, the available workflows include tf-gha-deploy-infra.yml, deploy-model-training-pipeline, deploy-online-endpoint-pipeline, and deploy-batch-endpoint-pipeline.
Select tf-gha-deploy-infra.yml to deploy the Machine Learning infrastructure using GitHub Actions and Terraform.
On the right side of the page, select Run workflow and select the branch to run the workflow on. The workflow deploys Dev infrastructure if you run it from a dev branch, or Prod infrastructure if you run it from main. Monitor the workflow for successful completion.
When the workflow has completed successfully, you can find your Azure Machine Learning workspace and associated resources by signing in to the Azure portal. Next, you deploy the model training and scoring pipelines into the new Machine Learning environment.
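
If you prefer to start the workflow from a script instead of the Actions tab, the GitHub REST API exposes a workflow dispatch endpoint. The following Python sketch only builds the request. The owner, repository, and token values are placeholders that you must supply, and build_dispatch_request is a hypothetical helper.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, ref, token):
    """Build (but do not send) a workflow dispatch request for the GitHub REST API."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # The branch to run on is passed as "ref", matching the branch picker in the UI.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Example (placeholders, not real values):
# request = build_dispatch_request(
#     "<your-org>", "<your-repo>", "tf-gha-deploy-infra.yml", "main", "<token>"
# )
# urllib.request.urlopen(request)  # sends the request
```

The token needs permission to run workflows in the repository. After the call succeeds, monitor the run in the Actions tab as described in the steps above.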

Sample Training and Deployment Scenario

The solution accelerator includes code and data for a sample end-to-end machine learning pipeline which runs a linear regression to predict taxi fares in NYC. The pipeline is made up of components, each serving different functions, which can be registered with the workspace, versioned, and reused with various inputs and outputs. Sample pipelines and workflows for the Computer Vision and NLP scenarios have different steps and deployment steps.

This training pipeline contains the following steps:

Prepare Data

This component takes multiple taxi datasets (yellow and green), merges and filters the data, and prepares the train/val and evaluation datasets.
Input: Local data under ./data/ (multiple .csv files)
Output: Single prepared dataset (.csv) and train/val/test datasets.
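
The split into train, validation, and test sets can be sketched as follows. This is a minimal illustration, not the accelerator's data preparation code: the 70/15/15 fractions, the fixed seed, and the function name split_rows are all assumptions.

```python
import random


def split_rows(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows reproducibly and split them into train, validation, and test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Stand-ins for rows merged from the yellow and green taxi datasets.
    yellow = [{"taxi": "yellow", "id": i} for i in range(60)]
    green = [{"taxi": "green", "id": i} for i in range(40)]
    train, val, test = split_rows(yellow + green)
    print(len(train), len(val), len(test))
```

Every input row lands in exactly one of the three lists, so the evaluation data stays separate from the data used for training.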

Train Model

This component trains a Linear Regressor with the training set.
Input: Training dataset
Output: Trained model (pickle format)
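
To make the input and output of this step concrete, the following Python sketch fits a single-feature linear regression by ordinary least squares and serializes it with pickle. It is a simplified stand-in written with the standard library only. The accelerator's actual training code is not reproduced here, and the feature (trip distance), the sample values, and the function names are assumptions.

```python
import pickle


def fit_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares for one feature."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Predict a fare for one input value using the fitted coefficients."""
    return model["intercept"] + model["slope"] * x


if __name__ == "__main__":
    distances = [1.0, 2.0, 3.0, 4.0]   # made-up trip distances
    fares = [5.5, 8.0, 10.5, 13.0]     # made-up fares
    model = fit_linear_regression(distances, fares)
    blob = pickle.dumps(model)          # the "trained model (pickle format)" output
    print(predict(pickle.loads(blob), 10.0))
```

Storing the coefficients in a plain dictionary keeps the pickled model loadable without any custom class definitions.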

Evaluate Model

This component uses the trained model to predict taxi fares on the test set.
Input: ML model and Test dataset
Output: Performance of the model and a deploy flag that indicates whether to deploy the model.
This component compares the performance of the model with all previously deployed models on the new test dataset and decides whether to promote the model into production. A model is promoted into production by registering it in the Azure Machine Learning workspace.
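
The comparison that produces the deploy flag can be sketched as follows. The choice of mean squared error as the metric (lower is better) and the function names are assumptions for illustration. The accelerator may use different metrics or thresholds.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def deploy_flag(candidate_mse, deployed_mses):
    """Return True when the candidate beats every previously deployed model.

    Both the candidate score and the earlier scores are expected to be
    measured on the same new test dataset. With no earlier models, the
    candidate is deployed.
    """
    return all(candidate_mse < previous for previous in deployed_mses)


if __name__ == "__main__":
    actual = [10.0, 12.0, 8.0]
    candidate = mean_squared_error(actual, [10.5, 11.5, 8.0])
    print(deploy_flag(candidate, deployed_mses=[0.4, 0.9]))
```

Using a strict comparison means a new model that only ties an existing one is not promoted.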

Register Model

This component registers the trained model in the workspace when the deploy flag indicates that the model should be deployed.
Input: Trained model and the deploy flag.
Output: Registered model in Machine Learning.

Deploying the Model Training Pipeline

Next, you deploy the model training pipeline to your new Machine Learning workspace. This pipeline creates a compute cluster instance, registers a training environment that defines the necessary Docker image and Python packages, registers a training dataset, and then starts the training pipeline described in the last section. When the job is complete, the trained model is registered in the Azure Machine Learning workspace and is available for deployment.

In your GitHub project repository (example: taxi-fare-regression), select Actions.
Select the deploy-model-training-pipeline from the workflows listed, and then select Run workflow to execute the model training workflow. This process takes several minutes to run, depending on the compute size.
Once completed, a successful run registers the model in the Machine Learning workspace.

Note

If you want to check the output of each individual step, for example to view output of a failed run, select a job output, and then select each step in the job to view any output of that step.

With the trained model registered in the Machine Learning workspace, you're ready to deploy the model for scoring.

Deploying the Trained Model

This scenario includes prebuilt workflows for two approaches to deploying a trained model: batch scoring, or deploying the model to an endpoint for real-time scoring. You can run either or both of these workflows to test the performance of the model in your Azure Machine Learning workspace.

Online Endpoint

In your GitHub project repository (example: taxi-fare-regression), select Actions.
Select the deploy-online-endpoint-pipeline from the workflows listed on the left, and then select Run workflow to execute the online endpoint deployment pipeline workflow. The steps in this pipeline create an online endpoint in your Machine Learning workspace, create a deployment of your model to this endpoint, and then allocate traffic to the deployment.

Once completed, the online endpoint is deployed in the Azure Machine Learning workspace and available for testing.
To test this deployment, go to the Endpoints tab in your Machine Learning workspace, select the endpoint, and then select the Test tab. You can use the sample input data located in the cloned repo at /data/taxi-request.json to test the endpoint.
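
You can also test the endpoint from a script instead of the Test tab. The following Python sketch only builds the scoring request. The scoring URI and key are placeholders that you copy from the endpoint's details in the workspace, it assumes key-based authentication, and build_scoring_request is a hypothetical helper. The payload should be the contents of the sample request file.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, api_key, payload):
    """Build (but do not send) a JSON scoring request for an online endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumes key-based authentication on the endpoint.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Example (placeholders, not real values):
# with open("data/taxi-request.json") as f:
#     payload = json.load(f)
# request = build_scoring_request("<scoring URI>", "<endpoint key>", payload)
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from sending it makes it easy to inspect the headers and body before any call reaches the endpoint.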

Batch Endpoint

In your GitHub project repository (example: taxi-fare-regression), select Actions.
Select the deploy-batch-endpoint-pipeline from the workflows, and then select Run workflow to execute the batch endpoint deployment pipeline workflow. The steps in this pipeline create a new AmlCompute cluster on which to execute batch scoring, create the batch endpoint in your Machine Learning workspace, and then create a deployment of your model to this endpoint.
Once completed, the batch endpoint is deployed in the Azure Machine Learning workspace and available for testing.

Moving to production

Example scenarios can be trained and deployed for both Dev and Prod branches and environments. When you're satisfied with the performance of the model training pipeline, model, and deployment in testing, you can replicate the Dev pipelines and models and deploy them in the Production environment.

The sample training and deployment Machine Learning pipelines and GitHub workflows can be used as a starting point to adapt your own modeling code and data.

Clean up resources

If you're not going to continue to use your pipeline, delete your forked GitHub repository.
In Azure portal, delete your resource group and Machine Learning instance.

Next steps

Install and set up Python SDK v2
Install and set up Python CLI v2
Azure MLOps (v2) solution accelerator on GitHub
Training course on MLOps with Machine Learning
Learn more about Azure Pipelines with Machine Learning
Learn more about GitHub Actions with Machine Learning
Deploy MLOps on Azure in Less Than an Hour - Community MLOps V2 Accelerator video
