Use GitHub Actions with Azure Machine Learning

APPLIES TO: Azure CLI ml extension v2 (current), Python SDK azure-ai-ml v2 (current)

Get started with GitHub Actions to train a model on Azure Machine Learning.

This article shows you how to create a GitHub Actions workflow that trains a machine learning model in Azure Machine Learning. The workflow trains a scikit-learn linear regression model on the NYC Taxi dataset.

GitHub Actions workflows are defined in YAML (.yml) files in the .github/workflows/ directory of your repository. Each workflow file defines the triggers, jobs, and steps that make up the workflow.

Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

  • An Azure Machine Learning workspace. If you don't have one, use the steps in the Quickstart: Create workspace resources article to create one.

  • The Azure Machine Learning Python SDK v2. To install it, use the following command:

    pip install azure-ai-ml azure-identity
    

    To update an existing installation of the SDK to the latest version, use the following command:

    pip install --upgrade azure-ai-ml azure-identity
    

    For more information, see Install the Python SDK v2 for Azure Machine Learning.

  • A GitHub account. If you don't have one, sign up for free.

Step 1: Get the code

Fork the following repository on GitHub:

https://github.com/azure/azureml-examples

Clone your fork locally, replacing YOUR-USERNAME with your GitHub user name:

git clone https://github.com/YOUR-USERNAME/azureml-examples

Step 2: Authenticate with Azure

First, define how the workflow authenticates with Azure. The recommended, more secure option is to sign in with OpenID Connect (OIDC) using a Microsoft Entra application or a user-assigned managed identity. If necessary, you can also sign in with a service principal and secret, but this approach is less secure and isn't recommended.

Generate deployment credentials

To use the Azure Login action with OIDC, you need to configure a federated identity credential on a Microsoft Entra application or a user-assigned managed identity.

Option 1: Microsoft Entra application
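
The portal walkthrough for this option isn't included here. The following sketch is a rough Azure CLI equivalent. It creates a Microsoft Entra application and its service principal, grants the service principal the Contributor role on your workspace's resource group, and adds a federated credential that trusts workflow runs on the main branch of your fork. It assumes the Azure CLI is installed and signed in with permission to create applications and role assignments. The function names, the credential name, and the choice of Contributor at resource-group scope are assumptions, so adjust them to your own security requirements.

```shell
#!/bin/sh
# Hypothetical helper: write a federated-credential definition that trusts
# GitHub Actions OIDC tokens issued for one branch of OWNER/REPO.
write_federated_credential() {
  local repo="$1" branch="${2:-main}" out="${3:-credential.json}" name
  name="github-$(printf '%s' "$repo" | tr '/' '-')-$branch"
  cat > "$out" <<EOF
{
  "name": "$name",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:$repo:ref:refs/heads/$branch",
  "audiences": ["api://AzureADTokenExchange"]
}
EOF
}

# Hypothetical helper: create the app, its service principal, a Contributor
# role assignment on the resource group, and the federated credential.
# Prints the application (client) ID, which is the value for AZURE_CLIENT_ID.
create_oidc_app() {
  local display_name="$1" repo="$2" group="$3" subscription_id="$4"
  local app_id app_object_id sp_object_id
  app_id=$(az ad app create --display-name "$display_name" --query appId -o tsv) || return 1
  app_object_id=$(az ad app show --id "$app_id" --query id -o tsv) || return 1
  sp_object_id=$(az ad sp create --id "$app_id" --query id -o tsv) || return 1
  az role assignment create --role Contributor \
    --assignee-object-id "$sp_object_id" --assignee-principal-type ServicePrincipal \
    --scope "/subscriptions/$subscription_id/resourceGroups/$group" > /dev/null || return 1
  write_federated_credential "$repo" main credential.json || return 1
  az ad app federated-credential create --id "$app_object_id" \
    --parameters credential.json > /dev/null || return 1
  echo "$app_id"
}
```

Here is an example call: create_oidc_app github-azureml-examples YOUR-USERNAME/azureml-examples YOUR-RESOURCE-GROUP "$(az account show --query id -o tsv)". It prints the client ID to store as AZURE_CLIENT_ID. Runs triggered by pull_request present the subject repo:OWNER/REPO:pull_request instead of a branch ref, so they need a second federated credential with that subject.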

Option 2: User-assigned managed identity
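
As with option 1, this is a rough Azure CLI sketch and not the portal steps. It creates a user-assigned managed identity, grants it Contributor on the resource group, and adds a federated credential for one branch of your fork. It assumes the Azure CLI is signed in and that the resource group already exists. The function names, the credential name, and the Contributor role are assumptions. Because this workflow also runs on pull_request, keep in mind that those runs present the subject repo:OWNER/REPO:pull_request. The oidc_subject helper builds either form.

```shell
#!/bin/sh
# Hypothetical helper: build the OIDC subject claim that GitHub Actions presents.
# Branch runs use repo:OWNER/REPO:ref:refs/heads/BRANCH;
# pull_request runs use repo:OWNER/REPO:pull_request.
oidc_subject() {
  if [ "$2" = "pull_request" ]; then
    printf 'repo:%s:pull_request\n' "$1"
  else
    printf 'repo:%s:ref:refs/heads/%s\n' "$1" "${2:-main}"
  fi
}

# Hypothetical helper: create the identity, grant Contributor on the resource
# group, trust GitHub OIDC tokens for REPO/BRANCH, and print the identity's
# client ID (the value for AZURE_CLIENT_ID).
create_oidc_identity() {
  local identity="$1" group="$2" repo="$3" branch="${4:-main}"
  local principal_id scope
  principal_id=$(az identity create --name "$identity" --resource-group "$group" \
    --query principalId -o tsv) || return 1
  scope=$(az group show --name "$group" --query id -o tsv) || return 1
  az role assignment create --role Contributor --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal --scope "$scope" > /dev/null || return 1
  az identity federated-credential create --name "github-$branch" \
    --identity-name "$identity" --resource-group "$group" \
    --issuer "https://token.actions.githubusercontent.com" \
    --subject "$(oidc_subject "$repo" "$branch")" \
    --audiences "api://AzureADTokenExchange" > /dev/null || return 1
  az identity show --name "$identity" --resource-group "$group" --query clientId -o tsv
}
```

For example, create_oidc_identity github-oidc YOUR-RESOURCE-GROUP YOUR-USERNAME/azureml-examples main prints the identity's client ID. If pull-request runs also need to sign in, run the az identity federated-credential create command again with a different --name and with --subject "$(oidc_subject YOUR-USERNAME/azureml-examples pull_request)".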

Create secrets

You need to provide the Client ID, Directory (tenant) ID, and Subscription ID of your application or managed identity to the login action. You can provide these values directly in the workflow, or you can store them as GitHub secrets and reference them from the workflow. Saving the values as GitHub secrets is the more secure option.

  1. In GitHub, go to your repository.

  2. Select Settings > Security > Secrets and variables > Actions.


  3. Select New repository secret.

    Note

    To enhance workflow security in public repositories, use environment secrets instead of repository secrets. If the environment requires approval, a job cannot access environment secrets until one of the required reviewers approves it.

  4. Create secrets named AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID. Copy their values from your Microsoft Entra application or user-assigned managed identity:

    GitHub secret            Value from your Microsoft Entra application or user-assigned managed identity
    AZURE_CLIENT_ID          Client ID
    AZURE_SUBSCRIPTION_ID    Subscription ID
    AZURE_TENANT_ID          Directory (tenant) ID

    Note

    For security reasons, we recommend using GitHub secrets rather than passing values directly to the workflow.
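
You can also create the same three secrets with the GitHub CLI instead of the web UI. This sketch assumes gh is installed and authenticated (gh auth login), and the function name is hypothetical. Any arguments after the three values are passed through to gh secret set. For example, pass --repo YOUR-USERNAME/azureml-examples, or pass --env NAME to create environment secrets (the workflow job must then reference that environment).

```shell
#!/bin/sh
# Hypothetical helper: store the three values azure/login needs as GitHub
# Actions secrets.
# Usage: set_azure_login_secrets CLIENT_ID TENANT_ID SUBSCRIPTION_ID [gh flags...]
set_azure_login_secrets() {
  if [ "$#" -lt 3 ]; then
    echo "usage: set_azure_login_secrets CLIENT_ID TENANT_ID SUBSCRIPTION_ID [gh flags...]" >&2
    return 2
  fi
  local client_id="$1" tenant_id="$2" subscription_id="$3"
  shift 3
  # Each value goes to gh on standard input, so it isn't visible in the process list.
  printf '%s' "$client_id" | gh secret set AZURE_CLIENT_ID "$@" || return 1
  printf '%s' "$tenant_id" | gh secret set AZURE_TENANT_ID "$@" || return 1
  printf '%s' "$subscription_id" | gh secret set AZURE_SUBSCRIPTION_ID "$@" || return 1
}
```

For example, with the Azure CLI signed in to the right subscription and a hypothetical CLIENT_ID variable holding the client ID from the previous section: set_azure_login_secrets "$CLIENT_ID" "$(az account show --query tenantId -o tsv)" "$(az account show --query id -o tsv)" --repo YOUR-USERNAME/azureml-examples.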

Step 3: Update setup.sh to connect to your Azure Machine Learning workspace

You need to update the CLI setup file variables to match your workspace.

  1. In your forked repository, go to azureml-examples/cli/.

  2. Edit setup.sh and update these variables in the file.

    Variable     Description
    GROUP        Name of the resource group that contains your workspace
    LOCATION     Location (Azure region) of your workspace, for example eastus2
    WORKSPACE    Name of your Azure Machine Learning workspace
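
If you prefer to script the edit, the following sketch rewrites the three assignments in place. It assumes setup.sh assigns each variable at the start of its own line in the form NAME=value (for example GROUP="azureml-examples"). Check the file first, because a line that doesn't match that form is left unchanged. The function name and the example values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set GROUP, LOCATION, and WORKSPACE in setup.sh.
# Assumes each variable is assigned at the start of its own line (NAME=...).
update_setup_vars() {
  local file="$1" group="$2" location="$3" workspace="$4"
  # -i.bak works with both GNU and BSD sed; the backup is removed afterward.
  sed -i.bak \
    -e "s|^GROUP=.*|GROUP=\"$group\"|" \
    -e "s|^LOCATION=.*|LOCATION=\"$location\"|" \
    -e "s|^WORKSPACE=.*|WORKSPACE=\"$workspace\"|" \
    "$file" || return 1
  rm -f "$file.bak"
}
```

For example, run update_setup_vars cli/setup.sh YOUR-RESOURCE-GROUP eastus2 YOUR-WORKSPACE from the root of your clone. Then review the change with git diff cli/setup.sh. If you edit locally, commit and push the change so the workflow uses it.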

Step 4: Update pipeline.yml with your compute cluster name

You use a pipeline.yml file to define the Azure Machine Learning pipeline that the workflow submits. This is a machine learning pipeline, not a DevOps pipeline. You only need to make this update if your compute cluster has a name other than cpu-cluster.

  1. In your forked repository, open azureml-examples/cli/jobs/pipelines/nyc-taxi/pipeline.yml.
  2. Replace each occurrence of compute: azureml:cpu-cluster with your compute cluster name. For example, if your cluster is named my-cluster, the new value is azureml:my-cluster. There are five occurrences to update.
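
You can also script the same replacement with sed. This sketch assumes the cluster references appear literally as azureml:cpu-cluster and that your cluster name contains only letters, digits, and hyphens, since other characters could be interpreted by sed. The function name is hypothetical. The function prints how many lines reference the new cluster, so you can compare the result with the expected count of five.

```shell
#!/bin/sh
# Hypothetical helper: point every azureml:cpu-cluster compute reference in a
# pipeline YAML file at another cluster, then print how many lines use it.
set_pipeline_compute() {
  local file="$1" cluster="$2"
  sed -i.bak "s/azureml:cpu-cluster/azureml:$cluster/g" "$file" || return 1
  rm -f "$file.bak"
  # grep -c returns a nonzero status if no line references the new cluster.
  grep -c "azureml:$cluster" "$file"
}
```

For example, running set_pipeline_compute cli/jobs/pipelines/nyc-taxi/pipeline.yml my-cluster from the root of your clone should print 5 if each reference is on its own line. If you edit locally, commit and push the change.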

Step 5: Run your GitHub Actions workflow

Your workflow authenticates with Azure, sets up the Azure Machine Learning CLI, and uses the CLI to train a model in Azure Machine Learning.

Your workflow file is made up of a trigger section and jobs:

  • The on section defines the triggers that start the workflow. This workflow runs on a cron schedule (0 0/4 * * *, which runs every four hours and which GitHub evaluates in UTC), when you start it manually (workflow_dispatch), and when a pull request that targets the main or sdk-preview branch changes one of the listed paths. Learn more about events that trigger workflows.
  • In the jobs section, the workflow checks out your code and signs in to Azure with the Azure Login action using OpenID Connect.
  • The setup step then runs setup.sh, which installs and sets up the Machine Learning CLI (v2). Once the CLI is installed, the run job step calls run-job.sh with pipeline.yml to run your Azure Machine Learning pipeline and train a model on the NYC taxi data.

Enable your workflow

  1. In your forked repository, open .github/workflows/cli-jobs-pipelines-nyc-taxi-pipeline.yml and verify that your workflow looks like this.

    name: cli-jobs-pipelines-nyc-taxi-pipeline
    on:
      workflow_dispatch:
      schedule:
        - cron: "0 0/4 * * *"
      pull_request:
        branches:
          - main
          - sdk-preview
        paths:
          - cli/jobs/pipelines/nyc-taxi/**
          - .github/workflows/cli-jobs-pipelines-nyc-taxi-pipeline.yml
          - cli/run-pipeline-jobs.sh
          - cli/setup.sh
    permissions:
      id-token: write
      contents: read
    jobs:
      build:
        runs-on: ubuntu-latest
        steps:
        - name: check out repo
          uses: actions/checkout@v2
        - name: azure login
          uses: azure/login@v2
          with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
        - name: setup
          run: bash setup.sh
          working-directory: cli
          continue-on-error: true
        - name: run job
          run: bash -x ../../../run-job.sh pipeline.yml
          working-directory: cli/jobs/pipelines/nyc-taxi
    
  2. Select View runs.

  3. Enable workflows by selecting I understand my workflows, go ahead and enable them.

  4. Select the cli-jobs-pipelines-nyc-taxi-pipeline workflow, and then select Enable workflow.


  5. Select Run workflow and choose the option to Run workflow now.

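
If you prefer the GitHub CLI, the following sketch enables the workflow and starts a manual run, which the workflow's workflow_dispatch trigger allows. It assumes gh is installed and authenticated and that Actions are already enabled on your fork (step 3). The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: enable and start the NYC taxi workflow in your fork,
# then list its most recent run.
run_nyc_taxi_workflow() {
  local repo="$1" workflow="cli-jobs-pipelines-nyc-taxi-pipeline.yml"
  gh workflow enable "$workflow" --repo "$repo" || return 1
  # The workflow declares workflow_dispatch, so it can be started on demand.
  gh workflow run "$workflow" --repo "$repo" --ref main || return 1
  gh run list --workflow "$workflow" --repo "$repo" --limit 1
}
```

For example, run run_nyc_taxi_workflow YOUR-USERNAME/azureml-examples. A newly dispatched run can take a few seconds to appear in gh run list.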

Step 6: Verify your workflow run

  1. Open your completed workflow run and verify that the build job ran successfully. A green checkmark appears next to the job.

  2. Open Azure Machine Learning studio and navigate to nyc-taxi-pipeline-example. Verify that each step of the pipeline (prep, transform, train, predict, score) completed and shows a green checkmark.

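
You can also check the job from a terminal. This sketch polls the Azure Machine Learning CLI (v2) until a job reaches a final state. It assumes the ml extension is installed and that you either pass --resource-group and --workspace-name or have configured CLI defaults for them. The function name, the 30-second default interval (overridable through a hypothetical POLL_SECONDS variable), and the set of statuses treated as final are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: poll an Azure Machine Learning job until it finishes.
# Usage: wait_for_job JOB_NAME [--resource-group RG --workspace-name WS]
# Prints the final status and returns 0 only if the job Completed.
wait_for_job() {
  local job="$1" status
  shift
  while :; do
    status=$(az ml job show --name "$job" "$@" --query status -o tsv) || return 1
    case "$status" in
      Completed) echo "$status"; return 0 ;;
      Failed|Canceled|NotResponding) echo "$status"; return 1 ;;
      *) sleep "${POLL_SECONDS:-30}" ;;
    esac
  done
}
```

For example, run wait_for_job JOB_NAME --resource-group YOUR-RESOURCE-GROUP --workspace-name YOUR-WORKSPACE, where JOB_NAME is the pipeline job's name from studio or from the run job step's log. To see the status of each pipeline step, az ml job list --parent-job-name JOB_NAME with the same resource group and workspace flags should list the child jobs.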

Clean up resources

When your resource group and repository are no longer needed, clean up the resources you deployed by deleting the resource group and your GitHub repository.
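
You can also script the cleanup. This sketch assumes the Azure CLI and the GitHub CLI are both signed in. The function name is hypothetical. Both deletions are permanent, so double-check the arguments before running it.

```shell
#!/bin/sh
# Hypothetical helper: delete the resource group (and everything in it) and
# your fork of the repository. Both deletions are permanent.
cleanup_resources() {
  if [ "$#" -ne 2 ]; then
    echo "usage: cleanup_resources RESOURCE_GROUP OWNER/REPO" >&2
    return 2
  fi
  # --no-wait returns immediately; the deletion continues in Azure.
  az group delete --name "$1" --yes --no-wait || return 1
  # Deleting a repository requires the delete_repo scope, for example:
  #   gh auth refresh -s delete_repo
  gh repo delete "$2" --yes
}
```

For example, run cleanup_resources YOUR-RESOURCE-GROUP YOUR-USERNAME/azureml-examples. A user-assigned managed identity created in that resource group is deleted along with it. A Microsoft Entra application isn't part of a resource group, so if you created one, delete it separately with az ad app delete --id APP_ID.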

Next steps