Automating ML model deployment to production environments

In general, there are two paths to deploying ML code into production environments:

  • Implement CI/CD pipelines on your development branch to deploy to the QA and production environments
  • Use several branches to move code from development to QA and production environments

Either deployment option can fit the proposed deployment workflow.

CI/CD on the development branch

This approach is the most common: we prepare and store the required artifacts (models and libraries) in advance, and we assign a special tag to the approved artifacts to indicate their readiness for production.

For example, if we execute training from our development branch on the full dataset and generate a model, the model should be evaluated and approved. Once the model is ready, we can label it with a special attribute, like a "production" tag. Then, during the continuous deployment (CD) phase, we use the latest model version with the production tag for deployment. The code in the development branch itself doesn't matter much here, because we are not planning to use it in production; we only use it to run training in our development environment.
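A minimal sketch of this tagging flow, assuming the Azure ML CLI (v2) extension and a hypothetical model named nyc-taxi-model; the production tag key is our own convention, not an Azure ML built-in:

```yaml
# Sketch: tag an approved model, then resolve the newest tagged version in CD.
# Model name, variable names, and the "production" tag are illustrative.
steps:
  - bash: |
      # Mark the evaluated and approved model version as production-ready.
      az ml model update --name nyc-taxi-model --version $(modelVersion) \
        --set tags.production=true \
        --workspace-name $(workspaceName) --resource-group $(resourceGroup)
    displayName: Tag approved model

  - bash: |
      # In CD, pick the newest version carrying the production tag.
      # Note: list order is not guaranteed; sort by version explicitly if needed.
      version=$(az ml model list --name nyc-taxi-model \
        --workspace-name $(workspaceName) --resource-group $(resourceGroup) \
        --query "[?tags.production=='true'] | [0].version" --output tsv)
      echo "##vso[task.setvariable variable=prodModelVersion]$version"
    displayName: Resolve latest production model version
```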

A rollback strategy can be implemented without much difficulty: we just need to remove the tag from the latest model and execute the CI/CD pipeline again to pick up the previous version that was ready for production.
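A corresponding rollback sketch, under the same assumptions, clearing the tag so that the next CD run resolves the previous production-ready version:

```yaml
steps:
  - bash: |
      # Clear the tag on the faulty version; the next CD run then picks up
      # the previous version that still carries tags.production == 'true'.
      az ml model update --name nyc-taxi-model --version $(badModelVersion) \
        --set tags.production=false \
        --workspace-name $(workspaceName) --resource-group $(resourceGroup)
    displayName: Untag faulty model (rollback)
```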

Stages

In the above implementation, it's OK to use just one branch (main, for example) as the primary source for development and for executing CI/CD.

The following image demonstrates the final version of the CI/CD process:

[Figure: CI/CD process]

An example implementation of a CI pipeline for training in Azure ML can be found in this basic template. The example pipeline executes several steps when a PR is merged into the development branch (a condensed sketch of the whole pipeline follows the list):

  1. The build_validation_pipeline.yml pipeline executes, running unit tests and code validation (such as flake8).
  2. The second stage executes several steps to kick off an Azure ML pipeline:
    1. configure_azureml_agent.yml installs pip requirements and uses the Azure CLI extension mechanism to install the Azure ML CLI.
    2. connect_to_workspace.yml uses pipeline variables to connect to the correct Azure ML workspace.
    3. create_compute.yml ensures that Azure ML compute resources are available to execute the training pipeline.
    4. execute_no_wait_job.yml uses the Azure ML CLI to deploy and trigger the Azure ML pipeline defined by the amlJobExecutionScript variable. In this case, it is ./mlops/nyc-taxi/pipeline.yml.
      • This step does not wait for the Azure ML pipeline to complete, as long-running pipelines would hold the DevOps build agent. However, the execute_and_wait_job.yml step is available for scenarios where training is quick and quickly identifying failure is critical. In pr_to_dev_pipeline.yml, the wait job is used for this reason.
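A condensed sketch of such a pipeline; the template paths mirror the step names above but are illustrative, not the exact layout of the template repo:

```yaml
# Sketch of the CI pipeline described above (Azure DevOps YAML).
trigger:
  branches:
    include:
      - development

stages:
  - stage: BuildValidation
    jobs:
      - job: Validate
        steps:
          - template: templates/build_validation_pipeline.yml  # unit tests, flake8

  - stage: TriggerAzureMLPipeline
    dependsOn: BuildValidation
    jobs:
      - job: SubmitTraining
        steps:
          - template: templates/configure_azureml_agent.yml  # pip requirements + Azure ML CLI
          - template: templates/connect_to_workspace.yml     # connect via pipeline variables
          - template: templates/create_compute.yml           # ensure training compute exists
          # Submits the pipeline defined by $(amlJobExecutionScript) and returns
          # without waiting; swap in execute_and_wait_job.yml to block on completion.
          - template: templates/execute_no_wait_job.yml
```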

The full template repo is available here.

More templates based on a similar pattern are also available.

Multi-branch strategy

You can use several branches to move your code from development to production. This strategy works well if you deploy the code itself and don't update the production environment often, or if you maintain several versions of the production environment for different partners or departments.

We cannot treat the development branch as a production-ready branch. The branch is stable, but moving it to production "as is" might cause issues. For example, the training pipelines may have been updated, but no model based on them exists yet. As another example, somebody could activate the wrong model for the scoring service, which is not critical in the development branch but is critical in production.

Hence, we propose a flow that moves the development branch to production in two stages (a wiring sketch follows the list):

  • Once we, the data scientists, believe that the current version of the development branch is ready for production, we move the code alongside the current model to a special QA branch. This branch is used for testing to validate that our code and the models work as expected. Additionally, we can test our deployment scripts there.
  • Once all tests on the QA branch have completed successfully, we can move the code and the models from the QA branch to the production one.
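One way to wire the two stages in Azure DevOps, assuming branch names qa and main and hypothetical variable groups holding each environment's workspace settings (a sketch, not a complete pipeline):

```yaml
# Sketch: one deployment pipeline, parameterized per branch/environment.
trigger:
  branches:
    include:
      - qa
      - main

variables:
  # Conditionally pick the variable group that matches the target environment.
  - ${{ if eq(variables['Build.SourceBranchName'], 'qa') }}:
      - group: mlops-qa    # hypothetical: QA workspace settings
  - ${{ if eq(variables['Build.SourceBranchName'], 'main') }}:
      - group: mlops-prod  # hypothetical: production workspace settings

stages:
  - stage: Deploy
    jobs:
      - job: DeployScoring
        steps:
          - template: templates/copy_approved_models.yml  # hypothetical template
          - template: templates/deploy_scoring_infra.yml  # hypothetical template
```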

We use branches rather than tags because branches allow us to execute additional DevOps pipelines prior to committing code from one branch to another. In these pipelines, we can implement an approval process and move ML pipelines between environments.

The PR build for the move from the development branch to the QA branch should include the following tasks (a sketch of the model-copy step follows the list):

  • Deploy the needed infrastructure for production. In most cases, you don't need to deploy your training pipelines, since you are not doing training in the production environment; you can deploy just the scoring infrastructure.
  • Copy all of the latest approved models (approved models can be tagged).
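A sketch of the model-copy task, assuming the Azure ML CLI and the production tag convention from earlier; workspace names and the download layout are placeholders:

```yaml
steps:
  - bash: |
      # Download the latest approved model from the dev workspace...
      az ml model download --name nyc-taxi-model --version $(prodModelVersion) \
        --download-path ./model-artifacts \
        --workspace-name $(devWorkspaceName) --resource-group $(resourceGroup)
      # ...and register it in the QA workspace under the same name and version.
      # (The exact sub-folder under --download-path may vary.)
      az ml model create --name nyc-taxi-model --version $(prodModelVersion) \
        --path ./model-artifacts/nyc-taxi-model \
        --workspace-name $(qaWorkspaceName) --resource-group $(resourceGroup)
    displayName: Copy approved model from dev to QA
```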

Branches

Implementing this process, we work across several branches:

  • dev: the development branch as the default stable branch for developers
  • QA: the QA branch to test the scoring pipeline and the model
  • main: the source branch for production environment

Once we finish testing in our QA environment, we can create another PR and start the process of moving everything to main. The PR should be approved, and merging it triggers the deployment build. The build has to update the scoring infrastructure in the production environment and copy our model once more (a sketch follows the figure below).

[Figure: Production]
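A sketch of the production leg, assuming an Azure DevOps environment named production with approvals configured on it; the templates are the same hypothetical ones as above:

```yaml
# Sketch: the main-branch build deploys through a deployment job so that
# approvals and checks on the "production" environment gate the rollout.
stages:
  - stage: DeployProd
    condition: eq(variables['Build.SourceBranchName'], 'main')
    jobs:
      - deployment: UpdateScoring
        environment: production  # approvals/checks configured in Azure DevOps
        strategy:
          runOnce:
            deploy:
              steps:
                - template: templates/copy_approved_models.yml
                - template: templates/deploy_scoring_infra.yml
```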

Deployment Artifacts

Deployment artifacts in the CI/CD process depend on how the model is used in the production environment. The most common usage scenarios are batch scoring and runtime scoring:

  • Batch scoring: In this case, we use the model together with another ML pipeline, running it under the umbrella of the same machine learning pipeline engine (such as Azure ML) that was used for training.
  • Runtime scoring: In this case, we need a service to serve our model (e.g., Azure Functions, Azure Kubernetes Service, Azure ML Online Endpoints, a locally run Docker container), pack the model into an image alongside all of the needed components, and make the deployment.

Each scenario defines what components you need in the QA and Production environments.

Batch Scoring

For batch scoring, model registry and compute pipeline resources must be deployed to infrastructure such as an Azure ML workspace or a Databricks instance. These resources are used to trigger scoring pipelines (which need to be managed as outlined in ML Pipelines). Implementing a CD pipeline for the production environment generally consists of two core steps: copy the latest production-ready model to the QA/Prod environment, and publish the scoring service to the appropriate scoring pipeline (for example, an Azure ML pipeline or a Databricks pipeline). A pipeline sketch follows the figure below.

[Figure: Batch scoring]
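As a sketch, a minimal Azure ML pipeline job for batch scoring that consumes the copied model; the schema URL is real, while the model, data, environment, and script names are placeholders:

```yaml
# Sketch: Azure ML batch-scoring pipeline job (pipeline.yml).
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: nyc-taxi-batch-scoring
jobs:
  score:
    type: command
    command: python score.py --model ${{inputs.model}} --data ${{inputs.data}}
    code: ./src
    environment: azureml:scoring-env@latest
    compute: azureml:cpu-cluster
    inputs:
      model:
        type: custom_model
        path: azureml:nyc-taxi-model@latest  # the copied, approved model
      data:
        type: uri_folder
        path: azureml:scoring-data@latest
```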

Alternatively, using the Azure ML registries preview, multiple workspaces can access shared models, components, and environments from a single registry, removing the need to copy the model between each workspace. These shared registries work even across multiple Azure subscriptions.
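A sketch of sharing a model into a registry with the ml CLI extension (the share command ships with recent versions of the extension; the registry name is a placeholder):

```yaml
steps:
  - bash: |
      # Share the approved model from the dev workspace into the registry.
      az ml model share --name nyc-taxi-model --version $(prodModelVersion) \
        --registry-name mlops-registry \
        --share-with-name nyc-taxi-model --share-with-version $(prodModelVersion) \
        --workspace-name $(devWorkspaceName) --resource-group $(resourceGroup)
      # Any workspace can then reference it by URI, e.g.:
      #   azureml://registries/mlops-registry/models/nyc-taxi-model/versions/$(prodModelVersion)
    displayName: Share model via Azure ML registry
```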

Runtime Scoring

For a runtime service, a replicated model registry service is an optional component in the QA/Prod environment; whether you need it depends on how you are planning to use the model. Depending on the scenario, several methods have been successful (a deployment sketch for the Azure ML-hosted option is shown below):

  • Using the model as a separate entity in any kind of external application: copy the model to a known location in the QA/Prod environment to make it available to all consumers. A separate QA/Prod model registry is not needed in this scenario.
  • Deploying the model as part of a custom image to serve in any service that is not connected to your model registry (or Azure ML): create a Docker image and deploy it to the desired service. If you are using Azure ML, you can re-use the Azure Container Registry (ACR) instance in the Dev environment, or use a separate container registry.
  • Azure Kubernetes Service (AKS) managed by an Azure ML service: replicate the Azure ML workspace and AKS in each environment to follow security best practices, making sure that the AKS instances in each environment are isolated from one another.

[Figure: Runtime scoring]
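For the Azure ML-hosted serving path mentioned above, a sketch of a managed online endpoint and deployment; the schema URLs are real, while the endpoint, model, and SKU choices are placeholders:

```yaml
# endpoint.yml - sketch of a managed online endpoint
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: nyc-taxi-endpoint
auth_mode: key
---
# deployment.yml - serves the approved model on the endpoint above
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: nyc-taxi-endpoint
model: azureml:nyc-taxi-model@latest
instance_type: Standard_DS3_v2
instance_count: 1
```

The two files can then be applied with az ml online-endpoint create -f endpoint.yml followed by az ml online-deployment create -f deployment.yml.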