Orchestrate updates across multiple clusters by using Azure Kubernetes Fleet Manager

Platform admins who manage Kubernetes fleets with a large number of clusters often struggle to stage updates safely and predictably across those clusters. To address this pain point, Azure Kubernetes Fleet Manager (Fleet) lets you orchestrate updates across multiple clusters by using update runs, stages, groups, and strategies.

Screenshot of the Azure portal pane for a fleet resource, showing member cluster Kubernetes versions and node images in use across all node pools of member clusters.

Prerequisites

  • Read the conceptual overview of this feature, which explains the update strategies, runs, stages, and groups referenced in this article.

  • You must have a fleet resource with one or more member clusters. If you don't have one, follow the quickstart to create a fleet resource and join Azure Kubernetes Service (AKS) clusters as members. This walkthrough uses a fleet resource with five AKS member clusters as an example.

  • Set the following environment variables:

    export GROUP=<resource-group>
    export FLEET=<fleet-name>
    
  • If you're following the Azure CLI instructions in this article, you need Azure CLI version 2.53.1 or later installed. To install or upgrade, see Install the Azure CLI.

  • You also need the fleet Azure CLI extension, which you can install by running the following command:

    az extension add --name fleet
    

    Run the following command to update to the latest released version of the extension:

    az extension update --name fleet
    

Note

Update runs honor planned maintenance windows that you set at the AKS cluster level. For more information, see planned maintenance across multiple member clusters, which explains how update runs handle member clusters that are configured with planned maintenance windows.

Update runs support two options for the sequence in which clusters are upgraded:

  • One by one: If you don't need to control the sequence in which clusters are upgraded, this option upgrades all member clusters of the fleet one at a time.
  • Control the sequence of clusters by using update groups and stages: If you want to control the sequence in which clusters are upgraded, you can organize member clusters into update groups and update stages. You can also store this sequence as a template in the form of an update strategy, and then create update runs from that strategy instead of defining the sequence each time you create a stage-based update run.

Update all clusters one by one

  1. On the page for your Azure Kubernetes Fleet Manager resource, go to the Multi-cluster update menu and select Create.

  2. Select One by one to upgrade all member clusters of the fleet in sequence, one at a time.

    Screenshot of the Azure portal pane for creating update runs that update clusters one by one in Azure Kubernetes Fleet Manager.

  3. For upgrade scope, you can choose one of these three options:

    • Kubernetes version for both control plane and node pools
    • Kubernetes version for only control plane of the cluster
    • Node image version only

    Screenshot of the Azure portal pane for creating update runs. The upgrade scope section is shown.

    For the node image, the following options are available:

    • Latest: Updates every AKS cluster in the update run to the latest image available for that cluster in its region.
    • Consistent: An update run can include AKS clusters across multiple regions where the latest available node images differ (check the release tracker for more information). With this option, the update run picks the latest image that is common to all these regions, so that every cluster ends up on the same image.
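
For readers following along with the Azure CLI mentioned in the prerequisites, the portal choices above can be sketched roughly as follows. This is a minimal sketch, not a definitive procedure: it assumes GROUP and FLEET are exported as described in the prerequisites, the helper function names are hypothetical, and the flags reflect the fleet extension as commonly documented, so confirm them with `az fleet updaterun create --help` before relying on them.

```shell
# Hypothetical helper names; only the az commands inside them belong to the CLI.
# Assumes GROUP and FLEET are exported as described in the prerequisites.

# Kubernetes version upgrade, one cluster at a time.
# $1 = run name, $2 = Full (control plane and node pools) or ControlPlaneOnly,
# $3 = target Kubernetes version.
create_version_run() {
  az fleet updaterun create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --upgrade-type "$2" --kubernetes-version "$3"
}

# Node image upgrade only.
# $1 = run name, $2 = Latest or Consistent.
create_node_image_run() {
  az fleet updaterun create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --upgrade-type NodeImageOnly --node-image-selection "$2"
}
```

For example, `create_version_run run-1 Full <kubernetes-version>` would request a run that upgrades both control plane and node pools, and `create_node_image_run run-2 Consistent` would request a node-image-only run that uses the latest common image. The run names here are placeholders.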

Update clusters in a specific order

Update groups and stages provide more control over the sequence that update runs follow when you're updating the clusters. Within an update stage, updates are applied to all the different update groups in parallel; within an update group, member clusters update sequentially.

Assign a cluster to an update group

You can assign a member cluster to a specific update group in one of two ways.

  • Assign the member cluster to an update group when you add it to the fleet. For example:
  1. On the page for your Azure Kubernetes Fleet Manager resource, go to Member clusters.

    Screenshot of the Azure portal page for Azure Kubernetes Fleet Manager member clusters.

  2. Specify the update group that the member cluster should belong to.

    Screenshot of the Azure portal page for adding member clusters to Azure Kubernetes Fleet Manager and assigning them to groups.

  • Assign an existing fleet member to an update group. For example:
  1. On the page for your Azure Kubernetes Fleet Manager resource, navigate to Member clusters. Choose the member clusters that you want, and then select Assign update group.

    Screenshot of the Azure portal page for assigning existing member clusters to a group.

  2. Specify the group name, and then select Assign.

    Screenshot of the Azure portal page for member clusters that shows the form for updating a member cluster's group.

Note

A fleet member can belong to only one update group, but an update group can contain multiple fleet members. An update group isn't a separate resource type; it's only a string that fleet members reference. So, if all fleet members that reference a given update group are deleted, that update group ceases to exist as well.
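
The two assignment methods above have rough Azure CLI counterparts. The sketch below assumes GROUP and FLEET are exported as in the prerequisites; the helper function names, member names, and group names are hypothetical, and the `--update-group` flag should be confirmed with `az fleet member create --help`.

```shell
# Hypothetical helper names. Assumes GROUP and FLEET are exported.

# Join an AKS cluster to the fleet and place it in an update group in one step.
# $1 = member name, $2 = AKS cluster resource ID, $3 = update group name.
join_member_with_group() {
  az fleet member create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --member-cluster-id "$2" --update-group "$3"
}

# Assign (or reassign) an existing fleet member to an update group.
# $1 = member name, $2 = update group name.
assign_update_group() {
  az fleet member update \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --update-group "$2"
}
```

Because an update group is only a string on the member, no separate command creates the group itself; naming it on a member is what brings it into existence.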

Define an update run and stages

You can define an update run that uses update stages to apply updates to different update groups in sequence. For example, a first update stage might update test environment member clusters, and a second update stage might then update production environment member clusters. You can also specify a wait time between update stages.

  1. On the page for your Azure Kubernetes Fleet Manager resource, navigate to Multi-cluster update. Under the Runs tab, select Create.

  2. Provide a name for your update run, and then select Stages for the update sequence type.

    Screenshot of the Azure portal page for choosing stages mode within update run.

  3. Choose Create Stage. You can now specify the stage name and the duration to wait after each stage.

    Screenshot of the Azure portal page for creating a stage and defining wait time.

  4. Choose the update groups that you want to include in this stage.

    Screenshot of the Azure portal page for stage creation that shows the selection of update groups.

  5. After you define all your stages, you can order them by using the Move up and Move down controls.

  6. For upgrade scope, you can choose one of these three options:

    • Kubernetes version for both control plane and node pools
    • Kubernetes version for only control plane of the cluster
    • Node image version only

    Screenshot of the Azure portal pane for creating update runs. The upgrade scope section is shown.

    For the node image, the following options are available:

    • Latest: Updates every AKS cluster in the update run to the latest image available for that cluster in its region.
    • Consistent: An update run can include AKS clusters across multiple regions where the latest available node images differ (check the release tracker for more information). With this option, the update run picks the latest image that is common to all these regions, so that every cluster ends up on the same image.
  7. Select Create at the bottom of the page to create the update run.

    Specifying stages and their order each time you create an update run can become repetitive. Update strategies simplify this process by letting you store templates for update runs. For more information, see update strategy creation and usage.

  8. In the Multi-cluster update menu, choose the update run and select Start.
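
The staged run defined in the steps above can also be sketched with the Azure CLI, which takes the stage layout as a JSON file. This is an illustrative sketch only: the stage names, group names, wait time, file name, and helper function name are all hypothetical, GROUP and FLEET are assumed to be exported as in the prerequisites, and the `--stages` flag and JSON shape should be checked against `az fleet updaterun create --help`.

```shell
# File name is an assumption; override STAGES_FILE to write elsewhere.
STAGES_FILE="${STAGES_FILE:-example-stages.json}"

# Two stages run in order. Groups inside a stage update in parallel,
# and stage1 waits one hour (3600 seconds) before stage2 begins.
cat > "$STAGES_FILE" <<'EOF'
{
  "stages": [
    {
      "name": "stage1",
      "groups": [
        { "name": "group-1a" },
        { "name": "group-1b" }
      ],
      "afterStageWaitInSeconds": 3600
    },
    {
      "name": "stage2",
      "groups": [
        { "name": "group-2a" },
        { "name": "group-2b" }
      ]
    }
  ]
}
EOF

# Hypothetical helper: create a staged run that upgrades control plane and node pools.
# $1 = run name, $2 = target Kubernetes version.
create_staged_run() {
  az fleet updaterun create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --upgrade-type Full --kubernetes-version "$2" \
    --stages "$STAGES_FILE"
}
```

The group names in the file must match the update group strings already assigned to member clusters; a stage that names a group no member references has nothing to update.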

Create an update run using update strategies

Creating an update run with stages requires you to specify the stages, groups, and their order each time. Update strategies simplify this process by letting you store templates for update runs.

Note

You can create multiple update runs with unique names from the same update strategy.

Create an update strategy: You can create an update strategy in one of two ways:

  • Approach 1: You can save an update strategy while creating an update run.

    A screenshot of the Azure portal showing update run stages being saved as an update strategy.

  • Approach 2: You can navigate to Multi-cluster update and choose Create under the Strategy tab.

    A screenshot of the Azure portal showing creation of update strategy.

Use an update strategy to create an update run: You can reference the update strategy you created when you create subsequent update runs:

A screenshot of the Azure portal showing the creation of a new update run. The 'Copy from existing strategy' button is highlighted.
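
As a rough Azure CLI counterpart to the portal flow above, a strategy can be saved from a stages JSON file and then referenced by name when you create runs. The sketch assumes GROUP and FLEET are exported as in the prerequisites and that you already have a stages file in the format the CLI accepts for `--stages`; the helper function names and all resource names are hypothetical, and the flags should be confirmed with `az fleet updatestrategy create --help` and `az fleet updaterun create --help`.

```shell
# Hypothetical helper names. Assumes GROUP and FLEET are exported.

# Save a stage layout as a reusable update strategy.
# $1 = strategy name, $2 = path to a stages JSON file.
save_update_strategy() {
  az fleet updatestrategy create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --stages "$2"
}

# Create a node-image-only run that takes its stages from a saved strategy.
# $1 = run name, $2 = strategy name, $3 = Latest or Consistent.
create_run_from_strategy() {
  az fleet updaterun create \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$1" --update-strategy-name "$2" \
    --upgrade-type NodeImageOnly --node-image-selection "$3"
}
```

Calling `create_run_from_strategy` several times with different run names and the same strategy name reflects the note above: one strategy can back many uniquely named update runs.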

Manage an update run

There are a few options to manage update runs:

  • On the Multi-cluster update tab of the fleet resource, you can Start an update run that is in either the Not started or the Failed state.

    A screenshot of the Azure portal showing how to start an update run in the 'Not started' state.

  • On the Multi-cluster update tab of the fleet resource, you can Stop an update run that is currently in the Running state.

    A screenshot of the Azure portal showing how to stop an update run in the 'Running' state.

  • Within any update run in the Not started, Failed, or Running state, you can select any stage and then select Skip to skip the upgrade for that stage.

    A screenshot of the Azure portal showing how to skip upgrade for a specific stage in an update run.

    You can similarly skip the upgrade at the update group or member cluster level.

    For more information, see the conceptual overview of update run states and skip behavior for runs, stages, and groups.
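
The management actions above can be sketched with the Azure CLI as well. This sketch assumes GROUP and FLEET are exported as in the prerequisites; the helper function names are hypothetical, and the `skip` subcommand with its `--targets` format (`Type:name`, for example `Stage:stage1`, `Group:group-1a`, or `Member:member1`) should be verified with `az fleet updaterun skip --help` for your extension version.

```shell
# Hypothetical helper names. Assumes GROUP and FLEET are exported.

# Start a run that is in the Not started or Failed state. $1 = run name.
start_run() {
  az fleet updaterun start \
    --resource-group "$GROUP" --fleet-name "$FLEET" --name "$1"
}

# Stop a run that is currently Running. $1 = run name.
stop_run() {
  az fleet updaterun stop \
    --resource-group "$GROUP" --fleet-name "$FLEET" --name "$1"
}

# Skip one or more stages, groups, or members within a run.
# $1 = run name; remaining arguments are targets such as Stage:stage1.
skip_targets() {
  run_name="$1"
  shift
  az fleet updaterun skip \
    --resource-group "$GROUP" --fleet-name "$FLEET" \
    --name "$run_name" --targets "$@"
}
```

For example, `skip_targets run-1 Stage:stage1 Group:group-2a` would ask the run named run-1 to skip a whole stage and one additional group in a single call; the names are placeholders.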