Clustering on your Azure Stack Edge Pro GPU device

APPLIES TO: Azure Stack Edge Pro - GPU, Azure Stack Edge Pro 2

This article provides a brief overview of clustering on your Azure Stack Edge device.

About failover clustering

Azure Stack Edge can be set up as a single standalone device or as a two-node cluster. A two-node cluster consists of two independent Azure Stack Edge devices that are connected by physical cables and by software. When clustered, these nodes work together as a Windows failover cluster to provide high availability for the applications and services that run on the cluster.

If one of the clustered nodes fails, the other node takes over providing service (a process known as failover). The clustered roles are also proactively monitored to make sure that they’re working properly. If they aren’t, they’re restarted or moved to the other node.

Azure Stack Edge uses Windows Server Failover Clustering for its two-node cluster. For more information, see Failover clustering in Windows Server.

Cluster quorum and witness

Your Azure Stack Edge cluster maintains a quorum so that it can remain online in the event of a failure. If one of the nodes fails, the surviving voters must still hold a majority of the votes for the cluster to remain online. A majority is only guaranteed when the total number of votes is odd; with an even number, the votes can split evenly and neither side has a majority. For more information on cluster quorum, see Understand quorum.

For an Azure Stack Edge cluster with two nodes, a cluster witness provides the third vote. If a node fails, the cluster is left with two out of three votes, which is a majority, so it stays online. A cluster witness is required on your Azure Stack Edge cluster. You can set up the witness in the cloud or on a local file share by using the local UI of your device.
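The vote arithmetic can be checked with a small model. This is a minimal sketch that assumes each node and the witness carry exactly one vote, and that the cluster stays online only while the surviving votes form a strict majority of the total. It deliberately ignores refinements in Windows Server Failover Clustering such as dynamic quorum. The function name `has_quorum` is hypothetical.

```python
def has_quorum(total_votes: int, surviving_votes: int) -> bool:
    """Return True if the surviving votes form a strict majority of all votes.

    Simplified model: every voter (node or witness) carries exactly one vote.
    """
    if not 0 <= surviving_votes <= total_votes:
        raise ValueError("surviving_votes must be between 0 and total_votes")
    # Strict majority: more than half of all votes.
    return surviving_votes * 2 > total_votes


# Two nodes, no witness: after one node fails, 1 of 2 votes is not a majority.
print(has_quorum(total_votes=2, surviving_votes=1))  # False

# Two nodes plus a witness: after one node fails, 2 of 3 votes is a majority.
print(has_quorum(total_votes=3, surviving_votes=2))  # True
```

The even-vote case shows why the witness is required: without it, losing either node leaves exactly half the votes.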

Infrastructure cluster

The infrastructure cluster on your device provides persistent storage and is shown in the following diagram:

Infrastructure cluster of Azure Stack Edge

  • The infrastructure cluster consists of two independent nodes running the Windows Server operating system with a Hyper-V layer. The nodes contain physical disks for storage and network interfaces that are connected either back-to-back or through switches.

  • The disks across the two nodes are used to create a logical storage pool. Storage Spaces Direct on this pool provides mirroring and parity for the cluster.

  • You can deploy your application workloads on top of the infrastructure cluster.

    • Non-containerized workloads such as VMs can be directly deployed on top of the infrastructure cluster.

      VM workloads deployed on the infrastructure cluster of Azure Stack Edge

    • Containerized workloads use Kubernetes for workload deployment and management. A Kubernetes cluster that consists of a master VM and two worker VMs (one for each node) is deployed on top of the infrastructure cluster.

    The Kubernetes cluster allows for application orchestration whereas the infrastructure cluster provides persistent storage.

Supported network topologies

Based on the use case and workloads, you can select how the two Azure Stack Edge device nodes are connected. Network topologies differ depending on whether you use an Azure Stack Edge Pro GPU device or an Azure Stack Edge Pro 2 device.

At a high level, supported network topologies for each of the device types are described here.

On your Azure Stack Edge Pro GPU device node:

  • Port 2 is used for management traffic.
  • Port 3 and Port 4 are used for storage and cluster traffic. This includes the traffic needed for storage mirroring and the Azure Stack Edge cluster heartbeat traffic that's required for the cluster to stay online.

The following network topologies are available:

Available network topologies

  • Option 1 - Switchless - Use this option when you don't have high speed switches available in the environment for storage and cluster traffic.

    In this option, Port 3 and Port 4 are connected back-to-back without a switch. These ports are dedicated to storage and Azure Stack Edge cluster traffic and aren't available for workload traffic. Optionally, you can also provide IP addresses for these ports.

  • Option 2 - Use switches and NIC teaming - Use this option when you have high speed switches available for use with your device nodes for storage and cluster traffic.

    Port 3 and Port 4 on each of the two nodes are connected via an external switch. Port 3 and Port 4 are teamed on each node, and a virtual switch and two virtual NICs are created to provide port-level redundancy for storage and cluster traffic. These ports can also be used for workload traffic.

  • Option 3 - Use switches without NIC teaming - Use this option when you need an extra dedicated port for workload traffic and port-level redundancy isn’t required for storage and cluster traffic.

    Port 3 on each node is connected via an external switch. If Port 3 fails, the cluster may go offline. Separate virtual switches are created on Port 3 and Port 4.
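The "Use this option when..." guidance above amounts to a small decision procedure. The following is a minimal sketch of that decision, assuming only the three inputs the options mention: whether high-speed switches are available, whether you need an extra dedicated workload port, and whether storage and cluster traffic needs port-level redundancy. The function and constant names are hypothetical; the option names and port roles come from the descriptions above.

```python
OPTION_1 = "Option 1 - Switchless"
OPTION_2 = "Option 2 - Use switches and NIC teaming"
OPTION_3 = "Option 3 - Use switches without NIC teaming"

# Ports that carry storage and cluster traffic under each option.
STORAGE_CLUSTER_PORTS = {
    OPTION_1: ("Port 3", "Port 4"),  # back-to-back, dedicated to storage/cluster
    OPTION_2: ("Port 3", "Port 4"),  # teamed through an external switch
    OPTION_3: ("Port 3",),           # Port 3 only; no port-level redundancy
}


def choose_topology(has_high_speed_switches: bool,
                    need_extra_workload_port: bool,
                    need_storage_port_redundancy: bool) -> str:
    """Pick a topology following the 'Use this option when...' guidance."""
    if not has_high_speed_switches:
        # No switches for storage and cluster traffic: connect back-to-back.
        return OPTION_1
    if need_extra_workload_port and not need_storage_port_redundancy:
        # Trade storage/cluster redundancy for a dedicated workload port.
        return OPTION_3
    # Switches available and redundancy wanted: team Port 3 and Port 4.
    return OPTION_2


print(choose_topology(False, False, True))  # Option 1 - Switchless
print(choose_topology(True, True, False))   # Option 3 - Use switches without NIC teaming
print(STORAGE_CLUSTER_PORTS[OPTION_3])      # ('Port 3',)
```

Option 3 is the only choice where a single port failure (Port 3) can take the cluster offline, which is why the sketch selects it only when redundancy is explicitly not required.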

For more information, see how to Choose a network topology for your device node.

Cluster deployment

Before you configure clustering on your device, you must cable the devices according to the supported network topology that you intend to configure. To deploy a two-node infrastructure cluster on your Azure Stack Edge devices, follow these high-level steps:

Figure showing the steps in the deployment of a two-node Azure Stack Edge device

  1. Order two independent Azure Stack Edge devices. For more information, see Order an Azure Stack Edge device.
  2. Cable each node independently as you would for a single-node device. Based on the workloads that you intend to deploy, cross-connect the network interfaces on these devices via cables, with or without switches. For detailed instructions, see Cable your two-node cluster device.
  3. Start cluster creation on the first node. Choose the network topology that matches the cabling across the two nodes. The chosen topology dictates how storage and clustering traffic flows between the nodes. See detailed steps in Configure network and web proxy on your device.
  4. Prepare the second node. Configure the network on the second node the same way you configured it on the first node. Ensure that the settings of each port match those of the same-named port on the first node. Get the authentication token on this node.
  5. Use the authentication token from the prepared second node to join it to the first node and form a cluster.
  6. Set up a cloud witness using an Azure Storage account or a local witness on an SMB fileshare.
  7. Assign a virtual IP to provide an endpoint for Azure Consistent Services or when using NFS.
  8. Assign compute or management intents to the virtual switches created on the network interfaces. You may also configure Kubernetes node IPs and Kubernetes service IPs here for the network interface enabled for compute.
  9. Optionally configure the web proxy, then set up device settings, configure certificates, and finally activate the device.
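Step 4 requires that each port on the second node be configured like the same-named port on the first node. The following is a minimal sketch of such a pre-join check, assuming each node's network configuration has been exported into a dictionary keyed by port name. The function name and the setting fields `dhcp` and `mtu` are hypothetical and only illustrate the comparison; per-node values such as IP addresses are expected to differ and are left out of the example.

```python
def port_setting_mismatches(node1: dict, node2: dict) -> list:
    """Compare per-port settings of two nodes.

    Each argument maps a port name (for example "Port 3") to a dict of
    settings. Returns (port, setting, node1_value, node2_value) tuples for
    every difference, including ports or settings present on only one node.
    """
    mismatches = []
    for port in sorted(set(node1) | set(node2)):
        s1 = node1.get(port, {})
        s2 = node2.get(port, {})
        for key in sorted(set(s1) | set(s2)):
            v1, v2 = s1.get(key), s2.get(key)  # None if missing on one node
            if v1 != v2:
                mismatches.append((port, key, v1, v2))
    return mismatches


# Hypothetical exported settings for the two nodes.
node1 = {"Port 2": {"dhcp": False, "mtu": 1500}, "Port 3": {"dhcp": False, "mtu": 9000}}
node2 = {"Port 2": {"dhcp": False, "mtu": 1500}, "Port 3": {"dhcp": True, "mtu": 9000}}
print(port_setting_mismatches(node1, node2))  # [('Port 3', 'dhcp', False, True)]
```

An empty result means the same-named ports agree on every compared setting.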

For more information, see the two-node device deployment tutorials starting with Get deployment configuration checklist.

Clustering workloads

On your two-node cluster, you can deploy non-containerized workloads or containerized workloads.

  • Non-containerized workloads such as VMs: The two-node cluster ensures high availability of the virtual machines that are deployed on the device cluster. Live migration of VMs isn’t supported.

  • Containerized workloads such as Kubernetes or IoT Edge: The Kubernetes cluster deployed on top of the device cluster consists of one Kubernetes master VM and two Kubernetes worker VMs. Each worker VM is pinned to one Azure Stack Edge node. On failover, the Kubernetes master VM fails over (if needed), and Kubernetes rebalances pods onto the surviving worker VM.

    For more information, see Kubernetes on a clustered Azure Stack Edge device.
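The failover behavior for containerized workloads can be illustrated with a toy placement model. This is a simplified sketch, not how Azure Stack Edge or Kubernetes actually implement failover: it assumes the master VM moves to the surviving node only if it was running on the failed node, that the pinned worker VM on the failed node stays down, and that every pod from that worker is rescheduled onto the surviving worker. All node, VM, and pod names are hypothetical.

```python
# Hypothetical names: two Azure Stack Edge nodes and one worker VM pinned to each.
NODES = ("node-a", "node-b")
WORKER_HOST = {"worker-a": "node-a", "worker-b": "node-b"}


def fail_over(master_host: str, pods: dict, failed_node: str):
    """Return (new_master_host, new_pod_placement) after failed_node goes down.

    pods maps a pod name to the worker VM it runs on.
    """
    survivor = next(n for n in NODES if n != failed_node)
    # The master VM fails over only if it was hosted on the failed node.
    new_master_host = survivor if master_host == failed_node else master_host
    # The worker pinned to the failed node is lost; its pods move to the survivor.
    surviving_worker = next(w for w, host in WORKER_HOST.items() if host == survivor)
    new_pods = {
        pod: (worker if WORKER_HOST[worker] != failed_node else surviving_worker)
        for pod, worker in pods.items()
    }
    return new_master_host, new_pods


master, placement = fail_over("node-a", {"web-1": "worker-a", "web-2": "worker-b"}, "node-a")
print(master)     # node-b
print(placement)  # {'web-1': 'worker-b', 'web-2': 'worker-b'}
```

Because the worker VMs are pinned, capacity after a failure is limited to one worker, so workloads should be sized to fit on a single node.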

Cluster management

You can manage the Azure Stack Edge cluster via the PowerShell interface of the device, or through the local UI. Some typical management tasks are:

Cluster updates

When you upgrade a two-node clustered device, the device updates are applied first, followed by the Kubernetes cluster updates. The device nodes are updated one at a time (a rolling update) to minimize workload downtime.

When you apply these updates via the Azure portal, you only have to start the process on one node, and both nodes are updated. For step-by-step instructions, see Apply updates to your two-node Azure Stack Edge device.
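The update ordering can be sketched as a plan of steps. This is an illustrative model only; the device orchestrates the real update. It assumes that workloads are moved off a node before that node is updated and that the node returns to service before the next one starts. The action labels and function names are hypothetical. The helper `max_nodes_down` confirms that the plan never takes both nodes out of service at once.

```python
def rolling_update_plan(nodes: list) -> list:
    """Return an ordered list of (action, target) steps.

    Device updates are applied one node at a time so the other node keeps
    serving workloads; the Kubernetes cluster update runs only after every
    device node is updated.
    """
    plan = []
    for node in nodes:
        plan.append(("move workloads off", node))
        plan.append(("apply device update", node))
        plan.append(("return to service", node))
    plan.append(("apply Kubernetes cluster update", "cluster"))
    return plan


def max_nodes_down(plan: list) -> int:
    """Return the largest number of nodes out of service at any point in the plan."""
    down, worst = set(), 0
    for action, target in plan:
        if action == "move workloads off":
            down.add(target)
        elif action == "return to service":
            down.discard(target)
        worst = max(worst, len(down))
    return worst


plan = rolling_update_plan(["node-1", "node-2"])
for step in plan:
    print(step)
print(max_nodes_down(plan))  # 1
```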

Billing

If you deploy an Azure Stack Edge two-node cluster, each node is billed separately. For more information, see Pricing page for Azure Stack Edge.

Next steps