Clustering on your Azure Stack Edge Pro GPU device

APPLIES TO: Azure Stack Edge Pro - GPU, Azure Stack Edge Pro 2

This article provides a brief overview of clustering on your Azure Stack Edge device.

About failover clustering

Azure Stack Edge can be set up as a single standalone device or as a two-node cluster. A two-node cluster consists of two independent Azure Stack Edge devices that are connected by physical cables and by software. When clustered, these nodes work together as a Windows failover cluster to provide high availability for the applications and services that run on the cluster.

If one of the clustered nodes fails, the other node begins to provide service; this process is known as failover. The clustered roles are also proactively monitored to make sure that they’re working properly. If they aren’t working, they’re restarted or moved to the other node.

Azure Stack Edge uses Windows Server Failover Clustering for its two-node cluster. For more information, see Failover clustering in Windows Server.

Cluster quorum and witness

Your Azure Stack Edge cluster always maintains a quorum so that it can remain online if a failure occurs. If one of the nodes fails, a majority of the surviving nodes must verify that the cluster remains online. The concept of majority only exists for clusters with an odd number of nodes. For more information on cluster quorum, see Understand quorum.

For an Azure Stack Edge cluster with two nodes, if a node fails, a cluster witness provides the third vote so that the cluster stays online, because the cluster is left with two out of three votes, which is a majority. A cluster witness is required on your Azure Stack Edge cluster. Using the local UI of your device, you can set up the witness in the cloud or on a local file share.
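The vote arithmetic can be checked with a minimal Python sketch. It assumes one vote per node plus one vote for the witness, and it ignores Windows Server features such as dynamic quorum; the function name `has_quorum` is hypothetical and only illustrates the majority rule.

```python
def has_quorum(total_votes: int, surviving_votes: int) -> bool:
    """Return True if the surviving votes are a strict majority of all votes."""
    # A strict majority means more than half of all configured votes.
    return surviving_votes > total_votes // 2


# Two nodes, no witness: after one node fails, 1 of 2 votes remain.
print(has_quorum(total_votes=2, surviving_votes=1))  # False

# Two nodes plus a witness: after one node fails, the surviving node
# and the witness hold 2 of 3 votes.
print(has_quorum(total_votes=3, surviving_votes=2))  # True
```

Without the witness, losing either node leaves exactly half of the votes, which is not a majority. This is why the witness is required for a two-node cluster.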

Infrastructure cluster

The infrastructure cluster on your device provides persistent storage and is shown in the following diagram:

Infrastructure cluster of Azure Stack Edge

  • The infrastructure cluster consists of the two independent nodes running Windows Server operating system with a Hyper-V layer. The nodes contain physical disks for storage and network interfaces that are connected back-to-back or with switches.

  • The disks across the two nodes are used to create a logical storage pool. Storage Spaces Direct on this pool provides mirroring and parity for the cluster.

  • You can deploy your application workloads on top of the infrastructure cluster.

    • Non-containerized workloads such as VMs can be directly deployed on top of the infrastructure cluster.

      VM workloads deployed on the infrastructure cluster of Azure Stack Edge

    • Containerized workloads use Kubernetes for workload deployment and management. A Kubernetes cluster that consists of a master VM and two worker VMs (one for each node) is deployed on top of the infrastructure cluster.

    The Kubernetes cluster handles application orchestration, whereas the infrastructure cluster provides persistent storage.

Supported network topologies

Based on the use case and workloads, you can choose how the two Azure Stack Edge device nodes are connected. The available network topologies differ depending on whether you use an Azure Stack Edge Pro GPU device or an Azure Stack Edge Pro 2 device.

At a high level, supported network topologies for each of the device types are described here.

On your Azure Stack Edge Pro 2 device node:

  • Option 1 - Port 1 and Port 2 are in different subnets. Separate virtual switches are created. Port 3 and Port 4 connect to an external virtual switch.

  • Option 2 - Port 1 and Port 2 are in the same subnet. A teamed virtual switch is created. Port 3 and Port 4 connect to an external virtual switch.

  • Option 3 - Port 1 and Port 2 are in separate subnets. Separate virtual switches are created on Port 1 and Port 2. Port 3 and Port 4 are connected back-to-back without a switch (switchless).

  • Option 4 - Port 1 and Port 2 are in the same subnet. A teamed virtual switch is created. Port 3 and Port 4 are connected back-to-back without a switch (switchless).

    Note

    If you run PMEC workloads, use Option 1 or Option 2.

Usage considerations on your Azure Stack Edge Pro 2 device nodes:

  • Switchless for Port 3 and Port 4 - Use this option when you don't have high-speed switches available in the environment, or when you want to dedicate Port 3 and Port 4 to storage and cluster traffic.
    • Port 1 and Port 2 in separate subnets - This is the default option. In this case, Port 1 and Port 2 have separate virtual switches and are connected to separate subnets.
    • Port 1 and Port 2 in the same subnet - In this case, Port 1 and Port 2 have a teamed virtual switch and both ports are in the same subnet.
  • Using external switches for Port 3 and Port 4 - Use this option when you have high-speed switches (>=10 GbE bandwidth) available for use with your device nodes and you want to allow a VM network adapter to connect to the virtual network created on Port 3 or Port 4, as in a PMEC use case.
    • Port 1 and Port 2 in separate subnets - This is the default option. In this case, Port 1 and Port 2 have separate virtual switches and are connected to separate subnets.
    • Port 1 and Port 2 in the same subnet - In this case, Port 1 and Port 2 have a teamed virtual switch and both ports are in the same subnet.
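The usage considerations can be condensed into a small Python decision sketch. This is one possible policy under stated assumptions, not an Azure tool: the function, enum, and parameter names are hypothetical, and the sketch assumes you prefer the switchless layout whenever VMs don't need to reach a network on Port 3 or Port 4. The returned numbers refer to Options 1 to 4 listed earlier for Azure Stack Edge Pro 2.

```python
from enum import Enum


class Port34Mode(Enum):
    EXTERNAL_SWITCH = "external switch"
    SWITCHLESS = "back-to-back, switchless"


# (Port 1 and Port 2 in the same subnet?, Port 3/Port 4 mode) -> Pro 2 option number.
OPTIONS = {
    (False, Port34Mode.EXTERNAL_SWITCH): 1,
    (True, Port34Mode.EXTERNAL_SWITCH): 2,
    (False, Port34Mode.SWITCHLESS): 3,
    (True, Port34Mode.SWITCHLESS): 4,
}


def choose_option(has_high_speed_switches: bool,
                  vms_need_port34_network: bool,
                  port12_same_subnet: bool) -> int:
    """Hypothetical helper: pick a Pro 2 topology option from site requirements."""
    if vms_need_port34_network:
        # For example, PMEC: VM network adapters must reach the virtual network on
        # Port 3 or Port 4, which needs external switches with >= 10 GbE bandwidth.
        if not has_high_speed_switches:
            raise ValueError("VM access to Port 3/4 needs >= 10 GbE switches")
        mode = Port34Mode.EXTERNAL_SWITCH
    else:
        # Assumed preference: dedicate Port 3 and Port 4 to storage and cluster traffic.
        mode = Port34Mode.SWITCHLESS
    # Teamed virtual switch if Port 1/Port 2 share a subnet, separate switches otherwise.
    return OPTIONS[(port12_same_subnet, mode)]


# PMEC site with high-speed switches and Port 1/Port 2 in different subnets.
print(choose_option(True, True, False))  # 1
```

Because a workload that needs Port 3 or Port 4 always resolves to an external-switch layout here, a PMEC deployment can only get Option 1 or Option 2, which matches the PMEC note for the Pro 2 options.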

Additional considerations:

  • Port 1 is used for initial configuration. Port 1 is then reconfigured and assigned an IP address that may or may not be in the same subnet as Port 2.
  • If you select the Using external switches option, Port 1 and Port 2 are used for storage in both teaming and non-teaming modes.
  • When using the Switchless option, Port 3 and Port 4 are connected back-to-back directly, without a switch. These ports are dedicated to storage and Azure Stack Edge cluster traffic. Port 3 and Port 4 aren't available for workload traffic.

Pros and cons for the supported topologies are summarized as follows, listed by the option name shown in the local web UI:

  • Port 3 and Port 4 are switchless; Port 1 and Port 2 are in separate subnets, with separate virtual switches.
    • Advantages: Redundant paths for management and storage traffic. No single point of failure within the device. Lots of bandwidth for storage and cluster traffic across nodes. Can be deployed with Port 1 and Port 2 in different subnets.
    • Disadvantages: Clients must reconnect if Port 1 or Port 2 fails. VM workloads can't use Port 3 or Port 4 to connect to network endpoints other than a peer Azure Stack Edge node, which is why PMEC workloads can't use this option.
  • Port 3 and Port 4 are switchless; Port 1 and Port 2 are in the same subnet, with a teamed virtual switch.
    • Advantages: Redundant paths for management and storage traffic. Lots of bandwidth for storage and cluster traffic across nodes. Higher fault tolerance.
    • Disadvantages: VM workloads can't use Port 3 or Port 4 to connect to network endpoints other than a peer Azure Stack Edge node, which is why PMEC workloads can't use this option.
  • Port 3 and Port 4 use an external switch with >=10 Gbps link bandwidth; Port 1 and Port 2 are in separate subnets, with separate virtual switches.
    • Advantages: Two independent virtual switches and network paths provide redundancy. No single point of failure within the device. Port 1 and Port 2 can be connected to different subnets.
    • Disadvantages: Clients must reconnect if Port 1 or Port 2 fails.
  • Port 3 and Port 4 use an external switch with >=10 Gbps link bandwidth; Port 1 and Port 2 are in the same subnet, with a teamed virtual switch.
    • Advantages: Load balancing. Higher fault tolerance. Two independent, redundant paths between nodes. Clients don't need to reconnect.
    • Disadvantages: Can't be deployed in an environment with different subnets.

Cluster deployment

Before you configure clustering on your device, you must cable the devices as per one of the supported network topologies that you intend to configure. To deploy a two-node infrastructure cluster on your Azure Stack Edge devices, follow these high-level steps:

Figure showing the steps in the deployment of a two-node Azure Stack Edge

  1. Order two independent Azure Stack Edge devices. For more information, see Order an Azure Stack Edge device.
  2. Cable each node independently as you would for a single-node device. Based on the workloads that you intend to deploy, cross-connect the network interfaces on these devices via cables, with or without switches. For detailed instructions, see Cable your two-node cluster device.
  3. Start cluster creation on the first node. Choose the network topology that matches the cabling across the two nodes. The chosen topology determines how storage and clustering traffic flows between the nodes. See detailed steps in Configure network and web proxy on your device.
  4. Prepare the second node. Configure the network on the second node the same way you configured it on the first node, and make sure that the settings of each port match those of the same-named port on the first node. Get the authentication token on this node.
  5. Use the authentication token from the prepared node to join it to the first node and form a cluster.
  6. Set up a cloud witness using an Azure Storage account or a local witness on an SMB fileshare.
  7. Assign a virtual IP to provide an endpoint for Azure Consistent Services or for NFS.
  8. Assign compute or management intents to the virtual switches created on the network interfaces. You can also configure Kubernetes node IPs and Kubernetes service IPs here for the network interface enabled for compute.
  9. Optionally configure the web proxy; then set up device settings, configure certificates, and finally activate the device.
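Step 4 requires the port settings on the second node to match those on the first node. The following Python sketch shows one way to compare two nodes' settings before joining them. It is hypothetical throughout: the dictionary layout and the field names (`dhcp`, `subnet_mask`, `gateway`) are illustrative assumptions, not an Azure Stack Edge API or export format. Per-node values such as IP addresses are assumed to be left out because they are expected to differ between nodes.

```python
from typing import Dict, List

# Hypothetical per-port settings: port name -> {setting name: value}.
# Only settings that should be identical on both nodes are included.
NodePorts = Dict[str, Dict[str, object]]


def port_mismatches(node1: NodePorts, node2: NodePorts) -> List[str]:
    """Return a description of every difference between same-named ports."""
    problems = []
    for port in sorted(set(node1) | set(node2)):
        if port not in node1 or port not in node2:
            problems.append(f"{port}: configured on only one node")
            continue
        settings1, settings2 = node1[port], node2[port]
        for key in sorted(set(settings1) | set(settings2)):
            if settings1.get(key) != settings2.get(key):
                problems.append(f"{port}.{key}: {settings1.get(key)!r} != {settings2.get(key)!r}")
    return problems


first = {
    "Port 1": {"dhcp": False, "subnet_mask": "255.255.255.0", "gateway": "192.0.2.1"},
    "Port 2": {"dhcp": True},
}
second = {
    "Port 1": {"dhcp": False, "subnet_mask": "255.255.255.0", "gateway": "192.0.2.1"},
    "Port 2": {"dhcp": False},
}
print(port_mismatches(first, second))  # ['Port 2.dhcp: True != False']
```

An empty result means that every same-named port carries the same settings on both nodes.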

For more information, see the two-node device deployment tutorials starting with Get deployment configuration checklist.

Clustering workloads

On your two-node cluster, you can deploy non-containerized workloads or containerized workloads.

  • Non-containerized workloads such as VMs: The two-node cluster ensures high availability of the virtual machines that are deployed on the device cluster. Live migration of VMs isn’t supported.

  • Containerized workloads such as Kubernetes or IoT Edge: The Kubernetes cluster deployed on top of the device cluster consists of one Kubernetes master VM and two Kubernetes worker VMs. Each worker VM is pinned to one Azure Stack Edge node. On failover, the Kubernetes master VM fails over to the surviving node if needed, and Kubernetes rebalances the pods onto the surviving worker VM.

    For more information, see Kubernetes on a clustered Azure Stack Edge device.
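The failover behavior for containerized workloads can be illustrated with a simplified Python model. Everything in it is hypothetical: the class, the node names, and the pod names are illustrative, and real placement decisions are made by Kubernetes, not by code like this. The model captures only two rules from the description of containerized workloads: the master VM moves only if it was running on the failed node, and the pods of the pinned worker VM on the failed node are rebalanced onto the surviving worker VM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KubernetesLayout:
    """Hypothetical model of the Kubernetes VMs on a two-node cluster."""
    master_host: str  # node currently hosting the master VM
    # Running worker VMs only: node -> pods on the worker VM pinned to that node.
    workers: Dict[str, List[str]] = field(default_factory=dict)

    def fail_node(self, failed: str) -> None:
        survivors = [node for node in self.workers if node != failed]
        if not survivors:
            raise RuntimeError("no surviving node")
        survivor = survivors[0]
        # The master VM fails over only if it was on the failed node.
        if self.master_host == failed:
            self.master_host = survivor
        # Worker VMs are pinned, so the failed node's worker does not move;
        # its pods are rescheduled onto the surviving worker instead.
        self.workers[survivor].extend(self.workers.pop(failed))


layout = KubernetesLayout(master_host="node1",
                          workers={"node1": ["web-a"], "node2": ["web-b"]})
layout.fail_node("node1")
print(layout.master_host)  # node2
print(layout.workers)      # {'node2': ['web-b', 'web-a']}
```

Pinning the worker VMs is what makes the pods, rather than the worker VM, the unit that moves on failover.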

Cluster management

You can manage the Azure Stack Edge cluster via the PowerShell interface of the device, or through the local UI. Some typical management tasks are:

Cluster updates

When you upgrade a two-node cluster, the device updates are applied first, followed by the Kubernetes cluster updates. Because the device nodes are updated in a rolling fashion, workload downtime is kept to a minimum.

When you apply these updates via the Azure portal, you only have to start the process on one node, and both nodes are updated. For step-by-step instructions, see Apply updates to your two-node Azure Stack Edge device.
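A short Python sketch shows why rolling updates keep downtime low: the nodes are updated one at a time, so while one node is being updated, the other keeps running workloads. This is a simplified, hypothetical model. The function and the `update` callback are illustrative, and the device orchestrates the real update after you start it from the Azure portal.

```python
from typing import Callable, List


def rolling_update(nodes: List[str], update: Callable[[str], None]) -> List[int]:
    """Update nodes one at a time; return how many nodes stayed online during each update."""
    online = set(nodes)
    online_during_update = []
    for node in nodes:
        online.discard(node)  # node goes offline for its update
        if not online:
            raise RuntimeError("no node left to run workloads")
        online_during_update.append(len(online))
        update(node)
        online.add(node)  # node rejoins before the next one starts


    return online_during_update


log = []
counts = rolling_update(["node1", "node2"], lambda n: log.append(f"device update on {n}"))
log.append("Kubernetes cluster update")  # applied after both device updates
print(log)
print(counts)  # [1, 1]
```

The single-node case raises an error in this model, which reflects why a rolling update needs at least two nodes to avoid downtime.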

Billing

If you deploy an Azure Stack Edge two-node cluster, each node is billed separately. For more information, see the Azure Stack Edge pricing page.

Next steps