No such thing as a Heartbeat Network
This blog post has been a long time coming. From time to time, we get requests about how to configure networking in Failover Clusters. One of the questions we get is how the heartbeat network should be configured, and that is the focus of this blog. I am here to say that there is no such thing as a heartbeat network, and there never was.
Please allow me to give a little background and explain.
In Windows 2003 and earlier versions of Failover Clustering, you could define which network was used for Cluster Communication. Below is a picture for reference.
In the picture above, we would want to select Private for our Cluster Communication so as not to use the Public network, which carries all WAN traffic. All Cluster Communication between nodes (joins, registry updates/changes, etc.) would go only over this network if it is up. As the picture shows, the networks are called Public and Private. As years went by, some started calling the Private network a Heartbeat network.
Heartbeats are small packets (134 bytes) that travel over UDP Port 3343 on all networks configured for Cluster use between all nodes. They serve multiple purposes.
1. Establish whether a Cluster Network is up or down
2. Establish routes between nodes
3. Determine whether the health of a node is good or bad
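To make the first and third purposes concrete, here is a minimal Python sketch of the bookkeeping involved. It is not how the Cluster service is implemented. The class names, the miss threshold of 5, and the rule that a network or node counts as up while at least one link still answers are all assumptions made for illustration; only the port and packet size come from this post.

```python
from dataclasses import dataclass, field

HEARTBEAT_PORT = 3343   # UDP port named in this post
HEARTBEAT_SIZE = 134    # bytes, as stated in this post
MISS_THRESHOLD = 5      # hypothetical: consecutive misses before a link is "down"


@dataclass
class LinkState:
    """Heartbeat bookkeeping for one (network, peer node) pair."""
    missed: int = 0
    up: bool = True

    def record(self, received: bool) -> None:
        if received:
            # Any heartbeat that arrives resets the miss counter.
            self.missed = 0
            self.up = True
        else:
            self.missed += 1
            if self.missed >= MISS_THRESHOLD:
                self.up = False


@dataclass
class HeartbeatMonitor:
    """Tracks every (network, node) link seen so far (illustrative only)."""
    links: dict = field(default_factory=dict)

    def record(self, network: str, node: str, received: bool) -> None:
        self.links.setdefault((network, node), LinkState()).record(received)

    def network_up(self, network: str) -> bool:
        # Assumption: a network is up if at least one peer still answers on it.
        return any(s.up for (net, _), s in self.links.items() if net == network)

    def node_healthy(self, node: str) -> bool:
        # Assumption: a node is healthy if it answers on at least one network.
        return any(s.up for (_, n), s in self.links.items() if n == node)


monitor = HeartbeatMonitor()
for _ in range(MISS_THRESHOLD):
    monitor.record("Private", "Node2", received=False)
monitor.record("Public", "Node2", received=True)

print(monitor.network_up("Private"))   # False
print(monitor.network_up("Public"))    # True
print(monitor.node_healthy("Node2"))   # True
```

The point of the sketch is that the same small packets answer two different questions: whether a given network is working, and whether a given node is still alive on any network at all.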
So let's say I have Private set as my priority network for Cluster Communications. If it is up, we are sending our communication through it. But what happens if that network isn't reliable? If a node tries to join and packets are dropping, then the join could fail. If this is the case, you either determine where the problem is and fix it, or go back into the Cluster properties and set Public as the priority.
Starting in Windows 2008 Failover Clusters, the concept of Public and Private networks went out the window. We will now send Cluster Communication over any of our networks. One of the reasons for this was reliability. With that change, we also gave the heartbeats an additional purpose.
4. Determine the fastest and most reliable routes between nodes
Since we are now determining the fastest and most reliable routes, we could use different networks between nodes for our communication. Take the below as an example.
We have three individual networks between our nodes:
- Blue – 10 Gbps used for backups and administration only
- Green – 40 Gbps used for communicating out on the WAN to clients
- Red – 40 Gbps used for communicating out on the WAN to clients
As a refresher, here is what the heartbeats are doing:
1. Establish whether a Cluster Network is up or down
2. Establish routes between nodes
3. Determine whether the health of a node is good or bad
4. Determine the fastest and most reliable routes between nodes
What the heartbeats are going to tell the Cluster is to use one of the faster networks for its communication. With that as the case, it is going to use either the Red or the Green network. If the heartbeats start detecting that neither of these is as reliable (e.g. dropping packets, network congested, etc.), it will automatically switch and use the Blue network. That's it, nothing extra for you to configure.
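The selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cluster's actual algorithm. The `loss_rate` metric, the `MAX_LOSS` cutoff, and the tie-breaking rules are assumptions I made up for the example; the network names and speeds are the ones from the list above.

```python
from dataclasses import dataclass


@dataclass
class Network:
    name: str
    speed_gbps: int
    loss_rate: float  # hypothetical metric: fraction of recent heartbeats lost


MAX_LOSS = 0.01  # hypothetical cutoff for calling a network "reliable"


def pick_route(networks):
    """Prefer the fastest reliable network; fall back when none qualifies."""
    reliable = [n for n in networks if n.loss_rate <= MAX_LOSS]
    if reliable:
        # Fastest first; among equally fast networks, the one losing fewer heartbeats.
        return max(reliable, key=lambda n: (n.speed_gbps, -n.loss_rate))
    # Nothing is reliable: take the network losing the fewest heartbeats.
    return min(networks, key=lambda n: n.loss_rate)


healthy = [
    Network("Blue", 10, 0.0),
    Network("Green", 40, 0.0),
    Network("Red", 40, 0.002),
]
print(pick_route(healthy).name)    # Green

congested = [
    Network("Blue", 10, 0.0),
    Network("Green", 40, 0.05),
    Network("Red", 40, 0.08),
]
print(pick_route(congested).name)  # Blue
```

In the first case both 40 Gbps networks qualify and the cleaner one wins. In the second, Red and Green are both dropping heartbeats, so the slower but clean Blue network is chosen.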
So to wrap things up, remember these things about Failover Clusters and Heartbeats.
1. There is no such thing as a heartbeat network or a network dedicated to heartbeats
2. Heartbeat packets are lightweight (134 bytes in size)
3. Heartbeats are sensitive to latency
4. Bandwidth is not an important factor; quality of service is. If your network is all teamed, ensure you have set up Network QoS policies for the UDP 3343 traffic.
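To put the bandwidth point in numbers, here is a rough back-of-the-envelope calculation in Python. The 134-byte size comes from this post. The rate of one heartbeat per second, the assumption that every node sends to every other node on every network, and the function name are mine, and header overhead is ignored, so treat the result as an order-of-magnitude estimate only.

```python
HEARTBEAT_BYTES = 134        # size stated in this post
HEARTBEATS_PER_SECOND = 1    # assumption for illustration; check your cluster's settings


def heartbeat_bits_per_second(nodes, networks=1,
                              size=HEARTBEAT_BYTES,
                              rate=HEARTBEATS_PER_SECOND):
    """Estimated total heartbeat traffic in bits per second.

    Assumes each node sends to every other node on every network.
    """
    flows = nodes * (nodes - 1) * networks
    return flows * size * 8 * rate


per_network = heartbeat_bits_per_second(nodes=4)
print(per_network)               # 12864 bits per second on one network
print(per_network / 10e9)        # share of a 10 Gbps link
```

Under these assumptions, a four-node cluster puts roughly 13 kilobits per second of heartbeat traffic on each network, a vanishingly small share of even the 10 Gbps link. What matters is that those few packets arrive on time, which is why latency and QoS deserve the attention rather than raw bandwidth.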
For more information regarding configuring networks in a Cluster, please see the following:
Failover Cluster Networking Essentials
Happy Clustering !!!!
John Marlin