Cluster aware updating behaviour

Anonymous
2024-11-16T19:54:24+00:00

Cluster-Aware Updating (CAU) appears to be broken on at least Windows Server 2019 and 2022 and nobody has noticed, or is it just me?
According to https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating the documented CAU sequence is:

  1. Puts each node of the cluster into node maintenance mode.
  2. Moves the clustered roles off the node.
  3. Installs the updates and any dependent updates.
  4. Performs a restart if necessary.
  5. Brings the node out of maintenance mode.
  6. Restores the clustered roles on the node.
  7. Moves to update the next node.

This is the behaviour I expect and the behaviour of 2012 R2, but the behaviour on 2019 and 2022 (I have not tested 2016 or 2025 yet) is:

  1. Moves the CAU role off the node.
  2. Installs the updates and any dependent updates.
  3. Puts the node into node maintenance mode.
  4. Moves the clustered roles off the node.
  5. Performs a restart if necessary.
  6. Brings the node out of maintenance mode.
  7. Restores the clustered roles on the node.
  8. Moves to update the next node.
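The difference between the two sequences comes down to one ordering check: are the clustered roles moved off the node before the updates are installed? A minimal sketch of that check follows. The step names are illustrative labels made up for this example, not identifiers used by CAU.

```python
# Step names below are hypothetical labels for the steps listed above,
# not CAU identifiers.
DOCUMENTED = [
    "enter_maintenance",
    "move_roles_off",
    "install_updates",
    "restart",
    "exit_maintenance",
    "restore_roles",
]

OBSERVED = [
    "move_cau_role_off",
    "install_updates",
    "enter_maintenance",
    "move_roles_off",
    "restart",
    "exit_maintenance",
    "restore_roles",
]


def roles_drained_before_install(steps):
    """Return True if the roles leave the node before updates are installed."""
    return steps.index("move_roles_off") < steps.index("install_updates")


print("documented order is safe:", roles_drained_before_install(DOCUMENTED))
print("observed order is safe:", roles_drained_before_install(OBSERVED))
```

The documented order passes the check and the observed order does not, which is the whole complaint in one line.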

This behaviour has been observed on a Windows Server 2019 failover cluster running SQL Server 2019, where it caused over 20 minutes of SQL Server downtime because the SQL Server updates were installed while the instance was still running on the node. The same happens on a Windows Server 2022 cluster running SQL Server 2019. The SQL Server logs state that running the update on the owner node will cause downtime:

Cluster_IsLocalNodeGroupOwner : Checks if the local computer is an owner of an online cluster group for a failover cluster instance that contains the SQL Server service, Analysis Services service or a generic service.
Warning: The local computer is an owner of a cluster group that contain the SQL Server service, Analysis Services service or a generic service for a failover cluster instance. If you continue, all SQL Server instances may be taken offline causing downtime while the patch is being applied.

I have also checked the logs for a Windows Server 2022 Scale-Out File Server and a failover file server, and they show the updates being installed before the roles are moved off the node.
Looking through the logs for the last year or two, it appears this has always been the behaviour on these 2019 and 2022 clusters. It has only now been investigated because the downtime was assumed to have another cause.

On a 2012 R2 server, it behaves as it should: the roles are moved off before the updates begin.

Is this somehow just me, on multiple failover clusters in multiple domains?


Locked Question. This question was migrated from the Microsoft Support Community.


5 answers

  1. Anonymous
    2024-11-19T06:50:21+00:00

    Hi,

    I can't find any Microsoft documentation that mentions a change in the Cluster-Aware Updating process. I checked the event log on my Windows Server 2022 cluster node, and the roles on the node were moved by the node drain, as expected, before the node entered maintenance mode.

  2. Anonymous
    2024-11-19T08:00:01+00:00

    Before or after updating? It does move them, just not when it is supposed to, on any of my clusters.

    This is a condensed log output for the first node up until the reboot happens:
    02:00:03 - node1 - - CAU starts

    02:00:06 - node1 - - Windows update starts running

    02:00:06 - node2 - - CAU starts

    02:00:13 - node1 - - Windows updates start downloading

        Security Update for SQL Server 2019 RTM CU (KB5046860)
        Windows Malicious Software Removal Tool x64 - v5.130 (KB890830)
        2024-11 Cumulative Update for Windows Server 2019 for x64-based Systems (KB5046615)
        2024-11 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows Server 2019 for x64 (KB5046540)

    02:00:59 - node2 - - Windows update starts running

    02:01:44 - node1 - - Windows update starts installing update

        Security Update for SQL Server 2019 RTM CU (KB5046860)

    02:02:44 - node1 - MSSQL1 - CAU offlining SQL Server
    02:02:44 - 02:03:01 - node1 - MSSQL1 - CEIP, Commvault, SQL Server agent, SQL Server, SQL Server browser services are stopped

    02:03:01 - node1 - MSSQL1 - Failover Cluster reports SQL Server is offline
    02:05:43 - node1 - MSSQL1 - Cluster is attempting to bring online role

    02:05:04 - 02:06:21 - node1 - MSSQL1 - Commvault, SQL Server full text search, SQL Server service started

    02:09:16 - node1 - MSSQL1 - SQL Server cluster role failed

    02:09:16 - node1 - MSSQL1 - Cluster resource 'SQL Server (MSSQL1)' in clustered role 'SQL Server (MSSQL1)' has transitioned from state OnlinePending to state ProcessingFailure.

    02:09:16 - node1 - MSSQL1 - Cluster resource 'SQL Server (MSSQL1)' in clustered role 'SQL Server (MSSQL1)' has transitioned from state WaitingToTerminate to state Terminating.

    02:09:18 - node1 - MSSQL1 - SQL Server service terminated unexpectedly. It has done this 1 time(s).

    02:09:19 - node1 - MSSQL1 - Cluster resource 'SQL Server (MSSQL1)' in clustered role 'SQL Server (MSSQL1)' has transitioned from state Terminating to state DelayRestartingResource.

    02:09:20 - node1 - MSSQL1 - Cluster resource 'SQL Server (MSSQL1)' in clustered role 'SQL Server (MSSQL1)' has transitioned from state DelayRestartingResource to state OnlineCallIssued.

    02:09:22 - node1 - MSSQL1 - SQL Server started

    02:10:13 - node1 - MSSQL1 - Cluster resource 'SQL Server (MSSQL1)' in clustered role 'SQL Server (MSSQL1)' has transitioned from state OnlinePending to state Online.

    02:10:13 - node1 - MSSQL1 - SQL Server CEIP service is started

    02:10:13 - node1 - MSSQL1 - SQL Server agent service started

    02:10:14 - node1 - MSSQL1 - The Cluster service successfully brought the clustered role 'SQL Server (MSSQL1)' online.

    02:11:35 - node1 - - Windows successfully installed the following update: Security Update for SQL Server 2019 RTM CU (KB5046860)

    02:11:35 - node1 - - Windows has started installing the following update: Windows Malicious Software Removal Tool x64 - v5.130 (KB890830)

    02:11:53 - node1 - - Windows successfully installed the following update: Windows Malicious Software Removal Tool x64 - v5.130 (KB890830)

    02:11:53 - node1 - - Windows has started installing the following update: 2024-11 Cumulative Update for Windows Server 2019 for x64-based Systems (KB5046615)

    02:18:44 - node1 - - Windows has started installing the following update: 2024-11 Cumulative Update for .NET Framework 3.5, 4.7.2 and 4.8 for Windows Server 2019 for x64 (KB5046540)

    02:19:04 - node2 - - Windows update stops running

    02:19:14 - node1 - MSSQL1 - The Cluster service is attempting to bring the clustered role 'SQL Server (MSSQL1)' offline.

    02:19:21 - node2 - MSSQL1 - Clustered role 'SQL Server (MSSQL1)' is moving from cluster node 'node1' to cluster node 'node2'.

    02:19:22 - node1 - - Node enters maintenance mode

    02:19:22 - node1 - - Node restart starts
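The ordering in this log can be checked mechanically. The following is a rough sketch, assuming the condensed "HH:MM:SS - node - role - message" layout used above (with an empty role for node-level events). The sample lines are copied from the log, while the regular expression and helper names are my own and purely illustrative.

```python
import re
from datetime import datetime

# A few lines copied from the condensed log above.
LOG = """\
02:02:44 - node1 - MSSQL1 - CAU offlining SQL Server
02:11:35 - node1 - - Windows successfully installed the following update: Security Update for SQL Server 2019 RTM CU (KB5046860)
02:19:21 - node2 - MSSQL1 - Clustered role 'SQL Server (MSSQL1)' is moving from cluster node 'node1' to cluster node 'node2'.
02:19:22 - node1 - - Node enters maintenance mode
"""

# time - node - role (may be empty) - message
LINE = re.compile(r"^(\d\d:\d\d:\d\d) - (\S+) - (\S*) ?- (.*)$")


def parse(log):
    """Turn log text into (time, node, role, message) tuples."""
    events = []
    for raw in log.splitlines():
        match = LINE.match(raw.strip())
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S")
            events.append((when, match.group(2), match.group(3), match.group(4)))
    return events


def first_time(events, needle):
    """Time of the first event whose message contains needle."""
    return next(when for when, _node, _role, msg in events if needle in msg)


events = parse(LOG)
installed = first_time(events, "successfully installed")
moved = first_time(events, "is moving from cluster node")
gap_seconds = int((moved - installed).total_seconds())

print("update installed before role moved:", installed < moved)
print("seconds between install and role move:", gap_seconds)
```

On these lines the SQL Server update finishes installing several minutes before the role is moved to node2, which matches what the log shows by eye.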

  3. Anonymous
    2024-11-20T01:43:39+00:00

    Hi,

    I checked the logs again and yes, the updates are installed before the roles are moved. You may need to contact Microsoft directly to confirm whether this behavior is by design in Windows Server 2019 and 2022.

    11/19/2024 5:33:48 AM - Scan for updates succeeded. Found 2 updates
    11/19/2024 5:34:21 AM - Download for updates succeeded. Downloaded 2 updates
    11/19/2024 5:38:39 AM - Install for updates succeeded. Installed 2 updates
    11/19/2024 5:38:43 AM - Clustered role 'Cluster Group' is moving from cluster node 'NODE1' to cluster node 'NODE0'. Move reason string: 'node drain'
    11/19/2024 5:38:47 AM - Clustered role 'ClusFS0' is moving from cluster node 'NODE1' to cluster node 'NODE0'. Move reason string: 'node drain'
    11/19/2024 5:38:50 AM - Clustered role 'SOFS1' is moving from cluster node 'NODE1' to cluster node 'NODE0'. Move reason string: 'node drain'
    11/19/2024 5:39:01 AM - Node NODE1 entered node maintenance mode.
    11/19/2024 5:39:01 AM - Rebooting node NODE1.
    11/19/2024 5:44:00 AM - Node NODE1 has rebooted successfully.
    11/19/2024 5:44:11 AM - Node NODE1 exited node maintenance mode.
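For anyone who wants to compare their own run reports, here is a small sketch that works on (timestamp, message) pairs like the entries above. The pairs are copied from this report. The helper name `when` and the idea of matching on message substrings are assumptions for illustration, not part of any CAU tooling.

```python
from datetime import datetime

# (timestamp, message) pairs copied from the run report above.
REPORT = [
    ("11/19/2024 5:38:39 AM", "Install for updates succeeded. Installed 2 updates"),
    ("11/19/2024 5:38:43 AM", "Clustered role 'Cluster Group' is moving from cluster node "
                              "'NODE1' to cluster node 'NODE0'. Move reason string: 'node drain'"),
    ("11/19/2024 5:39:01 AM", "Node NODE1 entered node maintenance mode."),
    ("11/19/2024 5:44:11 AM", "Node NODE1 exited node maintenance mode."),
]


def when(needle):
    """Timestamp of the first report entry whose message contains needle."""
    stamp = next(ts for ts, msg in REPORT if needle in msg)
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")


install_done = when("Install for updates succeeded")
first_drain = when("node drain")
maintenance = when("exited node maintenance mode") - when("entered node maintenance mode")

print("install finished before first drain:", install_done < first_drain)
print("seconds in maintenance mode:", int(maintenance.total_seconds()))
```

The same two checks (install versus drain order, and time spent in maintenance mode) can be run against any report once its entries are in this shape.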

  4. Anonymous
    2024-11-20T08:33:51+00:00

    Thanks for the confirmation. According to the CAU documentation, this isn't the intended behaviour: https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-aware-updating

    I will get a case opened with Microsoft.

    Thanks again.

  5. Anonymous
    2024-12-16T10:14:25+00:00

    Well, I opened a case and it's been going slowly. So far they are saying that, because they have been able to reproduce my problem with a clean install of Windows, it's the default behaviour and therefore the intended behaviour!! Why do we get monthly updates, then, if default behaviour means it's working as intended?
    So they are going to ask someone to update the public documentation. To me, if it is intended behaviour, it isn't well designed: to avoid problems for the services running on the server, you would need to modify the default behaviour with pre-update and post-update scripts, which wasn't necessary before this change.

    I have asked them whether they have internal documentation stating that this is the intended behaviour. To me this is a bug; it's not a logical process for updating a node.
