Rolling Upgrade 2 Node Cluster Server 2012R2 to 2016

Paul Loveitt 0 Reputation points
2024-06-09T19:20:47.68+00:00

I am working on a rolling upgrade of a two-node failover cluster from 2012R2 to 2016. I have followed the guide provided by Microsoft, https://learn.microsoft.com/en-us/windows-server/failover-clustering/cluster-operating-system-rolling-upgrade but cannot for the life of me get the 2016 node to join back to the cluster (in mixed mode). The mixed-mode state is only temporary: we will upgrade the other node as soon as this one is back in the cluster.

The VMs were migrated, the roles drained, the node removed from the cluster, and 2016 installed. I set up the virtual switches and network settings, installed the Failover Clustering feature, and ran validation, but when I try to connect to the cluster, either from Failover Cluster Manager (FCM) or PowerShell, I get an error: 'The cluster to which you are attempting to connect is not a version of the cluster supported by this version of Failover Cluster Manager'

I have run a lot of diagnostics and tests, but nothing has turned up the cause. Both servers are fully patched.

A couple of the errors I have managed to grab along the way:

From the working node (Node1) in FCM, while the upgraded node (Node2) was attempting to join, I grabbed this error as the node appeared briefly in FCM:

Event 1653: SERVICE_NO_CONNECTIVITY

Cluster node '%1' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls.

The firewalls on both nodes are turned off while the upgrade, testing, and diagnosis take place.

The error I think is causing the issue, from the cluster log on Node1:

00000e5c.00001014::2024/06/09-15:19:56.085 INFO [ACCEPT] 0.0.0.0:~3343~: Accepted inbound connection from remote endpoint 192.168.0.246:~49972~.

00000e5c.00005450::2024/06/09-15:19:56.085 INFO [SV] New real route: local (192.168.0.242:~3343~) to remote (192.168.0.246:~49972~).

00000e5c.00005450::2024/06/09-15:19:56.085 INFO [SV] Got a new incoming stream from 192.168.0.246:~49972~

00000e5c.00005450::2024/06/09-15:19:56.099 WARN mscs::ListenerWorker::operator (): HrError(0x8009030c)' because of '[SV] Authentication or Authorization Failed'
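The `HrError(0x8009030c)` value in the WARN line above is a standard 32-bit HRESULT, so it can be split into its severity, facility, and code fields. The sketch below is only an illustration: `decode_hresult` and the two lookup tables are hypothetical helpers written for this post, and the tables hold just the one value seen here. `0x8009030C` corresponds to `SEC_E_LOGON_DENIED` in the SSPI facility, which suggests the sponsor node rejected the credentials the joiner presented; it does not by itself say why.

```python
# Split a 32-bit HRESULT into its standard fields.
# The name tables are a hand-written subset (an assumption of this sketch),
# covering only the value that appears in the Node1 log.
KNOWN_HRESULTS = {
    0x8009030C: "SEC_E_LOGON_DENIED",  # SSPI: the logon attempt failed
}

FACILITY_NAMES = {
    9: "FACILITY_SSPI",
}


def decode_hresult(value: int) -> dict:
    """Return severity, facility, and code for a 32-bit HRESULT."""
    value &= 0xFFFFFFFF
    facility = (value >> 16) & 0x1FFF  # same mask as the HRESULT_FACILITY macro
    return {
        "failure": bool(value >> 31),  # top bit set means failure
        "facility": facility,
        "facility_name": FACILITY_NAMES.get(facility, "unknown"),
        "code": value & 0xFFFF,
        "name": KNOWN_HRESULTS.get(value, "unknown"),
    }


if __name__ == "__main__":
    print(decode_hresult(0x8009030C))
```

Any value not in the tables comes back as "unknown" rather than being guessed.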

The corresponding error from Node2:

0000252c.00002678::2024/06/09-12:48:48.279 INFO [NODE] Node 2: New join with n1: stage: 'Authenticate Initial Connection'

0000252c.00002678::2024/06/09-12:48:48.279 INFO [NODE] Node 2: New join with n1: stage: 'Authorize Initial Connection'

0000252c.00002678::2024/06/09-12:48:48.279 INFO [SV] Authentication and authorization were successful

0000252c.00002678::2024/06/09-12:48:48.279 DBG [SM] Joiner: Initialized with SPN = PEN-Node1, RequiredCtxAttrib = 1, HandShakeTimeout = 40000

0000252c.00001648::2024/06/09-12:48:48.280 DBG [SM] Handling auth handshake posted by thread id 9848

0000252c.00001648::2024/06/09-12:48:48.280 DBG [SM] Joiner: Versions: 1-9

0000252c.00001648::2024/06/09-12:48:48.280 DBG [SM] Joiner: ISC returned status = 590610 output Blob size 1820, service principal name HOST/PEN-Node1, auth type MSG_AUTH_PACKAGE::KerberosAuth, attr: 67586

0000252c.00001648::2024/06/09-12:48:48.280 DBG [SM] Joiner: Sending SSPI blob of size 1820 to Sponsor

0000252c.00001648::2024/06/09-12:48:48.289 ERR mscs_security::BaseSecurityContext::DoAuthenticate_static: (30)' because of '[Schannel] Received wrong header info: 1340223444, 2777868933, 48000'

0000252c.00002678::2024/06/09-12:48:48.289 INFO [NODE] Node 2: New join with n1: stage: 'Establish Kernel-Mode Security Context' status HrError(0x0000001e) reason: '[SV] Security Handshake failed to obtain SecurityContext for NetFT driver'

0000252c.00002678::2024/06/09-12:48:48.289 DBG [CHANNEL 192.168.0.242:~3343~] Close().

0000252c.00002678::2024/06/09-12:48:48.290 WARN cxl::ConnectWorker::operator (): HrError(0x0000001e)' because of '[SV] Security Handshake failed to obtain SecurityContext for NetFT driver'
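When comparing excerpts like these from two nodes, it can help to parse each line into its process ID, thread ID, timestamp, and level, then keep only the WARN and ERR entries. The sketch below assumes every line follows the `pid.tid::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message` layout seen in the excerpts; `parse_line` and `problems` are hypothetical helper names, and the sample lines are copied from the Node2 log above.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts: hex pid, hex tid, timestamp, level, message.
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Parse one cluster log line; return None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m["pid"], 16),
        "tid": int(m["tid"], 16),
        "time": datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m["level"],
        "msg": m["msg"],
    }


def problems(lines):
    """Return only the parsed WARN and ERR entries, in their original order."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p and p["level"] in ("WARN", "ERR")]


SAMPLE = [
    "0000252c.00002678::2024/06/09-12:48:48.279 INFO [SV] Authentication and authorization were successful",
    "0000252c.00001648::2024/06/09-12:48:48.280 DBG [SM] Handling auth handshake posted by thread id 9848",
    "0000252c.00001648::2024/06/09-12:48:48.289 ERR mscs_security::BaseSecurityContext::DoAuthenticate_static: (30)' because of '[Schannel] Received wrong header info: 1340223444, 2777868933, 48000'",
    "0000252c.00002678::2024/06/09-12:48:48.290 WARN cxl::ConnectWorker::operator (): HrError(0x0000001e)' because of '[SV] Security Handshake failed to obtain SecurityContext for NetFT driver'",
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry["time"].isoformat(), entry["level"], entry["msg"])
```

Parsing the thread field as hexadecimal also ties lines together: thread `00002678` is decimal 9848, the "thread id 9848" named in the DBG line. For reference, the ISC status 590610 is `0x00090312` (`SEC_I_CONTINUE_NEEDED`, a normal mid-handshake status), and `(30)` / `0x0000001e` is Win32 error 30 (`ERROR_READ_FAULT`).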

NB: the timestamps may not match because I have a few versions of the log files, but the errors are the same in each.

I found a thread suggesting that TLS could be the issue, but I see no TLS entries in the registry on either node under:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\

I lowered the domain functional level to Server 2012 R2 (it was previously 2016, set from the 2016 DC).

I can see that the RSA key is created during the attempt to join the cluster, and its permissions match those on Node1.

I have got as far as using ProcMon to capture what is happening, but I am not sure where to start with it!

Windows Server 2016
Windows Server 2012