
SQL Server failover issue from active node to passive node

Anonymous
2024-04-15T06:42:58+00:00

Hi There,

I am facing an issue with SQL Server cluster failover from the active node to the passive node, running Windows Server 2019 with SQL Server 2019. When failing the resources over from the active node to the passive node, SQL Server Agent does not come online on the passive node. Please find the relevant entries from cluster.log below. Can someone please help me resolve the issue?

Line 2661793: 00000c44.00003630::2024/03/29-01:31:17.448 INFO [RCM-plcmt] Group SQL Server (SMEAZT2WDB01) allowed to move to node 2 by filter PreferredOwnerWaitFilter

Line 2661852: 00000c44.000036e4::2024/03/29-01:31:18.488 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661855: 00000c44.000036e4::2024/03/29-01:31:18.489 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661887: 00000c44.000026d0::2024/03/29-01:31:19.462 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661890: 00000c44.000026d0::2024/03/29-01:31:19.462 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661916: 00000c44.00003620::2024/03/29-01:31:19.570 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661919: 00000c44.00003620::2024/03/29-01:31:19.570 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661944: 00002888.00004964::2024/03/29-01:31:19.571 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOnline' to 'ClusterResourceOfflinePending' 

Line 2661953: 00002888.0000333c::2024/03/29-01:31:19.714 ERR   [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [HY008] [Microsoft][SQL Server Native Client 11.0]Operation canceled (0) 

Line 2661954: 00002888.0000333c::2024/03/29-01:31:19.714 ERR   [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [01000] [Microsoft][SQL Server Native Client 11.0][SQL Server]  (0) 

Line 2661962: 0000187c.00000824::2024/03/29-01:31:23.159 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2661966: 00002888.00001d9c::2024/03/29-01:31:23.588 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOfflinePending' to 'ClusterResourceOffline' 

Line 2661972: 00000c44.00004f3c::2024/03/29-01:31:23.588 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2661975: 00000c44.00004f3c::2024/03/29-01:31:23.588 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2662080: 00000c44.000026d0::2024/03/29-01:31:23.616 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2662083: 00000c44.000026d0::2024/03/29-01:31:23.616 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2662232: 00002528.00001488::2024/03/29-01:31:23.812 WARN  [RES] IP Address <SQL IP Address 1 (SMEAZAESQLCLU01)>: ListenerThread : Failed to accept incoming connection with error 10004. 

Line 2662238: 00000c44.0000271c::2024/03/29-01:31:23.813 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2662241: 00000c44.0000271c::2024/03/29-01:31:23.813 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2662264: 00000c44.00002560::2024/03/29-01:31:23.814 ERR   [RCM] s_RcmRpcGetResourceState: (5908)' because of ''SQL IP Address 1 (SMEAZAESQLCLU01)' is owned by node 2, not 1.'

Line 2662265: 00000c44.00003620::2024/03/29-01:31:23.814 WARN  [RCM] rcm::ChaseTheOwnerLoop::NoLockIsCallComplete: forwarded, no retry on error 5908 

Line 2662303: 0000187c.00002a1c::2024/03/29-01:31:33.148 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2662312: 0000187c.00002a1c::2024/03/29-01:31:34.553 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662315: 0000187c.00004014::2024/03/29-01:31:34.639 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662363: 0000187c.00002818::2024/03/29-01:31:43.152 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2662371: 0000187c.00002818::2024/03/29-01:31:44.132 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662376: 0000187c.00002818::2024/03/29-01:31:44.204 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662388: 0000187c.00002a1c::2024/03/29-01:31:52.452 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662389: 0000187c.00002a1c::2024/03/29-01:31:52.516 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662515: 0000187c.00000824::2024/03/29-01:31:53.144 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2662518: 00000c44.00004f3c::2024/03/29-01:31:56.335 DBG   [NETFTAPI] received NsiParameterNotification for 10.163.101.12 (IpDadStatePreferred) 

Line 2662718: 00002888.00001ee0::2024/03/29-01:32:02.607 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOffline' to 'ClusterResourceOnlinePending' 

Line 2662873: 0000187c.00000824::2024/03/29-01:32:03.147 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2662903: 00002888.00000618::2024/03/29-01:32:07.788 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOnlinePending' to 'ClusterResourceOnline' 

Line 2662986: 0000187c.00000824::2024/03/29-01:32:11.972 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662987: 0000187c.00002818::2024/03/29-01:32:12.018 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2662996: 0000187c.00004014::2024/03/29-01:32:13.152 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663025: 00000c44.00002560::2024/03/29-01:32:20.733 INFO  [RCM-plcmt] Group SQL Server (SMEAZT2WDB01) allowed to move to node 2 by filter PreferredOwnerWaitFilter 

Line 2663084: 00000c44.000026d0::2024/03/29-01:32:21.748 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663087: 00000c44.000026d0::2024/03/29-01:32:21.748 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663117: 00000c44.00004f3c::2024/03/29-01:32:21.751 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663120: 00000c44.00004f3c::2024/03/29-01:32:21.751 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663146: 00000c44.00004f3c::2024/03/29-01:32:22.566 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663149: 00000c44.00004f3c::2024/03/29-01:32:22.566 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663174: 00002888.0000264c::2024/03/29-01:32:22.567 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOnline' to 'ClusterResourceOfflinePending' 

Line 2663183: 00002888.00000618::2024/03/29-01:32:22.797 ERR   [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [HY008] [Microsoft][SQL Server Native Client 11.0]Operation canceled (0) 

Line 2663184: 00002888.00000618::2024/03/29-01:32:22.797 ERR   [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [01000] [Microsoft][SQL Server Native Client 11.0][SQL Server]  (0) 

Line 2663192: 0000187c.00002818::2024/03/29-01:32:23.142 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663196: 00002888.00004e18::2024/03/29-01:32:25.574 INFO  [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOfflinePending' to 'ClusterResourceOffline' 

Line 2663202: 00000c44.00003620::2024/03/29-01:32:25.574 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663205: 00000c44.00003620::2024/03/29-01:32:25.575 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663310: 00000c44.000014b4::2024/03/29-01:32:25.589 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663313: 00000c44.000014b4::2024/03/29-01:32:25.589 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663462: 00002528.00004a04::2024/03/29-01:32:25.754 WARN  [RES] IP Address <SQL IP Address 1 (SMEAZAESQLCLU01)>: ListenerThread : Failed to accept incoming connection with error 10004. 

Line 2663468: 00000c44.0000271c::2024/03/29-01:32:25.754 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 Process state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663471: 00000c44.0000271c::2024/03/29-01:32:25.754 INFO  [RCM] rcm::RcmResource::ProcessStateChange: 17993076 finished processing state change for ClusterResourceOfflineSavingCheckpoints 

Line 2663494: 00000c44.000026d0::2024/03/29-01:32:25.756 ERR   [RCM] s_RcmRpcGetResourceState: (5908)' because of ''SQL IP Address 1 (SMEAZAESQLCLU01)' is owned by node 2, not 1.'

Line 2663519: 0000187c.00002818::2024/03/29-01:32:33.141 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663563: 0000187c.00004014::2024/03/29-01:32:43.153 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663572: 0000187c.00002818::2024/03/29-01:32:53.144 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663584: 0000187c.00004014::2024/03/29-01:32:53.946 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2663585: 0000187c.00004014::2024/03/29-01:32:53.994 WARN  [RHS] Error 2 from resource type control for restype Storage Replica. 

Line 2663589: 0000187c.00004014::2024/03/29-01:33:03.143 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663594: 0000187c.00002818::2024/03/29-01:33:13.147 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663617: 0000187c.00004014::2024/03/29-01:33:23.148 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663622: 0000187c.00004014::2024/03/29-01:33:33.149 WARN  [RHS] Error 1722 from resource type control for restype Virtual Machine. 

Line 2663626: 00000c44.00003620::2024/03/29-01:33:34.168 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'

Line 2663627: 00000c44.00003620::2024/03/29-01:33:34.206 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'

Line 2663628: 00000c44.00003620::2024/03/29-01:33:34.243 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'

Line 2663629: 00000c44.00003620::2024/03/29-01:33:34.281 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'

Line 2663630: 00000c44.00003620::2024/03/29-01:33:34.323 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'

Line 2663631: 00000c44.00003620::2024/03/29-01:33:34.373 ERR   cxl::CertStore::IsKeyValid: (-2146893802)' because of 'NCryptOpenKey(certProv, certKey.Reference(), keyProvInfo->pwszContainerName, AT_KEYEXCHANGE, (machineKey ? NCRYPT_MACHINE_KEY_FLAG : 0) | NCRYPT_SILENT_FLAG)'
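
For anyone reading the excerpt above, it may help to translate the numeric status codes into their symbolic names. The Python sketch below is only a reading aid: the `KNOWN_CODES` table and the `describe` helper are hypothetical names introduced here, and the table covers just four codes that appear in the excerpt, using their names from the public Windows SDK headers. Decoding the codes does not by itself explain why SQL Server Agent failed to come online.

```python
# Small lookup of Windows status codes seen in the excerpt.
# Names come from the Windows SDK headers; the table is deliberately tiny
# and is an illustration, not a complete error database.
KNOWN_CODES = {
    2: "ERROR_FILE_NOT_FOUND",
    1722: "RPC_S_SERVER_UNAVAILABLE",
    10004: "WSAEINTR",
    0x80090016: "NTE_BAD_KEYSET",
}


def describe(code: int) -> str:
    """Return the code as logged, its unsigned 32-bit hex form, and its name."""
    # HRESULT-style values are printed in the log as signed 32-bit integers,
    # so mask them back to the unsigned form used in the headers.
    unsigned = code & 0xFFFFFFFF
    name = KNOWN_CODES.get(unsigned, "not in this table")
    return f"{code} = 0x{unsigned:08X} ({name})"


for logged in (2, 1722, 10004, -2146893802):
    print(describe(logged))
```

The last call shows that the value logged by `cxl::CertStore::IsKeyValid`, -2146893802, is 0x80090016 (`NTE_BAD_KEYSET`, "keyset does not exist"), meaning the `NCryptOpenKey` call could not open the private key container for a certificate on that node.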


4 answers

  1. Anonymous
    2024-08-10T10:01:01+00:00

    Log analysis did not turn up anything conclusive. However, we removed the faulty node from the cluster and re-added it, which fixed the issue. Thank you for your support.

  2. Anonymous
    2024-04-22T02:03:36+00:00

    Hi Admin,

    Hope you're doing well.

    Have you seen any error messages in the event logs?

    Best Regards

  3. Anonymous
    2024-04-16T13:13:06+00:00

    1. Ensure that the SQL Server service account on the passive node has the correct permissions and can access the required databases.

    -- Checked, and it seems to be fine.

    2. Check the dependencies of the SQL Server resources and ensure that they are correctly migrated to the target node during the failover process.

    -- Checked. All the SQL Server resources are migrated to the passive node during failover, apart from SQL Server Agent.

    3. Check the cluster's network configuration to ensure that the IP address resource is correctly bound to the expected node.

    -- Set up correctly.

    Do you need the cluster log with the date and time?

  4. Anonymous
    2024-04-16T03:49:37+00:00

    Hi Admin,

    Hope you're doing well.

    The following key information is seen in the log:

    Line 2661953: ERR [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [HY008] [Microsoft][SQL Server Native Client 11.0]Operation canceled (0)

    Line 2661954: ERR [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [01000] [Microsoft][SQL Server Native Client 11.0][SQL Server] (0)

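    These ODBC errors are logged by the SQL Server resource DLL (sqsrvres). In the log they appear immediately after the resource moved from 'ClusterResourceOnline' to 'ClusterResourceOfflinePending', so they most likely reflect the resource DLL's connection being cancelled while SQL Server was taken offline on the source node, rather than a failure during startup.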

    Line 2662264: ERR [RCM] s_RcmRpcGetResourceState: (5908)' because of ''SQL IP Address 1 (SMEAZAESQLCLU01)' is owned by node 2, not 1.'

    This error indicates that the SQL IP address resource is owned by node 2 instead of node 1, which may prevent the resource from being properly migrated to the expected node during failover.

    Line 2663174: INFO [RES] SQL Server <SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed from 'ClusterResourceOnline' to 'ClusterResourceOfflinePending'

    This indicates that the status of the SQL Server resource changed from "ClusterResourceOnline" to "ClusterResourceOfflinePending", which may have occurred during a failover attempt.

    Based on comprehensive analysis, you may need to check the following aspects to solve the problem:

    1. Ensure that the SQL Server service account on the passive node has the correct permissions and can access the required databases.
    2. Check the dependencies of the SQL Server resources and ensure that they are correctly migrated to the target node during the failover process.
    3. Check the cluster's network configuration to ensure that the IP address resource is correctly bound to the expected node.

    Best Regards

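
When reading a cluster.log excerpt like the one in the question, it can help to line up the state changes of a single resource with the errors logged for it, so the ordering is visible at a glance. The Python sketch below is one way to do that, assuming the `pid.tid::timestamp LEVEL message` layout seen in the excerpt. The `timeline` function, the regular expressions, and the `SAMPLE` list (four lines copied from the question) are illustrative names introduced here, not part of any Microsoft tool.

```python
import re
from datetime import datetime

# Matches "pid.tid::timestamp LEVEL message". A leading "Line NNN: " prefix
# is tolerated because the pattern is applied with search(), not match().
ENTRY = re.compile(
    r"[0-9a-f]{8}\.[0-9a-f]{8}::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)"
)
TRANSITION = re.compile(r"state is changed from '(\w+)' to '(\w+)'")


def timeline(lines, resource="SQL Server"):
    """Return sorted (timestamp, level, summary) tuples for one resource type.

    Keeps state transitions and ERR entries whose message contains
    "[RES] <resource> <"; all other lines are skipped.
    """
    marker = f"[RES] {resource} <"
    events = []
    for line in lines:
        m = ENTRY.search(line)
        if not m or marker not in m["msg"]:
            continue
        ts = datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f")
        change = TRANSITION.search(m["msg"])
        if change:
            events.append((ts, m["level"], f"{change[1]} -> {change[2]}"))
        elif m["level"] == "ERR":
            # Keep only the text after the "<resource name>: " prefix.
            events.append((ts, "ERR", m["msg"].partition(">: ")[2].strip()))
    return sorted(events)


SAMPLE = [
    "Line 2661944: 00002888.00004964::2024/03/29-01:31:19.571 INFO  [RES] SQL Server "
    "<SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed "
    "from 'ClusterResourceOnline' to 'ClusterResourceOfflinePending'",
    "Line 2661953: 00002888.0000333c::2024/03/29-01:31:19.714 ERR   [RES] SQL Server "
    "<SQL Server (SMEAZT2WDB01)>: [sqsrvres] ODBC Error: [HY008] "
    "[Microsoft][SQL Server Native Client 11.0]Operation canceled (0)",
    "Line 2661962: 0000187c.00000824::2024/03/29-01:31:23.159 WARN  [RHS] Error 1722 "
    "from resource type control for restype Virtual Machine.",
    "Line 2661966: 00002888.00001d9c::2024/03/29-01:31:23.588 INFO  [RES] SQL Server "
    "<SQL Server (SMEAZT2WDB01)>: [sqsrvres] SQL Server resource state is changed "
    "from 'ClusterResourceOfflinePending' to 'ClusterResourceOffline'",
]

for ts, level, summary in timeline(SAMPLE):
    print(ts.time(), level, summary)
```

On these sample lines, the HY008 entry falls between the move to 'ClusterResourceOfflinePending' and the move to 'ClusterResourceOffline'. The same function could be run over the full log with a different `resource` value, for example a SQL Server Agent resource, assuming its entries follow the same `[RES] <type> <name>` pattern.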