First, please note that root cause analysis requires dedicated logs and a reproducible issue, and that root cause analysis on the forum is limited.
Based on my experience, I suggest checking whether the cluster nodes are up to date. If they are not, please install the latest Windows updates on the cluster nodes.
In addition, check whether cluster error 1146 appears in the cluster log. If it does, it's recommended to enable RHS dumps so that, if the issue recurs, we can collect the related logs for analysis.
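For the 1146 check, a rough first pass over a text copy of the cluster log (for example one generated with the Get-ClusterLog PowerShell cmdlet) could look like the Python sketch below. It only looks for the number 1146 as a whole word, so every hit still needs to be read by hand; the script name `find_1146.py` and the UTF-8 encoding are assumptions, not part of the original steps.

```python
import re
import sys
from pathlib import Path

# Word boundaries keep numbers such as "11460" or "21146" from matching.
ERROR_1146 = re.compile(r"\b1146\b")


def find_1146_lines(lines):
    """Return (line_number, text) pairs for lines that mention 1146."""
    return [
        (number, line.rstrip("\r\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_1146.search(line)
    ]


def scan_cluster_log(path):
    # Assumption: the log is readable as UTF-8 text; undecodable bytes are replaced.
    with Path(path).open(encoding="utf-8", errors="replace") as log:
        return find_1146_lines(log)


if __name__ == "__main__":
    # Usage (hypothetical script name): python find_1146.py <path-to-cluster.log>
    for number, text in scan_cluster_log(sys.argv[1]):
        print(f"{number}: {text}")
```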
On the problematic nodes:
a. Confirm that the cluster node is configured to generate a kernel memory dump and that the disk holding the paging file and the dump location has enough free space.
To set up a kernel memory dump, use the following steps:
- Enable the system to generate a kernel memory dump by changing the following registry key:
Value Name: CrashDumpEnabled
Data Type: REG_DWORD
- Explicitly specify the paging file on the system drive:
· Navigate to the registry key HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management.
· Double-click 'PagingFiles' and change the C: drive entry to "C:\pagefile.sys 8300 8300", which sets both the initial and the maximum size of the paging file (in MB).
- Change the dump file location through the following registry value:
· Navigate to the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl.
· Double-click 'DumpFile' and change the dump file location to another disk drive with more free space.
- Reboot the server for the settings to take effect.
b. Add a DWORD registry value named "DebugBreakOnDeadlock" with data 3 under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters.
c. Restart the cluster service.
d. Download RHSMon.zip from the attachment, extract it, and run run.cmd.
e. Upload the System event log and the cluster log from both nodes.
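The registry changes in steps a and b can be prepared as `reg add` command lines, which makes them easy to review before running them from an elevated prompt. The Python sketch below only builds and prints the commands; it does not touch the registry. Two details are assumptions not stated in the steps: that CrashDumpEnabled lives under the CrashControl key with data 2 selecting a kernel memory dump (the standard Windows setting), and the dump path `D:\MEMORY.DMP`, which is a hypothetical example of "another disk drive". The helper names are also hypothetical.

```python
import shutil

# Registry keys used in steps a and b. Placing CrashDumpEnabled under
# CRASH_CONTROL is an assumption: the original steps omit that key path.
CRASH_CONTROL = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"
MEMORY_MGMT = r"HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
CLUSSVC_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters"


def dump_drive_has_room(drive_root: str, required_bytes: int) -> bool:
    """Step a pre-check: are at least required_bytes free on the dump drive?"""
    return shutil.disk_usage(drive_root).free >= required_bytes


def reg_add(key: str, name: str, reg_type: str, data: str) -> str:
    """Build a 'reg add' command that creates or overwrites one value (/f: no prompt)."""
    return f'reg add "{key}" /v {name} /t {reg_type} /d "{data}" /f'


def rhs_dump_setup_commands(dump_file: str = r"D:\MEMORY.DMP",
                            pagefile_mb: int = 8300) -> list[str]:
    """Commands for steps a and b; dump_file is a hypothetical path on another disk."""
    return [
        # Step a: data 2 selects a kernel memory dump (assumed; elided in the steps).
        reg_add(CRASH_CONTROL, "CrashDumpEnabled", "REG_DWORD", "2"),
        # Step a: paging file on C: with equal initial and maximum size, in MB.
        reg_add(MEMORY_MGMT, "PagingFiles", "REG_MULTI_SZ",
                f"C:\\pagefile.sys {pagefile_mb} {pagefile_mb}"),
        # Step a: write the dump to a disk with more free space.
        reg_add(CRASH_CONTROL, "DumpFile", "REG_EXPAND_SZ", dump_file),
        # Step b: the DebugBreakOnDeadlock DWORD with data 3.
        reg_add(CLUSSVC_PARAMS, "DebugBreakOnDeadlock", "REG_DWORD", "3"),
    ]


if __name__ == "__main__":
    # Print only. Run the first three commands and reboot (step a), then run
    # the last one and restart the cluster service (steps b and c).
    for cmd in rhs_dump_setup_commands():
        print(cmd)
```

Generating text rather than writing the registry directly keeps the sketch safe to run anywhere and leaves the final decision with the administrator; `dump_drive_has_room` can be called with the dump drive root (for example `"D:\\"`) and whatever size margin you consider sufficient.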
When the RHS deadlock occurs, rhsmon.exe detects it and launches dumpncrash.cmd, which creates user-mode dumps of rhs.exe and clussvc.exe and then crashes the server to capture the memory dump. Once the issue has been reproduced and the dump generated, please remove the DebugBreakOnDeadlock value from this node.
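The cleanup of the DebugBreakOnDeadlock value from step b can be prepared the same way. This is a minimal Python sketch that only builds the command lines (the helper name is hypothetical): `reg delete` removes the single value while leaving the Parameters key in place, and the follow-up `reg query` is a check that should report it cannot find the value once the deletion has succeeded.

```python
# Key from step b; only the DebugBreakOnDeadlock value is removed, not the key.
CLUSSVC_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters"


def debug_break_cleanup_commands() -> list[str]:
    """Commands to undo step b once the dumps have been collected."""
    return [
        # /f deletes the value without asking for confirmation.
        f'reg delete "{CLUSSVC_PARAMS}" /v DebugBreakOnDeadlock /f',
        # Verification: should fail with an "unable to find" error afterwards.
        f'reg query "{CLUSSVC_PARAMS}" /v DebugBreakOnDeadlock',
    ]


if __name__ == "__main__":
    for cmd in debug_break_cleanup_commands():
        print(cmd)
```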
To analyze the logs and obtain the RHSMon tool, it's recommended to open a case with Microsoft:
Thanks for your time!