all cluster cmdlets throw error "The remote server has been paused or is in the process of being started."

Question

server 2019

new cluster; was getting a lot of event 1237; got my AD admin to allow the cluster dnn to dynamically update its ip address; the event errors went away

but then my four nodes dropped like hot potatoes hours later

any cluster cmdlet (like get-clusterNode or even stop-cluster or remove-cluster) now throws the error "The remote server has been paused or is in the process of being started." with "FullyQualifiedErrorId : ClusterSharingPaused"

get-wmiobject mscluster_resourcegroup -namespace "ROOT\MSCluster"

same error

netsh advfirewall firewall show rule name="windows management instrumentation (async-in)"
netsh advfirewall firewall show rule name="windows management instrumentation (wmi-out)"
netsh advfirewall firewall show rule name="windows management instrumentation (wmi-in)"
netsh advfirewall firewall show rule name="windows management instrumentation (dcom-in)"

all showed not enabled, so:

netsh advfirewall firewall set rule group="windows management instrumentation (wmi)" new enable=yes

but that didn't help, so:

net stop winmgmt /y
winmgmt /resetRepository
restart-computer

did this on all four nodes, still not able to even redo the cluster (remove-cluster fails)

when I could still run "get-clusterNetwork", it showed two 10gig networks set to cluster only and one 2gig nic team set to none (side note for those who know: should the team be set to "2", which is the unsupported clientOnly?); and the hidden Microsoft Failover Cluster Virtual Adapter is up:

get-netAdapter -includeHidden | where {$_.interfaceDescription -match 'failover'}

but then I noticed that one node had a different name for that virtual adapter, so tried:

get-netAdapter -includeHidden -name "Local Area Connection* 11" | rename-netAdapter -newName "Local Area Connection* 1"

but got an error saying the name already exists, even though it doesn't (this returns nothing):

get-netAdapter -includeHidden -name "Local Area Connection* 1"

unsure if that matters; any help or a pointer in the right direction would be appreciated (again, I suspect my trouble began when the AD DNS permission to update the ip was added; it solved my event 1237, but maybe caused inter-node communication problems, since no node receives heartbeats now, i.e. a lot of event 1650, oscillating between missed heartbeat, established UDP connection, and lost UDP connection)

Accepted Answer

just redo the cluster (that fixed it for me); say my nodes are called "one", "two", "three", and "four":

ran these from my quorum file share machine to do the uninstall/install in one fell swoop, where my four nodes have already had these two cmdlets run:
set-item WSMan:\localhost\client\trustedHosts -concatenate -value "myQuorumShareMachine"
enable-psremoting

invoke-command -computerName one,two,three,four {remove-windowsFeature failover-clustering -restart}
icm one,two,three,four {install-windowsFeature failover-clustering -includeManagementTools -restart}
icm one,two,three,four {clear-clusterNode}
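
before recreating the cluster it may be worth confirming the feature really came back on every node. a minimal sketch, assuming the same remoting setup as above; the function names Get-ClusterFeatureState and Get-NodeMissingFeature are hypothetical helpers I made up for illustration, not built-in cmdlets:

```powershell
# Hypothetical helper: given per-node feature records (objects with
# PSComputerName and InstallState properties), return the names of the
# nodes where the feature is not reported as installed.
function Get-NodeMissingFeature {
    param([object[]]$FeatureState)
    $FeatureState |
        Where-Object { "$($_.InstallState)" -ne 'Installed' } |
        ForEach-Object { $_.PSComputerName }
}

# Hypothetical wrapper: asks each node over PowerShell remoting for the
# state of the Failover-Clustering feature.
function Get-ClusterFeatureState {
    param([string[]]$ComputerName)
    Invoke-Command -ComputerName $ComputerName -ScriptBlock {
        Get-WindowsFeature -Name Failover-Clustering
    }
}
```

usage would be something like: Get-NodeMissingFeature -FeatureState (Get-ClusterFeatureState -ComputerName one,two,three,four); if that returns nothing, all four nodes report the feature as installed.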

then on any node, in dsa.msc, remove my dnn computer object (call it "myCluster"), add it back, and disable it (side note: have the AD DNS admin give it rights to dynamically update its ip address), and while in there, delete myCluster-CAU; note my intranet is the pair of non-dhcp 10gig connections (and my quorum share already allows the computer object "myCluster" to write)

new-cluster -name myCluster -node one,two,three,four -noStorage -ignoreNetwork 192.168.0.0/24,192.168.1.0/24 -managementPointNetworkType distributed
add-CAUclusterRole -daysOfWeek saturday -weeksOfMonth 1 -requireAllNodesOnline -maxFailedNodes 1 -enableFirewallRules -CAUpluginName Microsoft.WindowsUpdatePlugin -virtualComputerObjectName myCluster-CAU -groupName myCluster-CAU
set-clusterQuorum -fileShareWitness \\myQuorumShareMachine\witness$ -credential $(get-credential)
get-clusterNetwork | ft name,address,role
# set my two intranet 10gbps networks to cluster only, and my nic team 2gbps lacp to none (no cluster traffic, just client traffic; unsure about using the "unsupported" "2" for client only)
(get-clusterNetwork "Cluster Network 1").role = 1
(get-clusterNetwork "Cluster Network 2").role = 1
(get-clusterNetwork "Cluster Network 3").role = 0
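
the bare numbers are easy to mix up, so here is a small sketch that names them first. the mapping (0 = None, 1 = Cluster, 3 = ClusterAndClient) is the set of values the role property reports; the $plan table and the Set-PlannedClusterNetworkRole function are hypothetical names of mine, and the network names are the ones from my cluster above, so adjust to yours:

```powershell
# Role values by name; "2" (client only) is deliberately left out.
$roleByName = @{
    None             = 0   # no cluster traffic on this network
    Cluster          = 1   # cluster (internal) traffic only
    ClusterAndClient = 3   # cluster and client traffic
}

# Hypothetical plan: cluster network name -> desired role name.
$plan = [ordered]@{
    'Cluster Network 1' = 'Cluster'
    'Cluster Network 2' = 'Cluster'
    'Cluster Network 3' = 'None'
}

# Hypothetical helper: applies the plan; must run on a cluster node
# (or wherever the FailoverClusters module can reach the cluster).
function Set-PlannedClusterNetworkRole {
    param(
        [System.Collections.IDictionary]$Plan,
        [hashtable]$RoleByName
    )
    foreach ($name in $Plan.Keys) {
        $role = $RoleByName[$Plan[$name]]
        (Get-ClusterNetwork -Name $name).Role = $role
    }
}
```

calling Set-PlannedClusterNetworkRole -Plan $plan -RoleByName $roleByName should be equivalent to the three assignments above, just with the intent spelled out.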

now get back my data volumes

get-clusterAvailableDisk | add-clusterDisk
get-clusterResource | where {$_.ownerGroup -eq "Available Storage" -and $_.name -ne "Cluster Virtual Disk (ClusterPerformanceHistory)"} | add-clusterSharedVolume

and if we're going down the rabbit hole, note I also cleaned up some leftover artifacts

get-clusterResource
# the only cluster disk that has ownerGroup as "Cluster Group" is the old cluster performance history disk (and per get-virtualDisk, there will be two ClusterPerformanceHistory disks, but the old one will be opStatus detached and healthStatus unknown)
remove-clusterResource "Cluster Disk 20"
get-virtualDisk "ClusterPerformanceHistory" | where {$_.healthStatus -eq "unknown"} | remove-virtualDisk
# and per get resources above, cluster pool 1 is owned by SID/GUID
move-clusterResource "Cluster Pool 1" -group "Cluster Group"
remove-clusterGroup "51620a48-3f0c-4175-8ac5-7f3839e39a0a"
# again per get resource, dnn isn't right
(get-clusterResource "Cluster Name").name = "myCluster"
get-clusterSharedVolume
# they're all "Cluster Disk X", so match those up to old names:
get-clusterSharedVolume | ft name,sharedVolumeInfo
(get-clusterSharedVolume "Cluster Disk 1").name = "Cluster Virtual Disk (originalName)"
# and the shares to those are gone, so get them back (note I create a main folder in each csv, for easy change of permissions and so peeps don't see sys vol info and recycle bin)
new-SMBshare -name originalName -path c:\clusterStorage\originalName\subfolder -fullAccess "myDomain\my ou admins",builtin\administrators -changeAccess "myDomain\originalName users"

some of the virtualDisks were inService, so here's a nice function you can run to keep an eye on "get-storageJob" (the disks were still accessible, so users won't notice); just ctrl + "c" to exit the function:

function refreshVDSJST () { while($true) {get-virtualDisk | where {$_.healthStatus -eq "warning"} | ft; get-storageJob; icm nb-s2d4 `
{get-scheduledTask -taskName "Data Integrity Scan for Crash Recovery" | where state -eq running} | ft; sleep -s 420; clear-host;} }
refreshVDSJST

hope this helps someone!
