Hi all,
We are hitting a nightmare issue that occurs roughly once a month.
Suddenly one or more nodes of our Windows Server 2016 Hyper-V cluster stop responding completely, so we have to reboot them. Last time 4 out of 8 nodes went down: first two, and then another two.
All I can see are event ID 252 entries for the VMSwitches and event ID 5120 entries for lost CSVs.
The 252 events look crazy, or maybe I do not understand them, but the amount of memory they report is very high, for example the one below:
Memory allocated for packets in a vRss queue (on CPU 0) on switch 8893CCCF-B197-4A55-A3D6-7350D9D44731 (Friendly Name: LAN) due to low resource on the physical NIC has increased to 66049MB. Packets will be dropped once queue size reaches 512MB.
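To put the two numbers in that message side by side, here is a minimal Python sketch of my own. The regular expression and the helper name `parse_event_252` are assumptions based only on this single message, not on any documented event format, so other 252 events may need a different pattern.

```python
import re

# Event 252 text, copied verbatim from the log entry quoted above.
MESSAGE = (
    "Memory allocated for packets in a vRss queue (on CPU 0) on switch "
    "8893CCCF-B197-4A55-A3D6-7350D9D44731 (Friendly Name: LAN) due to low "
    "resource on the physical NIC has increased to 66049MB. Packets will be "
    "dropped once queue size reaches 512MB."
)

# Assumption: pattern inferred from the one message above.
PATTERN = re.compile(
    r"on CPU (?P<cpu>\d+)\).*?Friendly Name: (?P<switch>[^)]+)\)"
    r".*?increased to (?P<allocated_mb>\d+)MB"
    r".*?reaches (?P<limit_mb>\d+)MB"
)


def parse_event_252(message):
    """Return CPU, switch name and both sizes in MB, or None if no match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "switch": match["switch"],
        "allocated_mb": int(match["allocated_mb"]),
        "limit_mb": int(match["limit_mb"]),
    }


info = parse_event_252(MESSAGE)
# How many times larger the reported allocation is than the stated drop limit.
ratio = info["allocated_mb"] / info["limit_mb"]
print(f"switch={info['switch']} cpu={info['cpu']} "
      f"allocated={info['allocated_mb']}MB limit={info['limit_mb']}MB "
      f"ratio={ratio:.1f}x")
```

The reported allocation is far above the size at which the message says packets will be dropped, which is the part I cannot make sense of.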
The cluster has 8 nodes connected to storage on a Fibre Channel SAN. We opened cases with Microsoft, the storage vendor, and the switch vendor, but never found anything.
I cannot figure out whether the issue is the CSVs being lost, with the CSV traffic then going over Ethernet on the failed nodes and collapsing the network cards, or whether the network cards themselves are the bottleneck.
The network setup on each node is as follows:
All NICs are 1 Gbps.
A team of 3 NICs for management, vmswitch and cluster traffic.
A team of 2 NICs for the backup network and live migration (DPM uses this one).
1 NIC for the heartbeat (HB) network, which is second in priority for live migration.
I do NOT have VMQ enabled, but I do have RSS enabled on the hosts.
We also noticed that pings are lost more often than they should be, both between nodes and between VMs on different nodes.
Output of Get-NetAdapterRss:
Name : vEthernet (BCK)
InterfaceDescription : Hyper-V Virtual Ethernet Adapter #2
Enabled : True
NumberOfReceiveQueues :
Profile :
BaseProcessor: [Group:Number] : :
MaxProcessor: [Group:Number] : :
MaxProcessors :
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : vEthernet (LAN)
InterfaceDescription : Hyper-V Virtual Ethernet Adapter
Enabled : True
NumberOfReceiveQueues :
Profile :
BaseProcessor: [Group:Number] : :
MaxProcessor: [Group:Number] : :
MaxProcessors :
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : LAN 2
InterfaceDescription : Intel(R) Gigabit 4P I350-t rNDC #4
Enabled : True
NumberOfReceiveQueues : 2
Profile : Closest
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 8
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : HB
InterfaceDescription : Intel(R) Gigabit 4P I350-t rNDC #2
Enabled : True
NumberOfReceiveQueues : 2
Profile : Closest
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 8
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : BCK1
InterfaceDescription : Intel(R) Gigabit 4P I350-t rNDC
Enabled : True
NumberOfReceiveQueues : 2
Profile : Closest
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 8
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : LAN 1
InterfaceDescription : Intel(R) Gigabit 4P I350-t rNDC #3
Enabled : True
NumberOfReceiveQueues : 2
Profile : Closest
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 8
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : LAN
InterfaceDescription : Microsoft Network Adapter Multiplexor
Enabled : True
NumberOfReceiveQueues :
Profile :
BaseProcessor: [Group:Number] : :
MaxProcessor: [Group:Number] : :
MaxProcessors :
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : SLOT 3 Puerto 4
InterfaceDescription : Broadcom NetXtreme Gigabit Ethernet #4
Enabled : True
NumberOfReceiveQueues : 1
Profile :
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 16
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : SLOT 3 Puerto 3
InterfaceDescription : Broadcom NetXtreme Gigabit Ethernet
Enabled : True
NumberOfReceiveQueues : 1
Profile :
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 16
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : BCK2
InterfaceDescription : Broadcom NetXtreme Gigabit Ethernet #3
Enabled : True
NumberOfReceiveQueues : 1
Profile :
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 16
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
Name : LAN3
InterfaceDescription : Broadcom NetXtreme Gigabit Ethernet #2
Enabled : True
NumberOfReceiveQueues : 1
Profile :
BaseProcessor: [Group:Number] : :0
MaxProcessor: [Group:Number] : :
MaxProcessors : 16
RssProcessorArray: [Group:Number/NUMA Distance] :
IndirectionTable: [Group:Number] :
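Since the listing above is long, here is a small Python sketch that condenses pasted Get-NetAdapterRss text into one receive-queue count per adapter. It is only an illustration: the function names `parse_rss_listing` and `queue_summary` are made up, the sample is an abbreviated copy of three records from the output above, and the parsing assumes every field is printed as "Key : value" on one line.

```python
# Abbreviated copy of three records from the Get-NetAdapterRss output above.
SAMPLE = """\
Name : vEthernet (LAN)
InterfaceDescription : Hyper-V Virtual Ethernet Adapter
Enabled : True
NumberOfReceiveQueues :
Name : LAN 2
InterfaceDescription : Intel(R) Gigabit 4P I350-t rNDC #4
Enabled : True
NumberOfReceiveQueues : 2
Name : LAN3
InterfaceDescription : Broadcom NetXtreme Gigabit Ethernet #2
Enabled : True
NumberOfReceiveQueues : 1
"""


def parse_rss_listing(text):
    """Split the listing into one dict per adapter; a 'Name' line starts a record."""
    records = []
    for line in text.splitlines():
        # Split at the first " :" so keys such as
        # "BaseProcessor: [Group:Number]" stay intact.
        key, sep, value = line.partition(" :")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Name":
            records.append({})
        if records:
            records[-1][key] = value
    return records


def queue_summary(records):
    """Map adapter name to its receive-queue count, or None when the field is blank."""
    summary = {}
    for rec in records:
        raw = rec.get("NumberOfReceiveQueues", "")
        summary[rec["Name"]] = int(raw) if raw.isdigit() else None
    return summary


summary = queue_summary(parse_rss_listing(SAMPLE))
for name, queues in summary.items():
    shown = queues if queues is not None else "not reported"
    print(f"{name}: receive queues = {shown}")
```

Run against the full output, this makes it easier to see that the Intel ports report 2 receive queues, the Broadcom ports report 1, and the virtual and team adapters report none.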