Cluster Network Validation fails on UDP port 3343

Notes Admin 96 Reputation points
2021-01-28T13:48:49.307+00:00

When I run the cluster network validation test (pre-cluster creation) on two HPE DL380 Gen10 nodes running fully patched Windows Server 2019 (LTSC 1809) with the Hyper-V role and up-to-date firmware and drivers, I get the error below:
Network interfaces s-test-01.assemblyni.gov.uk - LOM1Port1_Mgmt and s-test-02.assemblyni.gov.uk - LOM1Port 1_Mgmt are on the same cluster network, yet address 10.63.35.30 is not reachable from 10.63.35.31 using UDP on port 3343.

The problematic NICs above are 1Gbps and are used for management, RDP, etc. They are the only NICs with default gateways set, and they are connected via a Cisco 3750 switch with no ACL or port security configured.
Each server also has a single NIC with dual 25Gbps ports, which are directly connected with DAC cables because we do not currently have 25Gbps switches.
All other NICs are vNICs created on a Switch Embedded Team (SET) on each server that uses the dual-port 25Gbps NIC.
What has been tried:

  1. The firewall has been disabled on all profiles on both servers. There are no other firewalls between the servers.
  2. Real-time monitoring has been disabled on both servers for Windows Defender, which is the only AV used.
  3. The servers are fully patched with HPE SPP 2020-09 and all Windows OS updates, and have been restarted several times.
  4. When I change the mgmt. NIC on one server to a different subnet, the validation test passes, but why? Also, when you go to create the cluster it asks for a cluster VIP address, which needs to be in the same subnet across all servers, and it only offers the mgmt. NIC subnets. I assume that is because they are the only ones with a default gateway set?
I can find plenty of similar articles but none that answer this scenario, and I would really appreciate any help or advice.
Thanks
Stu
Windows Server Clustering

Accepted answer
  1. Notes Admin 96 Reputation points
    2021-02-05T10:33:40.72+00:00

    For anyone interested, I found the solution, but I cannot tell you why it works.
    Instead of using a single physical onboard network port, I decided to try teaming two of the onboard 1Gbps network adapters, then creating a virtual NIC and using it for the management traffic on both server nodes, and what do you know, it flamin' worked!? But WHY?
    So I don't know if this is a failover cluster requirement, or why I couldn't create the cluster when using a single physical network port for management traffic. Specifically, the problem was a failure to communicate over UDP on port 3343.
    I have not read any article warning that a single physical port for management traffic is unsupported, or stating that a prerequisite for a Windows Server 2019 Hyper-V cluster is to use resilient virtual NICs for management traffic.
    I don't know if this makes sense to anyone. If anyone is able to explain it, please feel free to enlighten me and/or others :-)

    To finish, I have to thank Mico, who did enlighten me on the multi-subnet cluster articles.
    I would also like to thank Romain Serre whose article made me think to try using vNICs for management.
    https://www.tech-coffee.net/2-node-hyperconverged-cluster-with-windows-server-2016/#comment-3732
    I also found this article useful:
    https://social.technet.microsoft.com/Forums/windowsserver/en-US/c3e15170-2a83-48a8-b671-efc2a9afe4cf/s2d-cluster-validation-fails-firewall-and-udp-port-3343


9 additional answers

  1. Notes Admin 96 Reputation points
    2021-02-01T21:37:42.887+00:00

    UPDATE: I set the 2-node cluster up using different subnets for the management NICs, and the cluster validation test passed except for a warning about QFE information on one node:
    "There was an error retrieving QFE information from node 2. Exception result 0x80072ee2."
    But the software updates are the same on both nodes.
    It lets me go ahead and try to create the cluster, but creation fails. When I generate the cluster log, it says:
    "Connection attempt failed with error (10060): Failed to connect to remote endpoint 10.63.37.5:~3343~."
    So even when using different subnets, with the cluster validation tests passing, it still fails with what looks like the same problem with UDP on port 3343?
    Any ideas?


  2. Mico Mi 1,921 Reputation points
    2021-02-02T05:57:11.2+00:00

    Hi,
    Please check the doc:
    "You are unable to join a node into a cluster if UDP port 3343 is blocked"
    Best Regards,
    Mico Mi


  3. Notes Admin 96 Reputation points
    2021-02-02T11:01:01.703+00:00

    Thanks, I had seen that article, but it just says the resolution is to open UDP port 3343, which I have done. I completely disabled the firewall on all profiles on both servers and tried to recreate the cluster, but it failed for the same reason:
    "status 10060 Failed to connect to remote endpoint 10.63.37.5:~3343~"

    I ran netstat -ano on both servers during cluster creation and I can see both servers listening on TCP port 3343 but not UDP, and the cluster creation fails. When I run the same netstat test during cluster validation, I see the servers listening on UDP 3343 but not TCP 3343, and the validation passes. I have attached an excerpt from the cluster log taken during cluster creation.
    63041-cluster.txt
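The netstat comparison described above can be scripted so the same check is repeated during validation and during cluster creation. The following is only a minimal sketch, assuming Python 3 is available on the node and that netstat prints its usual English column layout (protocol, local address, foreign address, then a state column for TCP rows). The helper names `protocols_bound` and `run_netstat` are hypothetical and not part of any Microsoft tooling.

```python
import subprocess
from typing import Set


def protocols_bound(netstat_text: str, port: int) -> Set[str]:
    """Return the protocols ("TCP", "UDP") with a local listener on `port`.

    Assumes `netstat -ano` style rows: protocol, local address, foreign
    address, then (for TCP only) a state column, then the PID.
    """
    found = set()
    for line in netstat_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] not in ("TCP", "UDP"):
            continue  # header or unrelated line
        # The local port is whatever follows the last ':' (works for [::]:3343 too).
        local_port = parts[1].rsplit(":", 1)[-1]
        if local_port != str(port):
            continue
        # TCP rows carry a state column; count only listening sockets.
        if parts[0] == "TCP" and "LISTENING" not in parts:
            continue
        found.add(parts[0])
    return found


def run_netstat() -> str:
    """Run `netstat -ano` on the local machine and return its text output."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `protocols_bound(run_netstat(), 3343)` on each node at both stages would show whether the listener is TCP, UDP, both, or neither at that moment.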


  4. Notes Admin 96 Reputation points
    2021-02-03T19:51:27.28+00:00

    I tried to capture a network trace during cluster creation, to see if it would help, by running this command:
    netsh trace start capture=yes Ethernet.Type=IPv4 IPv4.Address=10.63.37.5 Protocol=UDP
    But when I open the .etl file with NetMon or Message Analyzer, it just looks like meaningless information; see the attached screenshot.
    63665-image.png
    Can anyone tell me how you would troubleshoot this problem to determine what is causing the failure to connect over UDP port 3343, even though it passes the cluster validation test?
    Is there a better command line for running a network trace that would highlight the problem? There must be someone out there who knows how to get to the root cause of a problem like this.
    There must be a methodical way to narrow down the problem or prove what is going wrong.
    Maybe it is an advanced network card setting, but they look fine to me:
    63589-image.png
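One way to separate a network-path problem from a cluster-service problem is to test plain UDP delivery between the two management addresses, outside the cluster tooling. The following is only a minimal sketch, assuming Python 3 is installed on both nodes and that nothing else (for example a running Cluster service) is already bound to the chosen port on the listening node. The function names and the probe payload are hypothetical, and this does not reproduce what the validation wizard itself sends.

```python
import socket
from typing import Optional

PROBE = b"udp-3343-probe"  # arbitrary payload chosen for this sketch


def udp_listen_once(bind_ip: str, port: int, timeout: float = 30.0) -> Optional[bytes]:
    """Wait for one datagram on bind_ip:port, echo it back, and return it.

    Returns None if nothing arrives before the timeout.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((bind_ip, port))
        try:
            data, peer = sock.recvfrom(2048)
        except OSError:  # includes socket.timeout
            return None
        sock.sendto(data, peer)
        return data


def udp_probe(source_ip: str, target_ip: str, port: int, timeout: float = 3.0) -> bool:
    """Send one datagram from source_ip to target_ip:port and wait for the echo."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.bind((source_ip, 0))  # force the outgoing interface, ephemeral port
        try:
            sock.sendto(PROBE, (target_ip, port))
            data, _ = sock.recvfrom(2048)
        except OSError:  # timeout, or a reset reported for an unreachable port
            return False
        return data == PROBE
```

For the addresses in the validation error, one node would run `udp_listen_once("10.63.35.30", 3343)` while the other runs `udp_probe("10.63.35.31", "10.63.35.30", 3343)`. A `False` result with the firewall disabled on both sides would point at the NIC, driver, or switch path rather than at the cluster service.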
