Storage Spaces Direct Cluster (Validation fails on port 3343 over the Mellanox NICs)

Glen Harrison 6 Reputation points
2022-01-21T16:49:45.31+00:00

Hi Everyone,

I am building a four-node Storage Spaces Direct cluster running Windows Server 2022.

Each node has two dual-port NICs: an Intel 10 GbE and a Mellanox 100 GbE.

When running the cluster validation test, is it normal to see errors on the Mellanox NICs for port 3343?

My config is:

Intel nic0 and nic1 attached to a SET vSwitch (vNICs for management, cluster, and live migration)

Mellanox nic0 and nic1 for storage

The report is all green except for this one error. It's not the firewall, as I've checked that the ports are allowed.
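
One way to double-check reachability on the storage network, independent of the validation report, is a small TCP probe bound to a specific source address. The following is only a sketch: the function name `probe_tcp` is hypothetical, it tests TCP only (the linked threads discuss UDP 3343, which a connect-style probe cannot verify), and the addresses in the usage note are placeholders rather than values from this post.

```python
import socket


def probe_tcp(source_ip, target_ip, port=3343, timeout=3.0):
    """Try a TCP connect from source_ip to target_ip:port.

    Returns one of:
      "open"     - something accepted the connection
      "refused"  - host reachable, but nothing is listening on the port
      "timeout"  - no reply (filtered, or no path between the addresses)
      "error: …" - any other socket error (e.g. source_ip not on this host)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Binding forces the probe out of the NIC that owns source_ip.
        sock.bind((source_ip, 0))
        sock.connect((target_ip, port))
        return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    finally:
        sock.close()
```

For example, `probe_tcp("<storage IP of this node>", "<storage IP of a peer node>")` would be run from one node against another. A "refused" result still shows that packets get through; a "timeout" points toward filtering or a missing path.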

Thanks!!


2 answers

  1. Limitless Technology 39,686 Reputation points
    2022-01-25T17:00:18.25+00:00

    Hi there,

    A few things to try:

    • Install all available Windows updates on the servers and restart them.
    • Temporarily disable the antivirus on the servers and run the validation again.

    The threads below discuss the same issue and include troubleshooting steps that may help you resolve it.

    Cluster Network Validation - fail UDP port 3343
    https://learn.microsoft.com/en-us/answers/questions/249241/cluster-network-validation-fail-udp-port-3343.html

    S2D Cluster Validation Fails Firewall and UDP Port 3343
    https://social.technet.microsoft.com/Forums/office/en-US/c3e15170-2a83-48a8-b671-efc2a9afe4cf/s2d-cluster-validation-fails-firewall-and-udp-port-3343?forum=winserverfiles


  2. MirandaVeracruz 106 Reputation points
    2024-04-15T06:58:49.68+00:00

    @Glen Harrison any update here? I ran into the same issue with a newly deployed Server 2022 cluster (4x Dell AX740xd, SMB traffic via QLogic QL41262 over Cisco N9K-C93180YC-EX switches).

    When we started patching last week (for the April patch day), it took 7 minutes after the first node rebooted until it rejoined the cluster. Failover Cluster Manager throws the error

    Cluster node 'HYPERVISOR01' failed to join the cluster because it could not communicate over the network with any other node in the cluster. Verify network connectivity and configuration of any network firewalls.

    followed by

    Cluster failed to start. The latest copy of cluster configuration data was not available within the set of nodes attempting to start the cluster. Changes to the cluster occurred while the set of nodes were not in membership and as a result were not able to receive configuration data updates. . Votes required to start cluster: 2 Votes available: 0 Nodes with votes: HYPERVISOR02 HYPERVISOR03 HYPERVISOR04 Guidance: Attempt to start the cluster service on all nodes in the cluster so that nodes with the latest copy of the cluster configuration data can first form the cluster. The cluster will be able to start and the nodes will automatically obtain the updated cluster configuration data. If there are no nodes available with the latest copy of the cluster configuration data, run the 'Start-ClusterNode -FQ' Windows PowerShell cmdlet. Using the ForceQuorum (FQ) parameter will start the cluster service and mark this node's copy of the cluster configuration data to be authoritative.  Forcing quorum on a node with an outdated copy of the cluster database may result in cluster configuration changes that occurred while the node was not participating in the cluster to be lost.

    In my host-based firewall logs, I noticed:

    DROP TCP 10.100.0.10 192.168.100.10 51199 3343 0 - 0 0 0 - - - SEND 13592

    which is weird, because 10.100.0.10 is my management IP and 192.168.100.10 is on the SMB-A network. Why does the management network try to communicate over the SMB-A network, which is unrouted?!
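
To look for more entries like this one, a short script can scan the firewall log for dropped port 3343 traffic whose source and destination sit on different cluster networks. This is a sketch under assumptions: the field order is inferred from the excerpt above (date and time columns omitted), the /24 prefix lengths are guesses that are not stated in this post, and the function names are hypothetical.

```python
import ipaddress

# Field order inferred from the log excerpt above (without date/time).
FIELDS = (
    "action", "protocol", "src_ip", "dst_ip", "src_port", "dst_port",
    "size", "tcpflags", "tcpsyn", "tcpack", "tcpwin",
    "icmptype", "icmpcode", "info", "path", "pid",
)

CLUSTER_PORT = "3343"

# ASSUMPTION: /24 prefixes; the post only gives one address per network.
NETWORKS = {
    "management": ipaddress.ip_network("10.100.0.0/24"),
    "SMB-A": ipaddress.ip_network("192.168.100.0/24"),
}


def parse_line(line):
    """Split one log line into a dict, or return None if it does not fit."""
    parts = line.split()
    # Full log lines start with date and time; the excerpt omits them.
    if len(parts) == len(FIELDS) + 2:
        parts = parts[2:]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def network_of(ip):
    """Return the name of the known network containing ip, else None."""
    addr = ipaddress.ip_address(ip)
    for name, net in NETWORKS.items():
        if addr in net:
            return name
    return None


def cross_network_cluster_drop(line):
    """Return (src_network, dst_network) for a dropped port-3343 packet
    that crosses networks; return None for everything else."""
    rec = parse_line(line)
    if rec is None:
        return None
    if rec["action"] != "DROP" or rec["dst_port"] != CLUSTER_PORT:
        return None
    src, dst = network_of(rec["src_ip"]), network_of(rec["dst_ip"])
    return (src, dst) if src != dst else None
```

Applied to the line quoted above, `cross_network_cluster_drop` returns `("management", "SMB-A")`, i.e. a packet sourced from the management address and aimed at an SMB-A address.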

    Cheers
    Miranda
