Server 2022 RDMA Mellanox ConnectX-4

tarik gretscher 6 Reputation points
2021-11-09T14:57:48.077+00:00

Hi,

I have tried to get RDMA working on a Server 2022 build with the Mellanox ConnectX-4 adapter.
My config is similar to this site: https://blog.alphatrust.ch/windows-server/hyper-converged-infrastructure/konfigurieren-von-rdma-over-converged-ethernet-roce-pfc-unter-windows-server-2019/
I can see RDMA enabled and configured for my Mellanox adapters and vNICs:

Get-NetAdapterRdma

Name                    InterfaceDescription                   Enabled  Operational  PFC   ETS
----                    --------------------                   -------  -----------  ---   ---
Mellanox_anksinc151     Mellanox ConnectX-4 Adapter #2         True     True         True  True
vEthernet (Storage1)    Hyper-V Virtual Ethernet Adapter #2    True     False        NA    NA
Mellanox_anksinc150     Mellanox ConnectX-4 Adapter            True     True         True  True
vEthernet (Internal)    Hyper-V Virtual Ethernet Adapter #4    False    False        NA    NA
vEthernet (Verwaltung)  Hyper-V Virtual Ethernet Adapter       False    False        NA    NA
vEthernet (Storage2)    Hyper-V Virtual Ethernet Adapter #3    True     False        NA    NA

But when I check the SMB settings, all I can see is RDMA Capable = False:

Get-SmbServerNetworkInterface

Scope Name                   Interface Index  RSS Capable  RDMA Capable  Speed    IpAddress
----------                   ---------------  -----------  ------------  -----    ---------
FE80::E98F:7273:9C34:E333    2                True         False         56 Gbps  10.10.2.112
FE80::E98F:7273:9C34:E333    24               True         False         56 Gbps  10.10.1.202
FE80::E98F:7273:9C34:E333    4                False        False         56 Gbps  192.168.103.234
FE80::E98F:7273:9C34:E333    10               True         False         10 Gbps  fe80::9904:96fc:feed:7edd
*                            10               True         False         10 Gbps  fe80::9904:96fc:feed:7edd
*                            2                True         False         56 Gbps  fe80::19c1:291b:f4d1:4473
*                            24               True         False         56 Gbps  fe80::14d3:b959:977f:f244
*                            4                False        False         56 Gbps  fe80::edde:7621:b71:4a0c
*                            23               False        False         10 Gbps  fe80::e98f:7273:9c34:e333
*                            10               True         False         10 Gbps  169.254.1.1
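
For completeness, the client-side SMB view and the raw RDMA state of the storage vNICs can be cross-checked the same way (a rough sketch; vNIC names as in the output above, and which capability fields are exposed depends on the OS/driver build):

Get-SmbClientNetworkInterface | Format-Table -AutoSize
Get-NetAdapterRdma -Name "vEthernet (Storage1)", "vEthernet (Storage2)" | Format-List *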

The same configuration on the same hosts works fine on Server 2019, with RDMA functioning as expected.

nd_write_bw.exe from Mellanox returned:

nd_write_bw.cpp(839):
NdStartup failed with 80004002

test-rdma.ps1 returned:
.\test-rdma.ps1 -IfIndex 2 -IsRoCE $true -RemoteIpAddress 10.10.2.101
VERBOSE: Diskspd.exe found at C:\Windows\System32\diskspd.exe
VERBOSE: The adapter vEthernet (Storage2) is a vNIC
ERROR: RDMA capabilities for adapter vEthernet (Storage2) are not valid : MaxQueuePairCount is 0
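
The value the script complains about can also be read directly from the vNIC (a sketch; assumes the vNIC name from the output above and that the driver exposes the RDMA capability fields):

# Dump the RDMA settings of the storage vNIC; look for MaxQueuePairCount in the output
Get-NetAdapterRdma -Name "vEthernet (Storage2)" | Format-List *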

Has anyone else discovered the same issue?


3 answers

  1. tarik gretscher 6 Reputation points
    2021-12-29T16:46:27.463+00:00

    Hi there,

    Sorry for the long delay; I had some issues with the production cluster, so I was not able to retest right away.

    I tried to get RDMA working with a new cluster (4-node Supermicro X11, Mellanox ConnectX-5 100G).
    RDMA still does not work in my environment as soon as I install Server 2022.
    With Server 2019 everything works fine.
    So here are my settings:

    1. Verify that RDMA is enabled; the first command checks whether it is enabled on the server, the second whether it is enabled on the network adapters.

    Get-NetOffloadGlobalSetting

    ReceiveSideScaling           : Enabled
    ReceiveSegmentCoalescing     : Enabled
    Chimney                      : Disabled
    TaskOffload                  : Enabled
    NetworkDirect                : Enabled
    NetworkDirectAcrossIPSubnets : Blocked
    PacketCoalescingFilter       : Disabled


    PS C:\Windows\system32> Get-NetAdapterRdma

    Name                       InterfaceDescription                        Enabled  Operational  PFC    ETS
    ----                       --------------------                        -------  -----------  ---    ---
    Ethernet 4                 Intel(R) Ethernet Connection X722 for...    True     False        False  False
    Ethernet 2                 Intel(R) Ethernet Connection X722 for...    True     False        False  False
    Mellanox_anksinc151        Mellanox ConnectX-5 Adapter #2              True     True         True   True
    vEthernet (Verwaltung) 2   Hyper-V Virtual Ethernet Adapter #4         False    False        NA     NA
    vEthernet (Storage1)       Hyper-V Virtual Ethernet Adapter #5         True     False        NA     NA
    Mellanox_anksinc150        Mellanox ConnectX-5 Adapter                 True     True         True   True
    vEthernet (Storage2)       Hyper-V Virtual Ethernet Adapter #6         True     False        NA     NA

    2. If the network adapter supports RoCE, we also need to configure the switches to manage bandwidth (DCB/PFC);

    Get-NetAdapterAdvancedProperty

    Mellanox_anksinc150    NetworkDirect Technology    RoCE              *NetworkDire...    {3}
    Mellanox_anksinc150    DcbxMode                    Host in Charge    DcbxMode           {0}

    PFC Settings

    Set-NetQosDcbxSetting -Willing $false -Confirm:$false
    Enable-NetQosFlowControl -Priority 3
    Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
    New-NetQosPolicy -Name "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
    New-NetQosPolicy -Name "Cluster" -Cluster -PriorityValue8021Action 7
    New-NetQosPolicy -Name "Default" -Default -PriorityValue8021Action 0
    New-NetQosTrafficClass -Name "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
    New-NetQosTrafficClass -Name "Cluster" -Priority 7 -BandwidthPercentage 1 -Algorithm ETS
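
    A quick way to confirm these settings actually took effect (a sketch; adapter names assumed from the output above, and Enable-NetAdapterQos is the usual step that binds the DCB settings to the physical ports):

    Get-NetQosDcbxSetting      # Willing should be False
    Get-NetQosFlowControl      # only priority 3 should be Enabled
    Get-NetQosTrafficClass     # SMB 50% / Cluster 1% ETS classes
    Enable-NetAdapterQos -Name "Mellanox_anksinc150", "Mellanox_anksinc151"
    Get-NetAdapterQos          # per-adapter operational PFC/ETS state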

    3. For the OS, we need Server 2012 or higher with SMB 3.0 and SMB Multichannel enabled;

    The OS is Server 2022 Datacenter (GUI)
    Get-SmbClientConfiguration
    EnableMultiChannel : True
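
    With an SMB session open to another node, one can also check whether the multichannel connections actually negotiated RDMA (a small sketch):

    # Shows Client RDMA Capable / Server RDMA Capable per multichannel connection
    Get-SmbMultichannelConnection | Format-Table -AutoSize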

    4. For the failover cluster, please make sure the RDMA NICs are used for client access;

    The storage network is configured for client access.

    5. RDMA doesn't work with NIC teaming or a virtual switch.

    As I recall, RDMA is available with Switch Embedded Teaming (SET); all of my Server 2019 clusters are configured with SET, and RDMA works for S2D there.
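
    For reference, the SET plus RDMA host-vNIC pattern I mean looks roughly like this (a sketch with example names, not my exact commands):

    # Create a SET switch on both Mellanox ports and add two host vNICs for storage
    New-VMSwitch -Name "SETswitch" -NetAdapterName "Mellanox_anksinc150", "Mellanox_anksinc151" -EnableEmbeddedTeaming $true -AllowManagementOS $false
    Add-VMNetworkAdapter -ManagementOS -Name "Storage1" -SwitchName "SETswitch"
    Add-VMNetworkAdapter -ManagementOS -Name "Storage2" -SwitchName "SETswitch"

    # Enable RDMA on the host vNICs and pin each one to a physical port
    Enable-NetAdapterRdma -Name "vEthernet (Storage1)", "vEthernet (Storage2)"
    Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "Storage1" -PhysicalNetAdapterName "Mellanox_anksinc150"
    Set-VMNetworkAdapterTeamMapping -ManagementOS -VMNetworkAdapterName "Storage2" -PhysicalNetAdapterName "Mellanox_anksinc151"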

    6. Please also ensure you install the latest NIC drivers and latest firmware.

    I installed the latest firmware and drivers from Mellanox.

    Driver version: 2.70.24728.0
    Firmware: 16.31.1014

    7. In addition, please also run the Cluster Validation Wizard to check whether the cluster passes all tests.

    It passed all tests except the driver signature test, because the Intel VROC driver is not signed.

    1 person found this answer helpful.

  2. Ueba3ba 6 Reputation points
    2022-10-09T19:59:00.453+00:00

    I have the same problem.

    Windows Server 2022.

    RDMA tested without a SET switch: works.
    RDMA with a SET switch: does not work.
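
    One way to see whether traffic actually goes over RDMA during such a test is to watch the RDMA performance counters while copying a file over the storage network (a sketch; counter set names can vary with the NIC driver):

    Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec", "\RDMA Activity(*)\RDMA Outbound Bytes/sec" -Continuous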

    Did you find a solution?

    1 person found this answer helpful.

  3. Limitless Technology 39,931 Reputation points
    2021-11-12T09:15:23.867+00:00

    Hi there,

    Some quick points you can check upon,

    1. Verify that RDMA is enabled; the first command checks whether it is enabled on the server, the second whether it is enabled on the network adapters.
    2. If the network adapter supports RoCE, we also need to configure the switches to manage bandwidth (DCB/PFC).
    3. For the OS, we need Server 2012 or higher with SMB 3.0 and SMB Multichannel enabled.
    4. For the failover cluster, please make sure the RDMA NICs are used for client access.
    5. RDMA doesn't work with NIC teaming or a virtual switch.
    6. Please also ensure you install the latest NIC drivers and latest firmware.
    7. In addition, please also run the Cluster Validation Wizard to check whether the cluster passes all tests. A compact verification sketch follows this list.
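
    A compact way to run through checks 1 and 3 from PowerShell (a sketch; property names as exposed on recent Windows Server builds):

    Get-NetOffloadGlobalSetting | Select-Object NetworkDirect      # RDMA enabled at the OS level
    Get-NetAdapterRdma | Where-Object Enabled                      # RDMA enabled per adapter
    Get-SmbClientConfiguration | Select-Object EnableMultiChannel  # SMB Multichannel
    Get-SmbServerNetworkInterface | Where-Object RdmaCapable       # interfaces SMB will offer for RDMA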

    --If the reply is helpful, please Upvote and Accept it as an answer--

