RDMA Traffic Test Failed

asked 2021-04-21T05:25:28.317+00:00
Richelle Lanuza 86 Reputation points

Hi,

I ran the RDMA test with DiskSpd located in C:\Windows\System32. What does "RDMA traffic test FAILED" mean?

Error Prompted:

ERROR: RDMA traffic test FAILED: Please check
ERROR: a) physical switch port configuration for Priority Flow Control.
ERROR: b) job owner has write permission at 172.16.xx.xx \C$

Can anyone elaborate on these two points?

Thank you,
Rich

Azure Stack HCI

Accepted answer
  1. answered 2021-04-21T18:34:14.683+00:00
    Trent Helms - MSFT 1,261 Reputation points

    Hi @RichelleLanuza-2661,

    Test-RDMA should work for iWARP and RoCE. As I understand, it simply uses DiskSpd to generate a synthetic workload which is carried over the SMB connection. For this test to pass, that connection must be established as an RDMA connection. Also, be sure you are running the tool as a user that has local admin rights on each node as this is required to access c$.
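
    If it helps, here is a minimal sketch of how you could check whether SMB sees the storage interfaces as RDMA capable. It assumes an elevated PowerShell session on each node and uses only the built-in SMB cmdlets. Note that Get-SmbMultichannelConnection only lists connections while there is SMB traffic to the other node.

    ```powershell
    # Interfaces the SMB client can use, with their RSS and RDMA capability flags.
    Get-SmbClientNetworkInterface

    # The same view from the SMB server side of the node.
    Get-SmbServerNetworkInterface

    # Active SMB Multichannel connections. Check the RDMA capable columns for
    # the storage IP addresses (only populated while SMB traffic is flowing).
    Get-SmbMultichannelConnection
    ```

    If the storage interfaces show as not RDMA capable here, SMB will most likely fall back to plain TCP on them, which would be consistent with the test failing.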

    A few questions to get a better understanding of your environment.

    1. Are you setting this up in Windows Server 2019 or an Azure Stack HCI 20H2 environment?
    2. What model of NICs are you using?
    3. Is RDMA enabled and set for iWARP on all storage NICs? (some NICs support both iWARP and RoCE)
    4. Are you using dedicated storage NICs (i.e. no virtual switch on top of the physical NICs)? I assume this is true because you are using a switchless config, but I want to be sure.

    Some things you could check are:

    1. Ensure the RDMA/NIC settings completely match across all cluster nodes.
    2. Ensure the NIC driver and firmware versions match and are up to date on each cluster node.
    3. Ensure that your storage NICs are each on their own separate VLAN/subnet.
    4. Check the SMB Client Connectivity logs to see if there are any useful errors regarding RDMA.
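
    As a rough sketch of checks 2 and 4, something along these lines could be used. The node names are hypothetical placeholders, and PowerShell remoting is assumed to be enabled between the nodes. Firmware versions are not reported by Get-NetAdapter, so those would still need to be compared with the vendor's tools.

    ```powershell
    # Hypothetical node names; replace with your own.
    $nodes = "Node01", "Node02"

    # Compare NIC driver details across the nodes.
    Invoke-Command -ComputerName $nodes -ScriptBlock {
        Get-NetAdapter | Select-Object Name, InterfaceDescription, DriverVersion, DriverDate
    } | Sort-Object PSComputerName, Name |
        Format-Table PSComputerName, Name, InterfaceDescription, DriverVersion, DriverDate

    # Recent SMB client connectivity events whose message mentions RDMA.
    Get-WinEvent -LogName "Microsoft-Windows-SMBClient/Connectivity" -MaxEvents 200 |
        Where-Object { $_.Message -match "RDMA" } |
        Format-List TimeCreated, Id, LevelDisplayName, Message
    ```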

    Beyond this, it may be worth opening a support ticket with your hardware vendor first, as the vast majority of RDMA is handled by the hardware. If they determine that the issue is in the OS, you could open a support ticket with us and we'd be glad to assist in confirming your setup.

    I hope this information is helpful.

    Thanks so much, Rich, and I hope you have a wonderful day!
    Trent

5 additional answers

  1. answered 2021-04-21T16:59:19.61+00:00
    MattMcSpirit-MSFT 561 Reputation points

    Hi Rich, I just checked in with one of our networking specialists, and his response was:

    I assume that they ran the old Test-RDMA test, which just tries to send some RDMA traffic from the interface they specified to another. It can fail for a variety of reasons: some we can detect (configuration on the host), others we cannot (e.g. problems on the physical network).

    They should first run Validate-DCB (Install-Module Validate-DCB or aka.ms/Validate-DCB). If that passes, then the issue is likely configuration on their network switches.
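
    For reference, a minimal sketch of those two steps. It assumes the node can reach the PowerShell Gallery and that the session is elevated. As far as I know, running Validate-DCB without parameters opens a wizard where you describe the expected configuration.

    ```powershell
    # Install the Validate-DCB module from the PowerShell Gallery.
    Install-Module -Name Validate-DCB

    # Start the tool and describe the expected configuration when prompted.
    Validate-DCB
    ```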

    Anything more than that, I would recommend that they open a support case (our support team can connect the customer with the Switch/NIC vendor in most cases).

    Would you be able to double-check the Validate-DCB steps, and let us know? Also, is this a WS 2019 or AzSHCI 20H2 environment?

    Thanks!
    Matt

  2. answered 2021-04-21T17:41:50.087+00:00
    Richelle Lanuza 86 Reputation points

    Hi @MattMcSpirit-MSFT, as far as I can tell, Validate-DCB is for RoCE.

    Do you have a reference for testing RDMA over iWARP?

    I ask because our current environment uses a switchless (peer-to-peer) configuration.

    Thanks,
    Rich :)

  3. answered 2021-04-22T00:03:33.247+00:00
    Richelle Lanuza 86 Reputation points

    Hi @Trent Helms - MSFT

    My answers to your questions are below:

    1. Are you setting this up in Windows Server 2019 or an Azure Stack HCI 20H2 environment? Answer: an Azure Stack HCI 20H2 environment.
    2. What model of NICs are you using? Answer: QLogic FastLinQ (QL41262).
    3. Is RDMA enabled and set for iWARP on all storage NICs? (some NICs support both iWARP and RoCE) Answer: Yes, we have already enabled RDMA using this command: "Enable-NetAdapterRdma -Name $StorageAdaptersAll"
    4. Are you using dedicated storage NICs (i.e. no virtual switch on top of the physical NICs)? I assume this is true because you are using a switchless config, but I want to be sure. Answer: Our storage traffic does not pass through the top-of-rack switches. We are using DAC cables (SFP28-25GB), so the two nodes are connected peer-to-peer.
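
    For reference, here is a sketch of how the result of the Enable-NetAdapterRdma command above could be verified. It assumes $StorageAdaptersAll still holds the storage adapter names. The advanced property that selects iWARP or RoCE is named differently by each NIC vendor, so the filter below is only a guess at the display name, not a documented value.

    ```powershell
    # Show whether RDMA is enabled on each storage adapter.
    Get-NetAdapterRdma -Name $StorageAdaptersAll

    # List advanced properties whose display name mentions RDMA or NetworkDirect
    # (assumed naming; the exact display name is vendor-specific).
    Get-NetAdapterAdvancedProperty -Name $StorageAdaptersAll |
        Where-Object { $_.DisplayName -match "RDMA|NetworkDirect" } |
        Format-Table Name, DisplayName, DisplayValue
    ```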

    Thank you,
    Rich

  4. answered 2021-04-22T19:08:36.6+00:00
    Richelle Lanuza 86 Reputation points

    Hi @Trent Helms - MSFT and @MattMcSpirit-MSFT

    Does RDMA testing require the nodes and cluster to be up and running?

    Our cluster is currently in maintenance mode, and our storage traffic uses a switchless configuration over DAC cables.

    What would be the effect if we start the Azure Stack HCI cluster even though the RDMA test failed?

    Hoping for your kind assistance.

    Thank you and keep safe,
    Rich 😊
