We are using Windows with an FDR card.
When we measured throughput with the nd_*_bw tests (nd_send_bw, nd_read_bw, nd_write_bw), we got only about 33 Gbps, against the spec rate of 56 Gbps.
On Linux, we have seen that about 50 Gbps can be achieved.
Please see details on the test below.
Server: Fujitsu PRIMEQUEST
HCA driver: WinOF 5.25
OS version: Windows Server 2016 Datacenter, version 10.0.14393
start /b /affinity 0x1 nd_send_bw -a -n 1000 -S 192.168.0.101
start /b /affinity 0x1 nd_read_bw -a -n 1000 -S 192.168.0.101
start /b /affinity 0x1 nd_write_bw -a -n 1000 -S 192.168.0.101
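For reference, the /affinity argument of start is a hexadecimal bitmask of logical processors, so 0x1 pins each test to CPU 0 only. A minimal Python sketch of how such a mask decodes (the helper name cpus_from_mask is hypothetical, used only for illustration):

```python
def cpus_from_mask(mask: int) -> list[int]:
    """Return the logical CPU indices selected by an affinity bitmask.

    Bit i of the mask corresponds to logical processor i.
    (Helper name is illustrative, not part of any tool.)
    """
    cpus = []
    index = 0
    while mask:
        if mask & 1:          # lowest bit set: this CPU is selected
            cpus.append(index)
        mask >>= 1            # move on to the next CPU bit
        index += 1
    return cpus


if __name__ == "__main__":
    # 0x1 is the mask used in the commands above.
    print(cpus_from_mask(0x1))
    # A mask such as 0x5 would select CPUs 0 and 2.
    print(cpus_from_mask(0x5))
```

Since all three tests use the same mask, each one runs on a single core.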
Test         qp  #bytes   #iterations  MR [Mmps]  Gb/s   CPU Util.
nd_send_bw   0   8388608  1000         0.000      33.53  99.93
nd_read_bw   0   8388608  1000         0.000      33.13  100.00
nd_write_bw  0   8388608  1000         0.000      33.46  99.70
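As a rough sanity check on the numbers above, the sketch below expresses each result as a fraction of the 56 Gbps spec rate and of the roughly 50 Gbps we saw on Linux. Both reference rates are taken from this post; the function and variable names are illustrative only.

```python
SPEC_GBPS = 56.0    # FDR spec rate quoted in this post
LINUX_GBPS = 50.0   # approximate rate observed on Linux (from this post)

# Measured Gb/s values from the Windows results table.
results_gbps = {
    "nd_send_bw": 33.53,
    "nd_read_bw": 33.13,
    "nd_write_bw": 33.46,
}


def fraction_of(measured_gbps: float, reference_gbps: float) -> float:
    """Return measured throughput as a fraction of a reference rate."""
    if reference_gbps <= 0:
        raise ValueError("reference rate must be positive")
    return measured_gbps / reference_gbps


if __name__ == "__main__":
    for test, gbps in results_gbps.items():
        vs_spec = fraction_of(gbps, SPEC_GBPS) * 100
        vs_linux = fraction_of(gbps, LINUX_GBPS) * 100
        print(f"{test}: {vs_spec:.1f}% of spec, {vs_linux:.1f}% of Linux")
```

All three tests land at roughly 60% of the spec rate, with the pinned core close to 100% utilization.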
We also ran a loopback test and got only about 33 Gbps there as well.
In the loopback test, the traffic is not sent out to the switch but is looped back to the host.
So we think the performance bottleneck is in the Windows ND stack.
Please advise how to improve the performance of the Windows ND stack.