Network throughput only around 100 Mbps for transfers between Azure VMs with robocopy

VJ-8370 551 Reputation points
2020-12-10T07:05:25.577+00:00

Hi,

We have to transfer several TBs of data between Azure VMs and from Azure VMs to NetApp CVO. Based on test transfers, we have observed that the throughput is only approximately 100 Mbps (tested with 100K+ files; the actual data will have millions of files).

Any pointers on where the issue could be, so that data transfer can happen quickly from Azure VMs to NetApp CVO and between Azure VMs? We need to retain the ACLs and timestamps of all the files.
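
For context, the test copies use a robocopy command roughly like the following (paths are placeholders, and the exact switches may vary; /COPYALL keeps data, attributes, timestamps, ACLs, owner and auditing info, /DCOPY:T keeps directory timestamps, and /MT:32 runs 32 copy threads):

robocopy D:\Source \\destination-vm\Share /E /COPYALL /DCOPY:T /MT:32 /R:2 /W:5 /LOG:C:\Temp\robocopy.log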

Regards,
VJ


Accepted answer
  1. SaiKishor-MSFT 17,336 Reputation points
    2020-12-11T18:28:29.153+00:00

    @VJ-8370

    To troubleshoot the throughput issue, here are some steps that I would suggest:

    1. Please check the bandwidth limit of the Azure VM size that you are using, and try changing to a VM type/size that can accommodate more bandwidth. Details about the VMs and their expected bandwidth are given in this link; click on the type of VM that you have and then on the series, which shows a table with all the different sizes and their expected bandwidth.
    2. Traceroute is a good tool for measuring network performance characteristics (such as packet loss and latency) along every network path between a source device and a destination device. Run a traceroute test both ways, from the source to the destination and from the destination to the source, and check for any latency along the path.
    3. A ping test can be used to check for any packet loss between the source and destination. You can try a regular ICMP ping test and also a TCP-based test using Psping (an example command is shown after the iperf3 commands below).
    4. NTttcp is a tool for testing the TCP performance of a Linux or Windows VM. You can change various TCP settings and then test the benefits by using NTttcp (example commands are shown after the iperf3 commands below). For more information, see these resources: Bandwidth/Throughput testing (NTttcp) and NTttcp Utility.
    5. You can also use iperf3 to measure the actual bandwidth achieved between the source and destination. One side acts as the server and the other side acts as the client (i.e., the destination and the source). Please make sure that the ports being used are open in the firewalls/NSGs while testing; iperf3 uses TCP port 5201 by default, which you can change with -p. Here are some commands for an iperf3 test:

    On the server side: "iperf3 -s -V"
    On the client side:

    1. 30 parallel TCP streams: iperf3 -c <IP of Azure VM or On-Prem host> -P 30 -t 30
    2. 1 Gbps UDP test: iperf3 -c <IP of Azure VM or On-Prem host> -u -b 1G -t 30

    You can further alter the window sizes and test as shown below:

    1. Window size 128K: iperf3 -c <IP of Azure VM or On-Prem host> -w 128K -t 30
    2. Window size 512K: iperf3 -c <IP of Azure VM or On-Prem host> -w 512K -t 30
    3. Window size 1024K: iperf3 -c <IP of Azure VM or On-Prem host> -w 1024K -t 30
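
    As mentioned in steps 3 and 4 above, here are example Psping and NTttcp invocations (IP addresses, ports, thread counts, and durations are placeholders to adjust for your environment):

    TCP latency/loss from the source VM to a listening port on the destination (for example 445 for SMB): psping -n 50 -w 2 <IP of destination VM>:445
    NTttcp on the receiver VM: ntttcp -r -m 8,*,<IP of receiver VM> -t 300
    NTttcp on the sender VM: ntttcp -s -m 8,*,<IP of receiver VM> -t 300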

    You can also perform packet captures to look for unexpected behaviors, such as TCP packets with flags like SACK, DUP ACK, RETRANSMIT, and FAST RETRANSMIT, which could indicate network performance problems. These packets specifically indicate network inefficiencies that result from packet loss. But packet loss isn't necessarily caused by Azure performance problems. Performance problems could be the result of application problems, operating system problems, or other problems that might not be directly related to the Azure platform.
    Also, keep in mind that some retransmissions and duplicate ACKs are normal on a network. TCP was built to be reliable. Evidence of these TCP packets in a capture doesn't necessarily indicate a systemic network problem, unless they're excessive. Still, these packet types are an indication that TCP throughput isn't achieving its maximum performance, for the reasons discussed in the TCP/IP performance tuning article linked below.
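
    If you don't already have a capture tool installed on the Windows VMs, one option (as a sketch; the file path and size are placeholders) is the built-in netsh trace command, reproducing the slow transfer between start and stop:

    netsh trace start capture=yes tracefile=C:\Temp\capture.etl maxsize=1024
    netsh trace stop

    The resulting .etl file can then be converted (for example with Microsoft's etl2pcapng) and examined in Wireshark for the retransmission patterns described above.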

    Based on the results from the above tests, you should get some idea of the cause of the low throughput that you are seeing. Hope this helps. Please let us know if you have any further questions and we will be glad to assist further.

    Please also refer to this TCP/IP performance tuning link in the Azure docs, which mentions more details, including the above tests. Thank you!

    1 person found this answer helpful.

1 additional answer

  1. D.D 86 Reputation points
    2020-12-22T16:21:00.9+00:00

    Performance choke-points for transferring data can generally be found in 3 places:

    1. Storage reads - How fast can data be read off the source storage location?
    2. Network transfer - What is the slowest link in the network path (source VM NIC, network links, destination server NIC, etc.)?
    3. Storage writes - How fast can data be written to the destination storage location?

    Each of the 3 areas can, and should, be evaluated separately. Your title says "Network Throughput", and SaiKishor-MSFT has a really good answer for networking. But your description doesn't say you've narrowed it down to the network. Remember, even with good networking throughput you can still have overall poor performance because you can't get the data out of, or into, your storage fast enough.

    You can use dd (Linux) to test storage read/write performance; a quick sketch is below. If you're using Windows, maybe download one of the many free disk performance tools (Microsoft's DiskSpd is one).
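
    A minimal dd sketch, assuming a data disk mounted at /mnt/data (oflag=direct/iflag=direct bypass the page cache so you measure the disk rather than memory):

    Write test: dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=4096 oflag=direct
    Read test: dd if=/mnt/data/ddtest of=/dev/null bs=1M iflag=direct

    A single sequential dd stream mainly shows throughput; for an IOPS-style test with many small random I/Os, a tool like fio gives more control.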

    The I/O bandwidth and IOPS are driven by two things: VM size and disk size. You may look at https://learn.microsoft.com/en-us/azure/virtual-machines/premium-storage-performance for an overview. Personally, I've had to create very oversized VMs and disks in the past just because I needed high performance for small datasets.

    Fortunately, if storage IOPS is your problem, you should be able to resize your VM or expand your disks to get a higher service tier, as sketched below.
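
    For illustration only (resource group, VM, disk names, and sizes are placeholders; a VM resize reboots the machine, and a managed disk can be grown but not shrunk, with larger premium disks mapping to higher performance tiers; depending on the disk type, the disk may need to be detached or the VM deallocated before resizing):

    az vm resize --resource-group MyRg --name MyVm --size Standard_D16s_v3
    az disk update --resource-group MyRg --name MyDataDisk --size-gb 1024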

