Network throughput only around 100 Mbps for transfers between Azure VMs with robocopy

VJ-8370 551 Reputation points
2020-12-10T07:05:25.577+00:00

Hi,

We have to transfer several TBs of data between Azure VMs and from Azure VMs to NetApp CVO. Based on test transfers, we have observed that the throughput is only approximately 100 Mbps (tested with 100K+ files; the actual data will have millions of files).

Any pointers on where the issue could be, so that data transfer can happen quickly from Azure VMs to NetApp CVO and between Azure VMs? We need to retain the ACLs and timestamps of all the files.
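
For context, the test copies use a robocopy command roughly like the following (paths are placeholders, and the exact switches may vary; /COPYALL keeps data, attributes, timestamps, ACLs, owner and auditing info, /DCOPY:T keeps directory timestamps, and /MT:32 runs 32 copy threads):

robocopy D:\Source \\destination-vm\Share /E /COPYALL /DCOPY:T /MT:32 /R:2 /W:5 /LOG:C:\Temp\robocopy.log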

Regards,
VJ


Accepted answer
  1. SaiKishor-MSFT 17,336 Reputation points
    2020-12-11T18:28:29.153+00:00

    @VJ-8370

    To troubleshoot the throughput issue, here are some steps that I would suggest:

    1. Please check the bandwidth limit of the Azure VM size that you are using, and try changing to a VM type/size that can accommodate more bandwidth. Details about the VMs and their expected bandwidth are given in this link; click on the type of VM that you have and then on the series, which shows a table with all the different sizes and their expected bandwidth.
    2. Traceroute is a good tool for measuring network performance characteristics (such as packet loss and latency) along every network path between a source device and a destination device. Run a traceroute test both ways, from the source to the destination and from the destination to the source, and check for any latency along the path.
    3. A ping test can be used to check for any packet loss between the source and destination. You can try a regular ICMP ping test and also a TCP-based test using Psping (an example command is shown after the iperf3 commands below).
    4. NTttcp is a tool for testing the TCP performance of a Linux or Windows VM. You can change various TCP settings and then test the benefits by using NTttcp (example commands are shown after the iperf3 commands below). For more information, see these resources: Bandwidth/Throughput testing (NTttcp) and NTttcp Utility.
    5. You can also use iperf3 to measure the actual bandwidth achieved between the source and destination. One side acts as the server and the other side acts as the client (i.e., the destination and the source). Please make sure that the ports being used are open in the firewalls/NSGs while testing; iperf3 uses TCP port 5201 by default, which you can change with -p. Here are some commands for an iperf3 test:

    On the server side: "iperf3 -s -V"
    On the client side:

    1. 30 parallel TCP streams: iperf3 -c <IP of Azure VM or On-Prem host> -P 30 -t 30
    2. 1 Gbps UDP test: iperf3 -c <IP of Azure VM or On-Prem host> -u -b 1G -t 30

    You can further alter the window sizes and test as shown below:

    1. Window size 128K: iperf3 -c <IP of Azure VM or On-Prem host> -w 128K -t 30
    2. Window size 512K: iperf3 -c <IP of Azure VM or On-Prem host> -w 512K -t 30
    3. Window size 1024K: iperf3 -c <IP of Azure VM or On-Prem host> -w 1024K -t 30
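
    As mentioned in steps 3 and 4 above, here are example Psping and NTttcp invocations (IP addresses, ports, thread counts, and durations are placeholders to adjust for your environment):

    TCP latency/loss from the source VM to a listening port on the destination (for example 445 for SMB): psping -n 50 -w 2 <IP of destination VM>:445
    NTttcp on the receiver VM: ntttcp -r -m 8,*,<IP of receiver VM> -t 300
    NTttcp on the sender VM: ntttcp -s -m 8,*,<IP of receiver VM> -t 300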

    You can also perform packet captures to look for unexpected behaviors, such as TCP packets with flags like SACK, DUP ACK, RETRANSMIT, and FAST RETRANSMIT, which could indicate network performance problems. These packets specifically indicate network inefficiencies that result from packet loss. But packet loss isn't necessarily caused by Azure performance problems. Performance problems could be the result of application problems, operating system problems, or other problems that might not be directly related to the Azure platform.
    Also, keep in mind that some retransmissions and duplicate ACKs are normal on a network. TCP was built to be reliable. Evidence of these TCP packets in a capture doesn't necessarily indicate a systemic network problem, unless they're excessive. Still, these packet types are an indication that TCP throughput isn't achieving its maximum performance, for the reasons discussed in the TCP/IP performance tuning article linked below.
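
    If you don't already have a capture tool installed on the Windows VMs, one option (as a sketch; the file path and size are placeholders) is the built-in netsh trace command, reproducing the slow transfer between start and stop:

    netsh trace start capture=yes tracefile=C:\Temp\capture.etl maxsize=1024
    netsh trace stop

    The resulting .etl file can then be converted (for example with Microsoft's etl2pcapng) and examined in Wireshark for the retransmission patterns described above.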

    Based on the results from the above tests, you should get some idea of the cause of the low throughput that you are seeing. Hope this helps. Please let us know if you have any further questions and we will be glad to assist further.

    Please also refer to this TCP/IP performance tuning link in the Azure docs, which mentions more details, including the above tests. Thank you!

    1 person found this answer helpful.

1 additional answer

  1. D.D 86 Reputation points
    2020-12-22T16:21:00.9+00:00

    Performance choke-points for transferring data can generally be found in 3 places:

    1. Storage reads - How fast can data be read off the source storage location?
    2. Network transfer - What is the slowest link in the network path (source VM NIC, network links, destination server NIC, etc.)?
    3. Storage writes - How fast can data be written to the destination storage location?

    Each of the 3 areas can, and should, be evaluated separately. Your title says "Network Throughput", and SaiKishor-MSFT has a really good answer for networking. But your description doesn't say you've narrowed it down to the network. Remember, even with good networking throughput you can still have overall poor performance because you can't get the data out of, or into, your storage fast enough.

    You can use dd (Linux) to test storage read/write performance; a quick sketch is below. If you're using Windows, maybe download one of the many free disk performance tools (Microsoft's DiskSpd is one).
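
    A minimal dd sketch, assuming a data disk mounted at /mnt/data (oflag=direct/iflag=direct bypass the page cache so you measure the disk rather than memory):

    Write test: dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=4096 oflag=direct
    Read test: dd if=/mnt/data/ddtest of=/dev/null bs=1M iflag=direct

    A single sequential dd stream mainly shows throughput; for an IOPS-style test with many small random I/Os, a tool like fio gives more control.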

    The I/O bandwidth and IOPS are driven by two things: VM size and disk size. You may look at https://learn.microsoft.com/en-us/azure/virtual-machines/premium-storage-performance for an overview. Personally, I've had to create very oversized VMs and disks in the past just because I needed high performance for small datasets.

    Fortunately, if storage IOPS is your problem, you should be able to resize your VM or expand your disks to get a higher service tier, as sketched below.
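
    For illustration only (resource group, VM, disk names, and sizes are placeholders; a VM resize reboots the machine, and a managed disk can be grown but not shrunk, with larger premium disks mapping to higher performance tiers; depending on the disk type, the disk may need to be detached or the VM deallocated before resizing):

    az vm resize --resource-group MyRg --name MyVm --size Standard_D16s_v3
    az disk update --resource-group MyRg --name MyDataDisk --size-gb 1024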

