Extremely slow upload speed in Windows (all other OSes on the network are fine)

Ian Turner 11 Reputation points
2021-03-24T17:40:31.563+00:00

I've been having this issue since at least January 2021 on my entire fleet of Windows devices. I first noticed it when my offsite backups stopped completing in time.

Upload speed in Windows is being throttled by something. Download speed is unaffected.

WAN is 2 Gbps symmetrical. My ISP (Washington State K-20 Telecommunications Network) confirmed that their circuit is not the cause and that it is capable of 1800 Mbps symmetrical throughput. Distance is a factor: I can get 400 Mbps to my local telco (which isn't my ISP), but to servers more than a few hundred kilometers away it drops to 30-70 Mbps. There is an initial burst, but the speed drops quickly.
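The drop with distance is consistent with a window-limited sender. TCP can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window / RTT, and the window needed for a given rate (the bandwidth-delay product) grows with RTT. Below is a minimal sketch of that arithmetic. It uses the latencies from the Linux speed tests below (3.93 ms to Seattle, 24.65 ms to Sacramento). The 1.5 Gbps target and the 256 KiB window are illustrative assumptions, and the function names are hypothetical.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to sustain rate_bps."""
    return rate_bps * rtt_s / 8


def window_limited_rate_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most window_bytes can be unacknowledged per RTT."""
    return window_bytes * 8 / rtt_s


TARGET_BPS = 1.5e9          # illustrative target rate
EXAMPLE_WINDOW = 256 * 1024  # illustrative window size in bytes

for server, rtt_ms in [("Seattle", 3.93), ("Sacramento", 24.65)]:
    rtt_s = rtt_ms / 1000
    need = bdp_bytes(TARGET_BPS, rtt_s)
    cap = window_limited_rate_bps(EXAMPLE_WINDOW, rtt_s)
    print(f"{server}: {need / 1e6:.2f} MB in flight needed for 1.5 Gbps; "
          f"a 256 KiB window caps it at {cap / 1e6:.0f} Mbps")
```

The same window that comfortably fills a short path can only deliver a fraction of the rate on a path with six times the RTT. This is why anything that keeps shrinking the window hurts distant servers most.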

I have an HPE/Aruba network and a Sophos XG 310 v2 running SFOS 18.0.4 MR-4. If I plug a client directly into my 10Gbit fiber before my firewall, I get acceptable speeds on Windows. I haven't found any setting in Sophos XG that makes a difference. Local iPerf3 tests ruled out my core router/switch/datacenter.

All tests were run from Hyper-V guests on Server 2019 Datacenter, running on HPE ProLiant DL360 Gen10 hardware.
Windows Server 2019 Datacenter speed tests:
[Speed test screenshots: 81206-image.png, 81275-image.png]

Linux (CentOS 8) speed tests:

Speedtest by Ookla  
  
     Server: Comcast - Seattle, WA (id = 1782)  
        ISP: Washington State K-20 Telecommunications Network  
    Latency:     3.93 ms   (0.16 ms jitter)  
   Download:   915.29 Mbps (data used: 1.2 GB)  
     Upload:  1537.96 Mbps (data used: 1.8 GB)  
Packet Loss: Not available.  
 Result URL: https://www.speedtest.net/result/c/c4bca417-e246-4f46-964a-c4291e4a3914  

Speedtest by Ookla  
  
     Server: Comcast - Sacramento, CA (id = 9436)  
        ISP: Washington State K-20 Telecommunications Network  
    Latency:    24.65 ms   (0.13 ms jitter)  
   Download:  1232.96 Mbps (data used: 1.7 GB)  
     Upload:  1007.46 Mbps (data used: 1.3 GB)  
Packet Loss:     0.4%  
 Result URL: https://www.speedtest.net/result/c/21032a9c-8285-44fe-aadf-ad4dc3d90428  

Affected operating systems:

  • Windows 10 2004
  • Windows 10 20H2
  • Windows Server 2016
  • Windows Server 2019

All devices are fully updated, firmware included.

I've tried or tweaked:

  • Limit reservable bandwidth
  • AV
  • Safe mode boot
  • Domain and non-domain computers
  • Receive window autotuning
  • Interrupt Moderation
  • Receive Side Scaling
  • TCP Congestion Control
  • Large Send Offload

I've tried the following hardware:

  • Dell Optiplex 7040
  • HP Elitebook 840 G5
  • HPE ProLiant DL380 Gen10
  • HPE ProLiant DL360 Gen10

These OSes are unaffected:

  • ChromeOS
  • Android
  • MacOS
  • iOS
  • Linux (CentOS, Hyper-V)

This is a continuation of https://learn.microsoft.com/en-us/answers/questions/89768/slow-wired-upload-speed-vs-linux-on-same-hardware.html

Windows Server
Windows 10 Network

16 answers

  1. Gary Nebbett 5,721 Reputation points
    2021-03-24T20:31:07.527+00:00

    Hello @Ian Turner ,

    I guess that you read what I wrote about out-of-order delivery undermining the Windows congestion control mechanisms. Have you made any measurements to check to what extent this might explain the behaviour that you are observing?
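    One way to measure this from a capture is to list the data segments in arrival order and count how many arrive after a higher-numbered segment. Below is a minimal sketch of the RFC 4737 "next expected" reordering test. It assumes the segments have already been numbered in transmission order, and it skips retransmitted copies as duplicates. `reordering_stats` is a hypothetical helper, not a Windows or capture-tool API.

```python
from typing import List, Tuple


def reordering_stats(arrivals: List[int]) -> Tuple[int, int]:
    """Return (reordered_count, max_extent) for packet numbers in arrival order.

    A packet counts as reordered if its number is below the 'next expected'
    value set by earlier arrivals (RFC 4737). Its extent is i - j, where j is
    the earliest arrival with a higher number. Duplicate copies are skipped.
    """
    seen = set()
    next_exp = None
    reordered = 0
    max_extent = 0
    for i, s in enumerate(arrivals):
        if s in seen:
            continue  # duplicate copy (e.g. a retransmission), not reordering
        seen.add(s)
        if next_exp is None or s >= next_exp:
            next_exp = s + 1  # in order, or after a gap: advance expectation
            continue
        reordered += 1
        # earliest earlier arrival carrying a higher number than s
        j = next(k for k in range(i) if arrivals[k] > s)
        max_extent = max(max_extent, i - j)
    return reordered, max_extent


print(reordering_stats([0, 1, 2, 3, 4, 5]))  # (0, 0)
print(reordering_stats([0, 2, 3, 4, 1, 5]))  # (1, 3)
```

    A larger extent means a segment landed further behind its place in the sequence. An extent of 3 or more is the kind of displacement that can produce three duplicate ACKs at the sender.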

    Gary


  2. Ian Turner 11 Reputation points
    2021-03-24T22:25:35.83+00:00

    Hi @Gary Nebbett ,

    I did read through your responses, but I had not yet made any measurements to check for out-of-order delivery. I suspect my Sophos XG firewall, which would explain why I get acceptable speeds when a client computer is connected directly to my WAN connection. I have an open ticket with Sophos and will bring up out-of-order delivery with them.

    I think I have tried all the suggestions, unless I missed something. Upload speeds used to be normal, and I have no record of what changed: a firewall update, a Windows update, or something else.

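     # Create an ETW trace session that logs to $Env:TEMP\SlowUp.etl
     New-NetEventSession -LocalFilePath $Env:TEMP\SlowUp.etl -Name SlowUp
     # Capture the first 100 bytes of each packet, plus TCP/IP stack events
     Add-NetEventPacketCaptureProvider -TruncationLength 100 -Level 255 -SessionName SlowUp
     Add-NetEventProvider -Name "Microsoft-Windows-TCPIP" -Level 255 -SessionName SlowUp
     Start-NetEventSession -Name SlowUp

     # [run performance test]

     # Stop tracing and remove only this session
     Stop-NetEventSession -Name SlowUp
     Remove-NetEventSession -Name SlowUp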
    

    TCP connection summary event for one of the upload connections:

    ThreadID="10,992" ProcessorNumber="16" Tcb="0xffffb88b35872b20" DataBytesOut="19,242,849" DataBytesIn="13,306" DataSegmentsOut="6,359" DataSegmentsIn="15" SegmentsOut="6,369" SegmentsIn="7,480" NonRecovDa="1,436" NonRecovDaEpisodes="1,325" DupAcksIn="1,989" BytesRetrans="124,100" Timeouts="0" SpuriousRtoDetections="0" FastRetran="66" MaxSsthresh="501,694" MaxSsCwnd="716,706" MaxCaCwnd="507,015" SndLimTransRwin="3" SndLimTimeRwin="21" SndLimBytesRwin="86,140" SndLimTransCwnd="5" SndLimTimeCwnd="16,434" SndLimBytesCwnd="13,408,300" SndLimTransSnd="5" SndLimTimeRSnd="165" SndLimBytesRSnd="497,253" ConnectionTimeMs="16,643" TimestampsEnabled="FALSE" RttUs="24,261" MinRttUs="482" MaxRttUs="45,838" SynRetrans="0" CongestionAlgorithm="CUBIC" State="ClosedState" LocalAddress="10.1.3.146:55306" RemoteAddress="69.241.21.18:8080" CWnd="24,404" SsThresh="14,600" RcvWnd="261,479" RcvBuf="262,800" SndWnd="2,474,880" FormattedMessage="TCP: Connection 0xffffb88b35872b20 Summary: DataBytesOut 19,242,849 DataBytesIn 13,306 DataSegmentsOut 6,359 DataSegmentsIn 15 SegmentsOut 6,369 SegmentsIn 7,480 NonRecovDa 1,436 NonRecovDaEpisodes 1,325 DupAcksIn 1,989 BytesRetrans 124,100 Timeouts 0 SpuriousRtoDetections 0 FastRetran 66 MaxSsthresh 501,694 MaxSsCwnd 716,706 MaxCaCwnd 507,015 SndLimTransRwin 3 SndLimTimeRwin 21 SndLimBytesRwin 86,140 SndLimTransCwnd 5 SndLimTimeCwnd 16,434 SndLimBytesCwnd 13,408,300 SndLimTransSnd 5 SndLimTimeSnd 165 SndLimBytesSnd 497,253 ConnectionTimeMs 16,643 Timestamps FALSE RttUs 24,261 MinRtt 482 MaxRtt 45,838 SynRetrans 0 CongestionAlgorithm CUBIC State ClosedState Local 10.1.3.146:55306 Remote 69.241.21.18:8080 CWnd 24,404 SsThresh 14,600 RcvWnd 261,479 RcvBuf 262,800 SndWnd 2,474,880. " ActivityID="35872b20-b88b-ffff-0000-000000000000"
    

    SlowUp.etl: https://drive.google.com/file/d/1DUC9if8uDLiijNq_k0HrER-cUss115Ja/view
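    A few headline numbers can be pulled out of the connection summary event above. Below is a minimal sketch that parses a subset of its key="value" attributes and derives average throughput, the share of bytes retransmitted, and the share of time the sender was limited by its congestion window. Treating SndLimTimeCwnd as milliseconds is an assumption, but the three SndLimTime* values (21 + 16,434 + 165 = 16,620) add up to just under ConnectionTimeMs. `parse_attrs` is a hypothetical helper.

```python
import re

# A subset of the attributes from the TCP connection summary event above.
EVENT = ('DataBytesOut="19,242,849" BytesRetrans="124,100" FastRetran="66" '
         'DupAcksIn="1,989" SndLimTimeCwnd="16,434" ConnectionTimeMs="16,643" '
         'RttUs="24,261" CWnd="24,404"')


def parse_attrs(text: str) -> dict:
    """Turn key="1,234" pairs into {key: 1234}; non-numeric values are skipped."""
    out = {}
    for key, val in re.findall(r'(\w+)="([^"]*)"', text):
        num = val.replace(",", "")
        if num.isdigit():
            out[key] = int(num)
    return out


a = parse_attrs(EVENT)
seconds = a["ConnectionTimeMs"] / 1000
throughput_mbps = a["DataBytesOut"] * 8 / seconds / 1e6
retrans_pct = 100 * a["BytesRetrans"] / a["DataBytesOut"]
cwnd_limited_pct = 100 * a["SndLimTimeCwnd"] / a["ConnectionTimeMs"]

print(f"average throughput  : {throughput_mbps:.2f} Mbps")
print(f"bytes retransmitted : {retrans_pct:.2f} %")
print(f"time cwnd-limited   : {cwnd_limited_pct:.1f} %")
```

    For this single connection, that works out to roughly 9 Mbps. The sender was congestion-window-limited for almost the whole connection, even though under 1% of the bytes were retransmitted.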

    Thank you for helping,
    Ian


  3. Sunny Qi 10,906 Reputation points Microsoft Vendor
    2021-03-25T09:30:04.113+00:00

    Hi,

    Thanks for posting on the Q&A platform.

    In your case, we would need to analyze performance logs to find clues. Unfortunately, log analysis is beyond the support level of this forum, and forum security policy gives us no channel for collecting users' log data. If you want to find the cause, we recommend opening a case with Microsoft Professional technical support. They can open a phone or email case with Microsoft so that you get one-to-one technical support while your private information stays protected.

    Here is the link:

    https://support.microsoft.com/en-us/gp/customer-service-phone-numbers

    Best Regards,
    Sunny


  4. Gary Nebbett 5,721 Reputation points
    2021-03-26T15:05:47.92+00:00

    Hello @Ian Turner ,

    There are certainly examples of severely out-of-order delivery triggering unnecessary retransmissions and reductions of the congestion window in your trace data. The presence of D-SACK data (RFC 2883) in the trace leaves no doubt about this. I need to do some more work to quantify how often such events occur and how much they affect throughput.
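    Here is how D-SACK data exposes a spurious retransmission. Under RFC 2883, when a receiver gets a segment it already holds, it reports that duplicate range in the first SACK block of its ACK. A sender can recognize the first block as a D-SACK if it starts below the cumulative ACK or lies inside the second SACK block. It can then conclude (the idea behind RFC 3708) that the matching retransmission was unnecessary. Below is a minimal sketch of that check. It ignores 32-bit sequence-number wraparound, and `is_dsack` is a hypothetical helper, not part of any Windows API.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (left_edge, right_edge) of a SACK block; right edge exclusive


def is_dsack(cum_ack: int, sack_blocks: List[Block]) -> bool:
    """Return True if the first SACK block reports duplicate data (a D-SACK).

    The first block is a D-SACK if it starts below the cumulative ACK, or if
    it lies entirely inside the second SACK block. Wraparound is ignored.
    """
    if not sack_blocks:
        return False
    left, right = sack_blocks[0]
    if left < cum_ack:
        return True
    if len(sack_blocks) > 1:
        left2, right2 = sack_blocks[1]
        return left2 <= left and right <= right2
    return False


# Bytes 3000-4460 were retransmitted, but the "lost" original was only late:
# the receiver already had them, so it echoes the range below the cumulative ACK.
print(is_dsack(5920, [(3000, 4460)]))  # True: duplicate data reported
print(is_dsack(5920, [(7380, 8840)]))  # False: ordinary SACK for new data
```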

    I doubt that the "out-of-order" behaviour is caused by your equipment; I think it is much more likely to be caused by network devices elsewhere in the Internet.

    Gary


  5. Gary Nebbett 5,721 Reputation points
    2021-03-31T08:37:07.677+00:00

    Hello @Ian Turner ,

    Speedtest uses several HTTP connections in parallel to test the upload speed. So far, I have just looked at one connection in detail. This connection retransmitted 85 segments, all of which were spurious (i.e. were ultimately received twice because the out-of-order delivery confounded the lost segment detection - the original segments were not lost but just arrived "late").
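    The mechanism can be reproduced with a toy model. Suppose every segment arrives and none is lost, but one is delayed until three later segments have arrived. A receiver that ACKs every segment then sends three duplicate ACKs, and a sender using the classic three-duplicate-ACK fast-retransmit rule (RFC 5681) resends a segment that was never lost. Below is a minimal sketch under these assumptions: per-segment ACKs, segments numbered 0..n-1 rather than by byte sequence number, and a fixed threshold of three. It does not model the actual Windows loss-detection logic, and `spurious_fast_retransmits` is a hypothetical helper.

```python
from typing import List

DUPTHRESH = 3  # classic fast-retransmit threshold (RFC 5681)


def spurious_fast_retransmits(arrival_order: List[int]) -> List[int]:
    """Segments a 3-duplicate-ACK sender would resend, assuming every segment
    0..n-1 eventually arrives (nothing is lost) in the given order."""
    received = set()
    cum_ack = 0       # receiver's cumulative ACK: next in-order segment wanted
    last_ack = None
    dup_acks = 0
    resent = []
    for seg in arrival_order:
        received.add(seg)
        while cum_ack in received:
            cum_ack += 1
        # The receiver ACKs every arriving segment with its cumulative ACK.
        if cum_ack == last_ack:
            dup_acks += 1
            if dup_acks == DUPTHRESH:
                resent.append(cum_ack)  # sender wrongly concludes it was lost
        else:
            last_ack, dup_acks = cum_ack, 0
    return resent


print(spurious_fast_retransmits([0, 1, 2, 3, 4, 5]))  # []
print(spurious_fast_retransmits([0, 2, 3, 1, 4, 5]))  # [] (only 2 duplicate ACKs)
print(spurious_fast_retransmits([0, 2, 3, 4, 1, 5]))  # [1]
```

    Because nothing is lost in this model, every entry in the returned list is a spurious retransmission. In a real sender, each one would also cost a congestion-window reduction unless that reduction is later undone.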

    The Internet-Draft "CUBIC for Fast Long-Distance Networks" (draft-eggert-tcpm-rfc8312bis-01, dated 2 February 2021), which might update RFC 8312 of the same name, says:

    CUBIC MAY implement an algorithm to detect spurious retransmissions,
    such as DSACK [RFC3708], Forward RTO-Recovery [RFC5682] or Eifel
    [RFC3522]. Once a spurious congestion event is detected, CUBIC
    SHOULD restore the original values of above mentioned variables as
    follows if the current cwnd is lower than _prior_cwnd_. Restoring
    to the original values ensures that CUBIC's performance is similar to
    what it would be if there were no spurious losses.

    The current Windows implementation of CUBIC is not undoing the reductions in the congestion window caused by spurious retransmissions - perhaps all of your other systems either do this or use a different congestion control mechanism.
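    The sketch below contrasts two CUBIC senders. The first cuts its window on every congestion event, spurious or not. The second saves its state and restores it when the event turns out to be spurious, as the draft quoted above recommends. It uses the RFC 8312 constants C = 0.4 and beta = 0.7 and counts the window in segments. It saves only cwnd, ssthresh, W_max and K, although the draft lists more variables. The `Cubic` class is hypothetical and is not how Windows implements CUBIC.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

C = 0.4     # CUBIC scaling constant (RFC 8312)
BETA = 0.7  # CUBIC multiplicative decrease factor (RFC 8312)


@dataclass
class Cubic:
    """Toy CUBIC state; windows are measured in segments."""
    cwnd: float
    ssthresh: float = float("inf")
    w_max: float = 0.0
    k: float = 0.0
    saved: Optional[Tuple[float, float, float, float]] = None

    def on_congestion_event(self) -> None:
        # Save pre-reduction state so a spurious event can be undone later.
        self.saved = (self.cwnd, self.ssthresh, self.w_max, self.k)
        self.w_max = self.cwnd
        self.ssthresh = max(self.cwnd * BETA, 2.0)
        self.cwnd = self.ssthresh
        # K: time (s) for W(t) to climb back to w_max after the reduction.
        self.k = (self.w_max * (1 - BETA) / C) ** (1 / 3)

    def on_spurious_event_detected(self) -> None:
        # Undo: restore only if cwnd is still below its pre-reduction value.
        if self.saved is not None and self.cwnd < self.saved[0]:
            self.cwnd, self.ssthresh, self.w_max, self.k = self.saved
        self.saved = None

    def window_at(self, t: float) -> float:
        """CUBIC window t seconds after the last reduction: C*(t - K)^3 + W_max."""
        return C * (t - self.k) ** 3 + self.w_max


no_undo = Cubic(cwnd=100.0)
with_undo = Cubic(cwnd=100.0)
for _ in range(3):  # three back-to-back spurious "losses"
    no_undo.on_congestion_event()
    with_undo.on_congestion_event()
    with_undo.on_spurious_event_detected()

print(round(no_undo.cwnd, 1), with_undo.cwnd)  # 34.3 100.0
```

    Three back-to-back spurious reductions leave the first sender at about a third of its window (0.7³ ≈ 0.34). The second sender is back where it started.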

    Unless the out-of-order delivery is caused by a device under your control (unlikely), there is not much that you can do about this.

    Gary
