There seems to be low outbound bandwidth for resources (web apps, app services) on one of our app service plans.
We have seen recently (over the last 6 weeks) that the loading time from a browser is very slow. This is happening to all of our app services that share the same external virtual IP address.
The initial loading of the JS chunks can take more than 30 seconds. They are bigger than we would like, but not big enough to explain the performance (one chunk was 2.5 megabytes).
This is intermittent and does not correlate with high CPU, memory, request counts, etc. on the app services. The latest episode has been going on for a couple of days.
Any ideas/suggestions?
thank you
Azure App Service
-
Shree Hima Bindu Maganti • 4,155 Reputation points • Microsoft External Staff • Moderator
2025-04-17T17:11:18.0133333+00:00 Hi @Sam Skrivan
It seems you are experiencing intermittent slow loading times for your app services. Here are some things to check:
- Since the problem affects all app services with the same external virtual IP address, it could be related to network connectivity or bandwidth limitations. Consider running availability tests to check for any network issues in the regions where your users are located.
- Use dependency tracking to determine if the slowness is due to external services or databases. If your app relies on external APIs or databases, their performance can impact your app's loading times.
- While you mentioned that the size of your JS chunks is not excessively large, consider optimizing them further. Techniques like code splitting or lazy loading can help reduce the initial load time.
- If you haven't already, enable the "Always On" setting in your app service configuration. This can help keep your app warm and reduce delays after idle times, especially in Basic and higher plans.
- Configure a health check path to monitor the health of your app services. This can help identify unresponsive instances and maintain availability and performance.
- Even if CPU and memory usage are not high, ensure that your app service plan has adequate resources for your traffic and application demands.
References:
- Troubleshoot slow app performance issues in Azure App Service
- Application performance FAQs for Web Apps in Azure
- Smart detection - Performance Anomalies
- Troubleshooting intermittent outbound connection errors in Azure App Service
Let me know if you need any further assistance.
-
Sam Skrivan • 0 Reputation points
2025-04-17T17:52:04.49+00:00 We have a staging environment with identical code, and that is loading (downloading the JS chunks) very fast, while production is still VERY slow.
This feels like an Azure networking or infrastructure issue.
On staging the largest chunk (2.3 MB) takes 563.33 ms.
Production for that chunk takes 33.88 SECONDS. And all of that is in the content download time.
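To put those two timings in bandwidth terms, here is a rough calculation. It is only a sketch: it assumes the chunk is 2.3 * 10**6 bytes and that the full reported time is body transfer, which is only approximately true for the staging figure.

```python
# Effective throughput implied by the timings reported above.
# Assumption: the chunk is 2.3 * 10**6 bytes and the whole time is body transfer.
CHUNK_BYTES = 2.3 * 10**6


def throughput_mbit_per_s(size_bytes: float, seconds: float) -> float:
    """Return the average transfer rate in megabits per second."""
    return size_bytes * 8 / seconds / 10**6


staging = throughput_mbit_per_s(CHUNK_BYTES, 0.56333)   # 563.33 ms on staging
production = throughput_mbit_per_s(CHUNK_BYTES, 33.88)  # 33.88 s on production

print(f"staging:    {staging:.1f} Mbit/s")
print(f"production: {production:.2f} Mbit/s")
print(f"slowdown:   {staging / production:.0f}x")
```

Under those assumptions production is delivering the same file roughly 60 times slower than staging, at well under 1 Mbit/s.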
-
TP • 119.1K Reputation points • Moderator
2025-04-17T19:32:50.4166667+00:00 Hi Sam,
Some questions to help with troubleshooting. If I have abbreviated too much and you are unsure, let me know:
- What SKU are you using, how many instances, OS (windows or linux), stack + version, code or container?
- On your worker(s), what server is serving static files? (e.g. NGINX)
- Do you have your webapp set to HTTP 1.1 or 2.0?
- If you download the .js file from VM in same Azure region as your webapp, does it still take 30+ seconds?
- Does the slow transfer issue go away if you restart app?
- Does the slow transfer issue go away if you restart the worker(s)?
Thanks.
-
Sam Skrivan • 0 Reputation points
2025-04-17T21:17:40.9433333+00:00 Thanks TP. answers below
- the SKU is Premium 1 V3, Linux, Node 22, code
- PM2
- HTTP 1.1
- Haven't tried that, but we do see good performance with the same setup for our staging environment, which is hosted in the same region (different virtual IP)
- No
- No
-
TP • 119.1K Reputation points • Moderator
2025-04-17T22:01:15.0266667+00:00 How many instances? Do you only run single worker instance, or multiple, or some sort of auto scale out, or what? I put a ton of questions all within #1 :)
4. Haven't tried that, but we do see good performance with the same setup for our staging environment, which is hosted in the same region (different virtual IP)
If you get a chance, give it a try. You can spin up a smalldisk Windows Server 2022 Spot VM and conduct the test using Edge in about 15 minutes, for a total cost of, say, less than $0.01. What I'm going for here is to test whether the different Azure networking behavior has any effect. Connections from within the same region are treated differently.
I'm not thinking in terms of shaving a few milliseconds of latency somehow fixing it; rather, I want to see if perhaps it is frontend-related and this difference somehow "fixes" it. I'm not optimistic this test will reveal anything, but it could, and it's quick and easy, so it's worth it.
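If you prefer a script over a browser for test #4, something like the sketch below could be run from the VM and from your local PC to compare. It is not an official tool, just the Python standard library; the function name and the `opener` parameter are made up for illustration. Note that `urllib` does not ask for compressed responses by default, so the byte count may differ from what a browser downloads.

```python
import time
import urllib.request


def measure_download(url, opener=urllib.request.urlopen, chunk_size=64 * 1024):
    """Download url and time the response headers and the body separately.

    opener is injectable (hypothetical parameter) so the function can be
    exercised without a network connection.
    """
    start = time.perf_counter()
    with opener(url, timeout=120) as response:
        headers_at = time.perf_counter()  # status line and headers received
        total = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    body_seconds = time.perf_counter() - headers_at
    return {
        "bytes": total,
        "headers_seconds": headers_at - start,
        "body_seconds": body_seconds,
        "mbit_per_s": total * 8 / body_seconds / 1e6 if body_seconds > 0 else float("inf"),
    }


# Example call (placeholder URL, replace with the real link to the .js file):
# print(measure_download("https://example.invalid/static/chunk.js"))
```

If `body_seconds` is large from your PC but small from a VM in the same region, that would point at the path through the frontend rather than at the worker.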
#5 and #6
I realize this is disruptive, so may need to wait until you are okay with brief downtime. What I'm going for here is attempting to narrow down cause to the workers. It is possible that there is an issue that occurs within the workers after some amount of time/use.
If slowdown is resolved by restarting workers a) we know it is something with worker and not the frontend layer and b) you now have a known workaround that you can implement until you have permanent fix.
Go ahead and restart all worker(s) using Advanced Application Restart or the REST API. You should conduct #4 (and any other tests you are thinking of) before restarting the workers since the problem might go away after.
-
TP • 119.1K Reputation points • Moderator
2025-04-17T22:16:47.77+00:00 Sam, if you want I can do the #4 test, assuming you can give me a direct URL to the .js file. You can post the URL as a screenshot so no bots will pick it up, and then we can delete the image.
-
Sam Skrivan • 0 Reputation points
2025-04-17T22:57:57.9833333+00:00 Thanks, TP.
For #5 & #6, I did try this but it didn't have an effect.
For the #4 test, I will check with the team.
-s
-
TP • 119.1K Reputation points • Moderator
2025-04-17T23:11:28.0433333+00:00 for #5 & #6, i did try this but it didn't have an effect.
To confirm, you used Advanced Application Restart and restarted all workers?
Something else easy you can try: navigate to Scale up (App Service plan), switch to P1mv3, test, then switch back to P1V3 and test again.
-
TP • 119.1K Reputation points • Moderator
2025-04-18T00:33:36.17+00:00 I recommend you capture a network trace on your production environment as well as your staging (for comparison) and then use Wireshark to examine what is actually happening.
There is a technique you can use to make things a little easier if your production site has a time period with no or very little activity. You can get everything ready (your browser, ssh to worker, etc.) for the capture, next go into Networking and set rules so that only your local PC's public IP address is allowed, start capture on worker, download .js file to your PC, stop capture, and switch webapp firewall rule back to allow everything. If you do things right your production site would only be unavailable for less than 5 minutes or so.
To start the capture in the worker, you would SSH to it via the portal and run a command similar to the one below:
tcpdump -w /tmp/0001.pcap -i eth0
To stop the capture, press Ctrl-C. You can use FTP or another method to transfer the 0001.pcap file to your local PC so you can open it with Wireshark. The idea behind the above technique is to capture only the bare minimum and mostly include only traffic from your local PC.
You will be able to see the communication between the worker and frontend to diagnose what is actually occurring.
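For a quick first look before opening Wireshark, a rough per-second byte count can be pulled from the capture with a few lines of Python. This is only a sketch: it assumes the file is in the classic pcap format (not pcapng), and the function name is made up for illustration.

```python
import struct
from collections import Counter

# Magic numbers of the classic pcap file format, mapped to struct byte order.
# The second pair is the nanosecond-timestamp variant.
_MAGIC = {
    b"\xd4\xc3\xb2\xa1": "<",
    b"\xa1\xb2\xc3\xd4": ">",
    b"\x4d\x3c\xb2\xa1": "<",
    b"\xa1\xb2\x3c\x4d": ">",
}


def bytes_per_second(pcap_bytes):
    """Sum on-the-wire packet sizes per whole second of a classic pcap file.

    Keys are whole seconds elapsed since the first packet's timestamp second.
    """
    magic = pcap_bytes[:4]
    if magic not in _MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    # Per-packet record header: ts_sec, ts_frac, incl_len, orig_len
    record = struct.Struct(_MAGIC[magic] + "IIII")
    offset = 24  # size of the global file header
    first_sec = None
    buckets = Counter()
    while offset + record.size <= len(pcap_bytes):
        ts_sec, _ts_frac, incl_len, orig_len = record.unpack_from(pcap_bytes, offset)
        if first_sec is None:
            first_sec = ts_sec
        buckets[ts_sec - first_sec] += orig_len
        offset += record.size + incl_len  # skip the captured packet data
    return dict(buckets)


# Example (reads the whole file into memory, fine for a short capture):
# with open("0001.pcap", "rb") as f:
#     print(bytes_per_second(f.read()))
```

Running it on both the production and staging captures gives a coarse view of whether data trickles out steadily or arrives in bursts with long gaps. Wireshark is still needed to see retransmissions, window sizes, and the like.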
Please let me know if you have any questions.
Thanks.
-
Alekhya Vaddepally • 1,170 Reputation points • Microsoft External Staff • Moderator
2025-04-21T09:04:08.9166667+00:00 Hi Sam Skrivan,
Just checking in to see if the above answer provided by TP helped.
If you have found a resolution, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.
-
Sam Skrivan • 0 Reputation points
2025-04-21T16:05:16.83+00:00 Thanks for the help, TP. The transient nature of this issue is rearing its head. The production environment was loading within a second or two starting Friday and over the weekend.
It is back to slow this morning, so I will attempt the capture you described this evening.
-
TP • 119.1K Reputation points • Moderator
2025-04-28T19:51:56.6633333+00:00 @Sam Skrivan Any update? Just checking in
-
Sam Skrivan • 0 Reputation points
2025-04-30T17:29:37.6966667+00:00 We are still experiencing this problem. We will capture the network logs and post here.