Has anyone noticed Azure App Service plans cycling through many instances before settling on one working instance?

Adam Smith
2024-07-24T13:06:42.6066667+00:00

We've been noticing this strange behavior in our systems to the point that we had to dig deeper.

The attached image shows a single App Service plan hosting nine very small, identical apps under load. We see 24 different plan instances being rotated in and out of service on a per-app basis (the image tracks one particular app moving between instances). This causes instability: the app goes down while it is being moved, or it keeps receiving traffic on an old instance because the load balancer is slow to remove that instance.
This particular episode lasted 7.5 hours, with the apps under constant load.
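
One way to quantify this kind of rotation, assuming you can export per-app samples of which plan instance served each app (for example, from the instance split of your monitoring metrics), is to count distinct instances and moves per app. This is only a sketch: the `Sample` record shape and the `summarize_rotation` function are hypothetical, not part of any Azure API.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, NamedTuple


class Sample(NamedTuple):
    # Hypothetical record: one observation of which plan instance served an app.
    timestamp: datetime
    app: str
    instance_id: str


def summarize_rotation(samples: Iterable[Sample]) -> dict[str, tuple[int, int]]:
    """Return {app: (distinct_instances, moves)}.

    A "move" is a change of instance between consecutive samples of the
    same app, after ordering that app's samples by timestamp.
    """
    by_app: dict[str, list[Sample]] = defaultdict(list)
    for s in samples:
        by_app[s.app].append(s)

    result: dict[str, tuple[int, int]] = {}
    for app, rows in by_app.items():
        rows.sort(key=lambda r: r.timestamp)
        # Count transitions where the serving instance changed.
        moves = sum(
            1 for prev, cur in zip(rows, rows[1:]) if prev.instance_id != cur.instance_id
        )
        result[app] = (len({r.instance_id for r in rows}), moves)
    return result
```

Sorting each app's samples before comparing neighbours means the export can arrive in any order.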

From testing, I have determined the following:

  • The greater the load, and/or the more apps that are restarted, the more likely a restart cycle is to occur.
  • The greater the load, the longer the restart cycle takes to self-recover.
  • It still enters a cycle with the web app Health Checks turned on, but recovery time is significantly better with them off.
  • Both .NET Framework apps and the Azure custom container tutorial image running Python Django (https://learn.microsoft.com/en-us/azure/app-service/tutorial-custom-container?tabs=azure-cli&pivots=container-linux) entered the restart cycle.
  • The apps themselves are stable; load alone won't cause them to restart.
  • When tested outside our private endpoint network, on minimally configured, internet-facing web apps, the apps still entered a restart cycle.
  • Removing the App Service Health Checks dramatically reduces the time taken to self-recover, but does not eliminate the instance cycling entirely.
  • This has happened on both v2 and v3 App Service plans.

[Image: testplancycle.JPG]

In this particular test, the plan is set up for a maximum of one instance. I know the production recommendation is to run multiple instances, but we've seen the same behavior with multiple instances: one of the two gets into this cycle, and the apps as a whole become unstable as a result.

The App Service Health Check documentation (https://learn.microsoft.com/en-us/azure/app-service/monitor-instances-health-check?tabs=dotnet) has some key takeaways:

  1. If the web app that's running on an instance remains unhealthy for one hour, the instance is replaced with a new one.
  2. At most one instance will be replaced per hour, with a maximum of three instances per day per App Service Plan.
  3. The App Service plan can have a maximum of one unhealthy instance replaced per hour and, at most, three instances per day.
  4. There's a nonconfigurable limit on the total number of instances replaced by Health Check per scale unit. If this limit is reached, no unhealthy instances are replaced. This value gets reset every 12 hours.
  5. When an app on an instance remains unhealthy for over one hour, the instance will only be replaced if all other apps with Health check enabled are also unhealthy.
  6. In the scenario where all instances of your application are unhealthy, App Service will not remove instances from the load balancer.
  7. If your app is only scaled to one instance and becomes unhealthy, it will not be removed from the load balancer because that would take down your application entirely. However, after one hour of continuous unhealthy pings, the instance is replaced.

From the above, I can infer that:

If my app is unhealthy, it stays in the load balancer because it runs on a single instance (statements 6 and 7) for an hour before the instance is replaced (statements 1 and 7). If it is still unhealthy, this could happen twice more (three replacements in total) within a day (statements 2 and 3), or perhaps within a 12-hour period (statement 4, although it's unclear what a scale unit is or why that limit resets every 12 hours rather than daily). However, since there are other apps on that instance, the replacement shouldn't happen at all unless those apps are also unhealthy (statement 5).
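
Putting numbers on this for the 7.5-hour episode described earlier gives an upper bound, assuming Health Check replacement were the only mechanism moving the apps between instances (other platform events are not modeled). Here T is the episode length, the first replacement needs one hour of continuous unhealthy pings, and the plan allows at most one replacement per hour and three per day:

```latex
\begin{aligned}
N_{\max} &= \min\left(\left\lfloor \frac{T}{1\,\mathrm{h}} \right\rfloor,\ 3\right)
          = \min\left(\lfloor 7.5 \rfloor,\ 3\right) = \min(7,\ 3) = 3 \\
I_{\max} &= 1 + N_{\max} = 4 \ \text{distinct instances, versus the 24 observed}
\end{aligned}
```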

I'm unable to square my observations with the official documentation on how instances should and should not restart.
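
To compare an observed timeline against statements 2 and 3, a small checker can flag any replacement that exceeds the documented limits. This is a sketch under stated assumptions: `limit_violations` is a hypothetical helper, you would need to supply the replacement timestamps yourself, and "per hour" and "per day" are treated as rolling windows because the documentation does not say whether they are calendar-aligned.

```python
from datetime import datetime, timedelta

# Limits as stated in the Health Check documentation quoted above (statements 2 and 3).
MAX_PER_HOUR = 1
MAX_PER_DAY = 3


def limit_violations(replacements: list[datetime]) -> list[str]:
    """Return descriptions of replacements that exceed the documented limits.

    Assumption: each limit is checked over a rolling window (t - window, t]
    ending at every replacement time t; the docs don't specify rolling vs. calendar.
    """
    times = sorted(replacements)
    problems: list[str] = []
    for i, t in enumerate(times):
        # Count replacements (including t itself) that fall inside each window.
        last_hour = sum(1 for u in times[: i + 1] if t - u < timedelta(hours=1))
        last_day = sum(1 for u in times[: i + 1] if t - u < timedelta(days=1))
        if last_hour > MAX_PER_HOUR:
            problems.append(f"{t.isoformat()}: {last_hour} replacements within 1 hour")
        if last_day > MAX_PER_DAY:
            problems.append(f"{t.isoformat()}: {last_day} replacements within 24 hours")
    return problems
```

Feeding it the instance changes from an episode like the 7.5-hour one would show whether the cycling is even consistent with Health Check replacement alone.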

To add: our apps ran in this configuration for several months before the issue started. Going back over past metrics, we found that it began in mid-June.

Many thanks,
Adam

Azure App Service