Recommended App Service minimum instance count

Gintaras Bubelevicius 21 Reputation points
2022-01-24T18:28:22.317+00:00

On the Azure portal for our App Service app, we've recently seen a recommendation to scale out to 3 instances (we're currently running 2 as a minimum, plus autoscale rules). However, we can't find any official documentation on how App Service instances are distributed and how a third instance would prevent downtime. The recommendation also states that "since you have only two instances you can expect a downtime of upto 50% when the App Service platform is upgraded".

[Screenshot of the portal recommendation: 167947-recommendation.png]

I have a few questions about this scenario:

  1. When a platform upgrade starts on one of the instances, isn't that instance taken out of rotation, with the load balancer treating it as unavailable and directing all traffic to the remaining instance? That remaining instance would still serve incoming requests, so there should be no downtime. Or is that not the case? Please explain.
  2. How does the instance distribution happen? If we run 3 instances as a minimum is the 3rd instance hosted on a different server? Is that always the case?
  3. If the 3rd instance is on a different server and one of the instances starts the upgrade process, doesn't that still give us a 75% success rate? How does that prevent any downtime?
  4. How long does the platform upgrade process last on average? How long would the downtime be, or is it just a matter of an application restart?

Any links to official documentation are welcome, thank you.

Azure App Service

Accepted answer
  1. brtrach-MSFT 15,356 Reputation points Microsoft Employee
    2022-01-27T06:08:40.053+00:00

    Hi @Gintaras Bubelevicius. Thank you for your question and for sharing the screenshot, which helps us understand the request.

    Let us try to answer your questions.

    1. The concern is an overlap between a platform upgrade and one of your own deployments: one instance may be receiving OS and maintenance upgrades while your second instance is being updated with your new files. Since some deployments require a reboot, this could leave you exposed. Platform updates can be applied at any time and their schedule is not shared publicly, so you cannot time your deployments to avoid them. This would be a rare scenario, but if you are pursuing the 99.95% uptime SLA that comes with a Basic tier or higher app, running three instances is recommended.
    2. Your 3 instances would be placed in 3 different upgrade domains to ensure you never have two instances going down at the same time for an upgrade. The upgrade system is designed to keep instances in separate upgrade domains, and each region has many upgrade domains to make this possible.
    3. We believe this is covered by the answer to question 1.
    4. This is not a metric that is shared publicly, because some upgrades take longer than others. Also, in rare scenarios where errors are reported with an X.0 upgrade, the product group will develop and release an X.1 upgrade to address the issue, and the upgrade process starts over.
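
As a rough illustration of the reasoning in points 1 and 2, the sketch below models instances spread across upgrade domains and computes the worst-case fraction of instances still serving while one domain is upgraded. This is only a toy model, not a description of the platform's actual scheduler: the round-robin placement, the function names, and the domain count of 5 are all assumptions made for the example.

```python
from fractions import Fraction


def remaining_capacity(total_instances: int, unavailable: int) -> Fraction:
    """Fraction of instances still serving traffic."""
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    if not 0 <= unavailable <= total_instances:
        raise ValueError("unavailable must be between 0 and total_instances")
    return Fraction(total_instances - unavailable, total_instances)


def place_round_robin(instance_count: int, domain_count: int) -> list[int]:
    """Hypothetical placement: instance i lands in upgrade domain i % domain_count."""
    return [i % domain_count for i in range(instance_count)]


def worst_case_during_upgrade(placement: list[int]) -> Fraction:
    """Upgrade one domain at a time and return the smallest remaining fraction."""
    total = len(placement)
    return min(
        remaining_capacity(total, placement.count(domain))
        for domain in set(placement)
    )


if __name__ == "__main__":
    for n in (2, 3):
        placement = place_round_robin(n, domain_count=5)  # 5 is an assumed value
        print(f"{n} instances, domains {placement}: "
              f"worst case {worst_case_during_upgrade(placement)} still serving")
```

Under these assumptions, two instances leave 1/2 of capacity during an upgrade and three leave 2/3. If a deployment restart happens to take a second instance down at the same moment, `remaining_capacity(2, 2)` is 0 while `remaining_capacity(3, 2)` is 1/3, which is the overlap scenario described in point 1.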

    Unfortunately, we do not have any documentation to share with you, as much of this relates to the underlying architecture of the platform. What we have shared with you today comes from working with the product and the product team over the past 7+ years. These details are not made fully public for a number of reasons and, for now, are only shared with customers on an as-needed basis.

    Another item I want to point out is that if high availability is a concern for you, then you also need to think multi-region. Consider a worst-case scenario, such as a natural disaster taking the one data center you are using offline. It would be best to have something like Azure Traffic Manager (ATM) in front of two web apps (each with 3+ instances), each in a different region. If ATM detects that one of your web app regions is offline, it can reroute traffic to the other to minimize downtime. Customers who run storefronts, or whose business otherwise depends directly on their app being online, will often take this approach.
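
To make the failover idea concrete, here is a minimal simulation of priority-style routing across two regions. It does not call Azure Traffic Manager or any Azure SDK; the `Endpoint` class, the `pick_endpoint` function, and the region names are hypothetical, and the `healthy` flag stands in for whatever health probe result the traffic router would use.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Endpoint:
    name: str       # hypothetical label for a regional web app
    priority: int   # lower number means more preferred
    healthy: bool   # stand-in for the latest health probe result


def pick_endpoint(endpoints: Sequence[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


if __name__ == "__main__":
    # Primary region is down, so traffic should go to the secondary region.
    endpoints = [
        Endpoint("webapp-region-a", priority=1, healthy=False),
        Endpoint("webapp-region-b", priority=2, healthy=True),
    ]
    chosen = pick_endpoint(endpoints)
    print(chosen.name if chosen else "no healthy endpoint")
```

When both regions are healthy the lower priority number wins, and when none is healthy the function returns `None`, which a caller would need to handle (for example, by serving an error page).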

    We do have two multi region docs we would like to share with you:

    We hope this provides you with the clarity you were looking for. Please let us know if you have any further questions or concerns regarding this matter.
