Reliability in Azure Functions
This article describes reliability support in Azure Functions, and covers both intra-regional resiliency with availability zones and cross-region recovery and business continuity. For a more detailed overview of reliability principles in Azure, see Azure reliability.
Availability zone support for Azure Functions is available on both Premium (Elastic Premium) and Dedicated (App Service) plans. This article focuses on zone redundancy support for Premium plans. For zone redundancy on Dedicated plans, see Migrate App Service to availability zone support.
Availability zone support
Azure availability zones are at least three physically separate groups of datacenters within each Azure region. Datacenters within each zone are equipped with independent power, cooling, and networking infrastructure. Availability zones are designed so that if one zone is affected by a local failure, regional services, capacity, and high availability are supported by the remaining two zones.
Failures can range from software and hardware failures to events such as earthquakes, floods, and fires. Tolerance to failures is achieved with redundancy and logical isolation of Azure services. For more detailed information on availability zones in Azure, see Regions and availability zones.
Azure availability zones-enabled services are designed to provide the right level of reliability and flexibility. They can be configured in two ways. They can be either zone redundant, with automatic replication across zones, or zonal, with instances pinned to a specific zone. You can also combine these approaches. For more information on zonal vs. zone-redundant architecture, see Recommendations for using availability zones and regions.
Azure Functions supports both zone-redundant and zonal instances.
Zonal. Function app instances are placed in a single zone that's selected by the platform in the selected region. A zonal function app is isolated from any outages that occur in other zones. However, if an outage impacts the specific zone chosen for the function app, the function app won't be available.
Zone-redundant. The function app platform automatically spreads the instances in the plan across all zones of the selected region. For example, in a region with three zones, if the instance count is greater than three and divisible by three, the instances are distributed evenly across the zones. Otherwise, the instances beyond the nearest multiple of three (3 * N) are spread across one or two of the zones. For apps running in a zone-redundant Premium plan, the instances the app runs on stay evenly distributed between availability zones even as the app scales in and out.
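The distribution rule described above can be illustrated with a minimal sketch. The function name `distribute_instances` is hypothetical, and the choice of which zones receive the extra instances is an assumption made for illustration; the platform decides the actual placement.

```python
def distribute_instances(instance_count: int, zone_count: int = 3) -> list[int]:
    """Spread instances across zones as evenly as possible.

    Illustrative only: this models the "3 * N plus remainder" rule,
    not the platform's real placement algorithm.
    """
    if instance_count < 0 or zone_count <= 0:
        raise ValueError("instance_count must be >= 0 and zone_count must be > 0")
    base, remainder = divmod(instance_count, zone_count)
    # Assumption: the first `remainder` zones each take one extra instance.
    return [base + 1 if zone < remainder else base for zone in range(zone_count)]


print(distribute_instances(6))  # evenly divisible by three
print(distribute_instances(7))  # one instance beyond 3 * N
print(distribute_instances(8))  # two instances beyond 3 * N
```

With six instances every zone holds two; with seven or eight, one or two zones hold a single extra instance.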
Azure Functions can run on the Azure App Service platform. In the App Service platform, plans that host Premium plan function apps are referred to as Elastic Premium plans, with SKU names like EP1. If you choose to run your function app on a Premium plan, make sure to create a plan with an SKU name that starts with "E", such as EP1. App Service plan SKU names that start with "P", such as P1V2 (Premium V2 Small plan), are actually Dedicated hosting plans. Because they are Dedicated and not Elastic Premium, plans with SKU names starting with "P" won't scale dynamically and may increase your costs.
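As a quick guard against picking the wrong plan type, the SKU naming rule above could be checked before deployment. This is a minimal sketch; the function name `hosting_plan_kind` and the `"Other"` fallback label are hypothetical and only encode the prefix rule stated in this article.

```python
def hosting_plan_kind(sku_name: str) -> str:
    """Classify an App Service plan SKU name by its prefix."""
    sku = sku_name.strip().upper()
    if sku.startswith("E"):
        # For example EP1: Elastic Premium, scales dynamically.
        return "Elastic Premium"
    if sku.startswith("P"):
        # For example P1V2: Dedicated, doesn't scale dynamically.
        return "Dedicated"
    # Anything else is outside the rule described in this article.
    return "Other"


for sku in ("EP1", "P1V2"):
    print(sku, "->", hosting_plan_kind(sku))
```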
Zone-redundant Premium plans are available in the following regions:
|Americas|Europe|Middle East|Africa|Asia Pacific|
|---|---|---|---|---|
|Brazil South|France Central|Qatar Central|South Africa North|Australia East|
|Canada Central|Germany West Central|UAE North||Central India|
|Central US|North Europe|||China North 3|
|East US|Norway East|||East Asia|
|East US 2|Sweden Central|||Japan East|
|South Central US|Switzerland North|||Southeast Asia|
|West US 2|UK South||||
|West US 3|West Europe||||
Availability zone support is a property of the Premium plan. The following are the current requirements/limitations for enabling availability zones:
- You can only enable availability zones when creating a Premium plan for your function app. You can't convert an existing Premium plan to use availability zones.
- You must use a zone redundant storage account (ZRS) for your function app's storage account. If you use a different type of storage account, Functions may show unexpected behavior during a zonal outage.
- Both Windows and Linux are supported.
- Function apps must be hosted on an Elastic Premium or Dedicated hosting plan. To learn how to use zone redundancy with a Dedicated plan, see Migrate App Service to availability zone support.
- Availability zone support isn't currently available for function apps on Consumption plans.
- Function apps hosted on a Premium plan must have a minimum always ready instances count of three.
  - The platform will enforce this minimum count behind the scenes if you specify an instance count fewer than three.
- If you aren't using a Premium plan or a scale unit that supports availability zones, are in an unsupported region, or are unsure, see the migration guidance.
There's no additional cost associated with enabling availability zones. Pricing for a zone redundant Premium plan is the same as a single zone Premium plan. You'll be charged based on your Premium plan SKU, the capacity you specify, and any instances you scale to based on your autoscale criteria. If you enable availability zones but specify a capacity less than three, the platform will enforce a minimum instance count of three and charge you for those three instances.
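The minimum-capacity rule can be expressed as a small calculation. This is a sketch under the assumptions stated in this section; the function name `billed_instance_count` is hypothetical, and it covers only the baseline capacity, not instances added later by scaling.

```python
MIN_ZONE_REDUNDANT_INSTANCES = 3  # minimum stated in this article


def billed_instance_count(requested_capacity: int, zone_redundant: bool) -> int:
    """Return the baseline instance count you would be charged for."""
    if requested_capacity < 1:
        raise ValueError("requested_capacity must be at least 1")
    if zone_redundant:
        # The platform enforces at least three instances for zone redundancy.
        return max(requested_capacity, MIN_ZONE_REDUNDANT_INSTANCES)
    return requested_capacity


print(billed_instance_count(1, zone_redundant=True))
print(billed_instance_count(5, zone_redundant=True))
print(billed_instance_count(1, zone_redundant=False))
```

A requested capacity of one on a zone-redundant plan is raised to three, while a capacity of five is left unchanged.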
Create a zone-redundant Premium plan and function app
There are currently two ways to deploy a zone-redundant Premium plan and function app. You can use either the Azure portal or an ARM template.
1. Open the Azure portal and navigate to the Create Function App page. Information on creating a function app in the portal can be found here.
2. In the Basics page, fill out the fields for your function app. Pay special attention to the field in the following table, which has specific requirements for zone redundancy.

   |Setting|Suggested value|Notes for zone redundancy|
   |---|---|---|
   |Region|Preferred region|The region in which this new function app is created. You must pick a region that is availability zone enabled from the list above.|

3. In the Hosting page, fill out the fields for your function app hosting plan. Pay special attention to the fields in the following table, which have specific requirements for zone redundancy.

   |Setting|Suggested value|Notes for zone redundancy|
   |---|---|---|
   |Storage Account|A zone-redundant storage account|As mentioned in the requirements above, we strongly recommend using a zone-redundant storage account for your zone-redundant function app.|
   |Plan Type|Functions Premium|This article details how to create a zone-redundant app in a Premium plan. Zone redundancy isn't currently available in Consumption plans. For zone redundancy on App Service (Dedicated) plans, see Migrate App Service to availability zone support.|
   |Zone Redundancy|Enabled|This field populates the flag that determines if your app is zone redundant or not. You won't be able to select Enabled unless you have chosen a region supporting zone redundancy, as mentioned in step 2.|

4. For the rest of the function app creation process, create your function app as normal. There are no fields in the rest of the creation process that affect zone redundancy.
After the zone-redundant plan is created and deployed, any function app hosted on your new plan is considered zone-redundant.
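For the ARM template route mentioned earlier in this section, the following sketch builds a template as a Python dictionary and prints it as JSON. Treat it as an illustration, not a verified template: the `apiVersion`, the `kind` value, and property names such as `zoneRedundant` and `maximumElasticWorkerCount` are recalled from the `Microsoft.Web/serverfarms` resource schema and should be checked against the current ARM reference. The plan name and region passed in are placeholders, and `build_zone_redundant_plan` is a hypothetical helper.

```python
import json

MIN_ZONE_REDUNDANT_CAPACITY = 3  # minimum instance count stated in this article


def build_zone_redundant_plan(plan_name: str, region: str, capacity: int = 3) -> dict:
    """Return an ARM template (as a dict) for a zone-redundant Elastic Premium plan."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Web/serverfarms",
                "apiVersion": "2021-01-15",  # assumption: verify the current version
                "name": plan_name,
                "location": region,
                "sku": {
                    "name": "EP1",  # SKU names starting with "E" are Elastic Premium
                    "tier": "ElasticPremium",
                    "size": "EP1",
                    "family": "EP",
                    # Mirror the platform-enforced minimum of three instances.
                    "capacity": max(capacity, MIN_ZONE_REDUNDANT_CAPACITY),
                },
                "kind": "elasticpremium",  # assumption: verify against the schema
                "properties": {
                    "maximumElasticWorkerCount": 20,  # illustrative upper bound
                    "zoneRedundant": True,
                },
            }
        ],
    }


template = build_zone_redundant_plan("my-premium-plan", "westeurope")
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and deployed with your usual ARM deployment tooling. Remember that zone redundancy can only be set when the plan is created.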
Migrate your function app to a zone-redundant plan
Azure Functions currently doesn't support in-place migration of existing function app instances. For information on how to migrate the public multi-tenant Premium plan from non-availability zone to availability zone support, see Migrate App Service to availability zone support.
Zone down experience
All available function app instances of zone-redundant function apps are enabled and processing events. When a zone goes down, Functions detects lost instances and automatically attempts to find new replacement instances if needed. Elastic scale behavior still applies. However, in a zone-down scenario there's no guarantee that requests for additional instances can succeed, since back-filling lost instances occurs on a best-effort basis. Applications that are deployed in an availability zone enabled Premium plan continue to run even when other zones in the same region suffer an outage. However, it's possible that non-runtime behaviors could still be impacted by an outage in other availability zones. These impacted behaviors can include Premium plan scaling, application creation, application configuration, and application publishing. Zone redundancy for Premium plans only guarantees continued uptime for deployed applications.
When Functions allocates instances to a zone-redundant Premium plan, it uses best-effort zone balancing offered by the underlying Azure Virtual Machine Scale Sets. A Premium plan is considered balanced when each zone has the same number of VMs (± 1 VM) as all of the other zones used by the Premium plan.
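The balance criterion above can be checked with a one-line comparison. This is a minimal sketch; the function name `is_zone_balanced` is hypothetical, and the VM counts are supplied by the caller rather than read from any Azure API.

```python
def is_zone_balanced(vms_per_zone: list[int]) -> bool:
    """Return True when every zone is within one VM of every other zone."""
    if not vms_per_zone:
        raise ValueError("at least one zone is required")
    # If the largest and smallest zones differ by at most one VM,
    # then every pair of zones does too.
    return max(vms_per_zone) - min(vms_per_zone) <= 1


print(is_zone_balanced([2, 2, 2]))
print(is_zone_balanced([3, 2, 2]))
print(is_zone_balanced([4, 2, 2]))
```

The first two layouts count as balanced; the third doesn't, because one zone holds two more VMs than the others.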
Cross-region disaster recovery and business continuity
Disaster recovery (DR) is about recovering from high-impact events, such as natural disasters or failed deployments, that result in downtime and data loss. Regardless of the cause, the best remedy for a disaster is a well-defined and tested DR plan and an application design that actively supports DR.
When it comes to DR, Microsoft uses the shared responsibility model. The shared responsibility model means that Microsoft ensures that the baseline infrastructure and platform services are available. However, some scenarios require customers to duplicate their deployments and storage across multiple regions, if they choose to do so.
Most services that run on Azure platform as a service (PaaS) offerings provide features and guidance to support DR. For some scenarios, you can use service-specific features to support fast recovery.
This section explains some of the strategies that you can use to deploy Functions to allow for disaster recovery.
Multi-region disaster recovery
Functions run in a function app in a specific Azure region, and there's no built-in cross-region redundancy. To avoid loss of execution during outages, you can redundantly deploy the same functions to function apps in multiple regions. To learn more about multi-region deployments, see the guidance in Highly available multi-region web application.
Active-active pattern for HTTP trigger functions
With an active-active pattern, functions in both regions are actively running and processing events, either in a duplicate manner or in rotation. It's recommended that you use an active-active pattern in combination with Azure Front Door for your critical HTTP triggered functions. Azure Front Door can route and round-robin HTTP requests between functions running in multiple regions, and it can also periodically check the health of each endpoint. When a function in one region stops responding to health checks, Azure Front Door takes it out of rotation and forwards traffic only to the remaining healthy functions.
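The routing behavior described above can be pictured with a small local simulation. This isn't the Azure Front Door API; the class `RoundRobinRouter`, its methods, and the region names are hypothetical and exist only to show round-robin routing that skips endpoints whose health checks fail.

```python
class RoundRobinRouter:
    """Rotate requests across endpoints, skipping unhealthy ones."""

    def __init__(self, endpoints: list[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._healthy = set(self._endpoints)
        self._next = 0

    def record_health_check(self, endpoint: str, healthy: bool) -> None:
        """Store the latest health probe result for an endpoint."""
        if endpoint not in self._endpoints:
            raise KeyError(endpoint)
        if healthy:
            self._healthy.add(endpoint)
        else:
            self._healthy.discard(endpoint)

    def route(self) -> str:
        """Return the next healthy endpoint in rotation."""
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._next]
            self._next = (self._next + 1) % len(self._endpoints)
            if endpoint in self._healthy:
                return endpoint
        raise RuntimeError("no healthy endpoints available")


router = RoundRobinRouter(["func-eastus", "func-westeurope"])
print([router.route() for _ in range(3)])   # alternates between regions
router.record_health_check("func-eastus", healthy=False)
print([router.route() for _ in range(2)])   # only the healthy region remains
```

Once the failed region passes a health check again, calling `record_health_check` with `healthy=True` puts it back into rotation.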
Although it's highly recommended that you use the active-passive pattern for non-HTTP triggered functions, you can create active-active deployments for them. However, you need to consider how the two active regions interact or coordinate with one another. When you deploy the same function app to two regions with each triggering on the same Service Bus queue, they act as competing consumers de-queueing from that queue. While this means each message is processed by only one of the instances, it also means there's still a single point of failure on the single Service Bus instance.
You could instead deploy two Service Bus queues, with one in a primary region, one in a secondary region. In this case, you could have two function apps, with each pointed to the Service Bus queue active in their region. The challenge with this topology is how the queue messages are distributed between the two regions. Often, this means that each publisher attempts to publish a message to both regions, and each message is processed by both active function apps. While this creates the desired active/active pattern, it also creates other challenges around duplication of compute and when or how data is consolidated.
Active-passive pattern for non-HTTP trigger functions
It's recommended that you use the active-passive pattern for your event-driven, non-HTTP triggered functions, such as Service Bus and Event Hubs triggered functions.
With an active-passive pattern, functions run actively in the region that's receiving events, while the same functions in a second region remain idle. The active-passive pattern provides a way for only a single function to process each message while providing a mechanism to fail over to the secondary region in a disaster. Function apps work with the failover behaviors of the partner services, such as Azure Service Bus geo-recovery and Azure Event Hubs geo-recovery.
Consider an example topology using an Azure Event Hubs trigger. In this case, the active-passive pattern involves the following components:
- Azure Event Hubs deployed to both a primary and secondary region.
- Geo-disaster recovery enabled to pair the primary and secondary event hubs. This also creates an alias you can use to connect to event hubs and switch from primary to secondary without changing the connection info.
- Function apps deployed to both the primary and secondary (failover) region, with the app in the secondary region essentially being idle because messages aren't being sent there.
- Each function app triggers on the direct (non-alias) connection string for its respective event hub.
- Publishers to the event hub should publish to the alias connection string.
Before failover, publishers sending to the shared alias route to the primary event hub. The primary function app is listening exclusively to the primary event hub. The secondary function app is passive and idle. As soon as failover is initiated, publishers sending to the shared alias are routed to the secondary event hub. The secondary function app now becomes active and starts triggering automatically. Effective failover to a secondary region can be driven entirely from the event hub, with the functions becoming active only when the respective event hub is active.
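The failover sequence above can be modeled with a short in-memory simulation. It doesn't use the Event Hubs SDK; the classes `EventHub`, `GeoAlias`, and `FunctionApp` are hypothetical stand-ins for the paired event hubs, the shared alias, and the two function apps that each listen on their own hub's direct connection.

```python
from dataclasses import dataclass, field


@dataclass
class EventHub:
    name: str
    events: list[str] = field(default_factory=list)


@dataclass
class GeoAlias:
    """Shared alias that publishers send to; it points at one hub at a time."""
    primary: EventHub
    secondary: EventHub
    failed_over: bool = False

    def publish(self, event: str) -> None:
        target = self.secondary if self.failed_over else self.primary
        target.events.append(event)

    def fail_over(self) -> None:
        self.failed_over = True


@dataclass
class FunctionApp:
    """Triggers on its own hub directly, never on the alias."""
    hub: EventHub
    processed: list[str] = field(default_factory=list)

    def poll(self) -> None:
        self.processed.extend(self.hub.events)
        self.hub.events.clear()


primary_hub, secondary_hub = EventHub("primary"), EventHub("secondary")
alias = GeoAlias(primary_hub, secondary_hub)
primary_app, secondary_app = FunctionApp(primary_hub), FunctionApp(secondary_hub)

alias.publish("event-1")          # before failover: routed to the primary hub
primary_app.poll(); secondary_app.poll()

alias.fail_over()                 # failover is driven entirely from the event hub
alias.publish("event-2")          # after failover: routed to the secondary hub
primary_app.poll(); secondary_app.poll()

print(primary_app.processed, secondary_app.processed)
```

Each event is processed by exactly one function app, and neither the publishers nor the function apps change their connection settings during failover.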