Azure Container Apps manages automatic horizontal scaling through a set of declarative scaling rules. As a container app revision scales out, new instances of the revision are created on demand. These instances are known as replicas.
Adding or editing scaling rules creates a new revision of your container app. A revision is an immutable snapshot of your container app. To learn which types of changes trigger a new revision, see revision change types.
As you define your scaling rules, it's important to consider the following items:
You aren't billed usage charges if your container app scales to zero.
Replicas that aren't processing but remain in memory might be billed at a lower "idle" rate. For more information, see Billing.
If you want to ensure that an instance of your revision is always running, set the minimum number of replicas to 1 or higher.
Scale rules
Scaling is driven by three different categories of triggers:
HTTP: Based on the number of concurrent HTTP requests to your revision.
TCP: Based on the number of concurrent TCP connections to your revision.
Custom: Based on CPU, memory, or supported event-driven data sources such as:
Azure Service Bus
Azure Event Hubs
Apache Kafka
Redis
If you define more than one scale rule, the container app begins to scale as soon as the condition of any one rule is met.
HTTP
With an HTTP scaling rule, you have control over the threshold of concurrent HTTP requests that determines how your container app revision scales. Every 15 seconds, the number of concurrent requests is calculated as the number of requests in the past 15 seconds divided by 15. Container Apps jobs don't support HTTP scaling rules.
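As a rough illustration of the calculation described above, the following Python sketch divides a request count by the 15-second window and compares the result with a threshold. The function names and sample numbers are assumptions made for this example; they aren't part of Container Apps.

```python
WINDOW_SECONDS = 15  # length of the sampling window described above


def concurrent_requests(requests_in_window: int, window_seconds: int = WINDOW_SECONDS) -> float:
    """Return the requests seen in the window divided by the window length."""
    return requests_in_window / window_seconds


def exceeds_threshold(requests_in_window: int, threshold: int) -> bool:
    """True when the computed value is above the configured threshold."""
    return concurrent_requests(requests_in_window) > threshold


# 1,800 requests in the last 15 seconds work out to 120, above a threshold of 100.
print(concurrent_requests(1800))     # 120.0
print(exceeds_threshold(1800, 100))  # True
print(exceeds_threshold(1500, 100))  # False: 1500 / 15 is exactly 100, not above it
```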
In the following example, the revision scales out up to five replicas and can scale in to zero. The scaling property is set to 100 concurrent requests per second.
Example
The http section defines an HTTP scale rule.
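A minimal sketch of what such a scale section could look like, written as a Python dictionary and serialized to JSON. The rule name http-rule and the exact nesting under metadata are assumptions for illustration; check them against the ARM template reference before use.

```python
import json

# Hypothetical scale section: zero to five replicas, 100 concurrent requests.
scale = {
    "minReplicas": 0,
    "maxReplicas": 5,
    "rules": [
        {
            "name": "http-rule",  # assumed rule name
            "http": {
                "metadata": {
                    "concurrentRequests": "100"
                }
            },
        }
    ],
}

print(json.dumps({"scale": scale}, indent=2))
```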
Scale property | Description | Default value | Min value | Max value
concurrentRequests | When the number of concurrent HTTP requests exceeds this value, another replica is added. Replicas continue to be added to the pool up to the maxReplicas amount. | | |
When using non-HTTP event scale rules, set the properties.configuration.activeRevisionsMode property of the container app to single.
Define an HTTP scale rule using the --scale-rule-http-concurrency parameter in the create or update commands.
CLI parameter | Description | Default value | Min value | Max value
--scale-rule-http-concurrency | When the number of concurrent HTTP requests exceeds this value, another replica is added. Replicas continue to be added to the pool up to the max-replicas amount. | | |
In the Concurrent requests box, enter your desired number of concurrent requests for your container app.
TCP
With a TCP scaling rule, you have control over the threshold of concurrent TCP connections that determines how your app scales. Every 15 seconds, the number of concurrent connections is calculated as the number of connections in the past 15 seconds divided by 15. Container Apps jobs don't support TCP scaling rules.
In the following example, the container app revision scales out up to five replicas and can scale in to zero. The scaling threshold is set to 100 concurrent connections per second.
Example
The tcp section defines a TCP scale rule.
Scale property | Description | Default value | Min value | Max value
concurrentConnections | When the number of concurrent TCP connections exceeds this value, another replica is added. Replicas continue to be added up to the maxReplicas amount as the number of concurrent connections increases. | | |
Define a TCP scale rule using the --scale-rule-tcp-concurrency parameter in the create or update commands.
CLI parameter | Description | Default value | Min value | Max value
--scale-rule-tcp-concurrency | When the number of concurrent TCP connections exceeds this value, another replica is added. Replicas continue to be added up to the max-replicas amount as the number of concurrent connections increases. | | |
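To show how the --scale-rule-tcp-concurrency parameter might fit into an update command, the following Python sketch assembles the argument list and prints it without running it. The app name, resource group, and rule name are hypothetical placeholders, and the accompanying flags are an assumption about a typical invocation; verify them against the Azure CLI reference.

```python
import shlex


def tcp_scale_command(app_name, resource_group, concurrency=100,
                      min_replicas=0, max_replicas=5):
    """Build (but do not run) an az containerapp update command for a TCP scale rule."""
    return [
        "az", "containerapp", "update",
        "--name", app_name,
        "--resource-group", resource_group,
        "--min-replicas", str(min_replicas),
        "--max-replicas", str(max_replicas),
        "--scale-rule-name", "tcp-rule",   # assumed rule name
        "--scale-rule-type", "tcp",
        "--scale-rule-tcp-concurrency", str(concurrency),
    ]


# Placeholder names; substitute your own app and resource group.
cmd = tcp_scale_command("my-container-app", "my-resource-group")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to subprocess.run without shell quoting issues.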
The following procedure shows you how to convert a KEDA scaler to a Container App scale rule. This snippet is an excerpt of an ARM template to show you where each section fits in context of the overall template.
Container Apps scale rules support secrets-based authentication. Scale rules for Azure resources, including Azure Queue Storage, Azure Service Bus, and Azure Event Hubs, also support managed identity. Where possible, use managed identity authentication to avoid storing secrets within the app.
Use secrets
To use secrets for authentication, you need to create a secret in the container app's secrets array. The secret value is used in the auth array of the scale rule.
KEDA scalers can use secrets in a TriggerAuthentication that is referenced by the authenticationRef property. You can map the TriggerAuthentication object to the Container Apps scale rule.
Find the TriggerAuthentication object referenced by the KEDA ScaledObject specification.
In the TriggerAuthentication object, find each secretTargetRef and its associated secret.
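The mapping in these steps can be sketched in Python as follows. It assumes each secretTargetRef carries parameter, name, and key fields, and that the Container Apps auth entry uses secretRef and triggerParameter properties, with the key reused as the secret name. The secret name and placeholder value are hypothetical.

```python
def map_trigger_authentication(secret_target_refs, secret_values):
    """Turn KEDA secretTargetRef entries into Container Apps secrets and auth arrays."""
    secrets, auth = [], []
    for ref in secret_target_refs:
        secret_name = ref["key"]  # assumption: the key doubles as the secret name
        secrets.append({"name": secret_name, "value": secret_values[secret_name]})
        auth.append({"secretRef": secret_name, "triggerParameter": ref["parameter"]})
    return secrets, auth


# Hypothetical secretTargetRef taken from a TriggerAuthentication object.
keda_refs = [
    {"parameter": "connection", "name": "my-secrets", "key": "connection-string-secret"}
]
secrets, auth = map_trigger_authentication(
    keda_refs, {"connection-string-secret": "<CONNECTION_STRING>"}
)
print(secrets)
print(auth)
```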
Some scalers support metadata with the FromEnv suffix to reference a value in an environment variable. Container Apps looks at the first container listed in the ARM template for the environment variable.
Container Apps scale rules can use managed identity to authenticate with Azure services. The following ARM template passes in a system-assigned managed identity to authenticate for an Azure Queue scaler.
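As a hedged sketch of how a scale rule might reference a managed identity, the following Python dictionary mirrors the shape of an ARM template scale section for an Azure Queue scaler. The identity value system, the storage account name, the queue name, and the replica limits are assumptions for illustration; confirm the property names against the ARM template reference.

```python
import json

# Hypothetical custom rule for an Azure Queue scaler using managed identity.
queue_rule = {
    "name": "azure-queue",
    "custom": {
        "type": "azure-queue",
        "metadata": {
            "accountName": "mystorageaccount",  # placeholder storage account
            "queueName": "my-queue",            # placeholder queue name
            "queueLength": "1",
        },
        "identity": "system",  # assumed value for the system-assigned identity
    },
}

scale = {"minReplicas": 0, "maxReplicas": 4, "rules": [queue_rule]}
print(json.dumps({"scale": scale}, indent=2))
```

No secrets or auth array appears in this sketch, because the identity takes the place of a stored connection string.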
Use secrets
To configure secrets-based authentication for a Container Apps scale rule, you configure the secrets in the container app and reference them in the scale rule.
A KEDA scaler supports secrets in a TriggerAuthentication object, which the authenticationRef property references. You can map the TriggerAuthentication object to the Container Apps scale rule.
Find the TriggerAuthentication object referenced by the KEDA ScaledObject specification. Identify each secretTargetRef of the TriggerAuthentication object.
Container Apps scale rules can use managed identity to authenticate with Azure services. The following command creates a container app with a user-assigned managed identity and uses it to authenticate for an Azure Queue scaler.
In the portal, find the Metadata section and select Add. Enter the name and value for each item in the KEDA ScaledObject specification metadata section.
Authentication
Use secrets
In your container app, create the secrets that you want to reference.
Find the TriggerAuthentication object referenced by the KEDA ScaledObject specification. Identify each secretTargetRef of the TriggerAuthentication object.
In the Authentication section, select Add to create an entry for each KEDA secretTargetRef parameter.
Using managed identity
Managed identity authentication is not supported in the Azure portal. Use the Azure CLI or Azure Resource Manager to authenticate using managed identity.
Default scale rule
If you don't create a scale rule, the default scale rule is applied to your container app.
Trigger | Min replicas | Max replicas
HTTP | 0 | 10
Important
Make sure you create a scale rule or set minReplicas to 1 or more if you don't enable ingress. If ingress is disabled and you don't define minReplicas or a custom scale rule, then your container app scales to zero and has no way of starting back up.
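The condition in this note can be expressed as a small check. This is only a sketch of the rule as stated here; the function and parameter names are made up for the example.

```python
def can_start_from_zero(ingress_enabled: bool, min_replicas: int,
                        has_custom_scale_rule: bool) -> bool:
    """Return False for the configuration the note warns about.

    The app gets stuck at zero replicas when ingress is disabled, minReplicas
    is below 1, and no custom scale rule exists to wake it up.
    """
    stuck = (not ingress_enabled) and min_replicas < 1 and not has_custom_scale_rule
    return not stuck


print(can_start_from_zero(ingress_enabled=False, min_replicas=0, has_custom_scale_rule=False))  # False
print(can_start_from_zero(ingress_enabled=False, min_replicas=1, has_custom_scale_rule=False))  # True
print(can_start_from_zero(ingress_enabled=False, min_replicas=0, has_custom_scale_rule=True))   # True
```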
As your app scales out, KEDA starts with an empty queue and performs the following steps:
1. Check my-queue every 30 seconds.
2. If the queue length equals 0, go back to (1).
3. If the queue length is > 0, scale the app to 1.
4. If the queue length is 50, calculate desiredReplicas = ceil(50/5) = 10.
5. Scale app to min(maxReplicaCount, desiredReplicas, max(4, 2*currentReplicaCount))
6. Go back to (1).
If the app was scaled to the maximum replica count of 20, scaling goes through the same previous steps. Scale down only happens if the condition was satisfied for 300 seconds (scale down stabilization window). Once the queue length is 0, KEDA waits for 300 seconds (cool down period) before scaling the app to 0.
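The scale-out steps can be traced with a short Python sketch. It assumes a target of 5 messages per replica (from ceil(50/5)) and a maximum replica count of 20, treats each call as one polling cycle, and doesn't model the 300-second stabilization window or cool-down period. The function name is hypothetical.

```python
import math

TARGET_PER_REPLICA = 5   # queue messages per replica, from ceil(50/5) in the steps
MAX_REPLICA_COUNT = 20   # maximum replica count mentioned in the text


def next_replica_count(queue_length: int, current_replicas: int) -> int:
    """Return the replica count after one polling cycle (scale-out only)."""
    if queue_length == 0:
        # Empty queue: nothing to scale out; the cool-down to zero is not modeled.
        return current_replicas
    if current_replicas == 0:
        # Messages are waiting and nothing is running: start with one replica.
        return 1
    desired = math.ceil(queue_length / TARGET_PER_REPLICA)
    step_limit = max(4, 2 * current_replicas)
    target = min(MAX_REPLICA_COUNT, desired, step_limit)
    # Scale-in waits for the stabilization window, which this sketch does not
    # model, so the count never drops here.
    return max(current_replicas, target)


replicas = 0
history = []
for _ in range(5):  # five polling cycles with a constant queue length of 50
    replicas = next_replica_count(50, replicas)
    history.append(replicas)
print(history)  # [1, 4, 8, 10, 10]
```

With 50 messages in the queue, the max(4, 2*currentReplicaCount) term limits how fast the app grows, so it takes four cycles to reach the 10 replicas that the queue length calls for.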
Considerations
In "multiple revisions" mode, adding a new scale trigger creates a new revision of your application but your old revision remains available with the old scale rules. Use the Revision management page to manage traffic allocations.
No usage charges are incurred when an application scales to zero. For more pricing information, see Billing in Azure Container Apps.
Replica quantities are a target amount, not a guarantee.
If you're using Dapr actors to manage states, you should keep in mind that scaling to zero isn't supported. Dapr uses virtual actors to manage asynchronous calls, which means their in-memory representation isn't tied to their identity or lifetime.