Target-based scaling

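Target-based scaling provides a fast and intuitive scaling model for customers and is currently supported for the following extensions:

  • Apache Kafka
  • Azure Cosmos DB
  • Azure Event Hubs
  • Azure Queue Storage
  • Azure Service Bus queues and topics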

Target-based scaling replaces the previous Azure Functions incremental scaling model as the default for these extension types. Incremental scaling added or removed at most one worker at each new instance rate, and relied on complex decisions about when to scale. In contrast, target-based scaling can scale up by four instances at a time, and the scaling decision is based on a simple target-based equation:

desired instances = event source length / target executions per instance

The default target executions per instance values come from the SDKs used by the Azure Functions extensions. You don't need to make any changes for target-based scaling to work.
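
As a rough worked example of the equation, the following Python sketch computes a desired instance count from an event source length and a target executions per instance value. The function name `desired_instances` is hypothetical, and rounding up to a whole instance is an assumption made for illustration; the platform's actual calculation may differ in detail.

```python
def desired_instances(event_source_length: int, target_executions_per_instance: int) -> int:
    """Illustrative only: event source length / target executions per instance.

    Rounding up to a whole instance is an assumption of this sketch.
    """
    if event_source_length < 0:
        raise ValueError("event source length can't be negative")
    if target_executions_per_instance <= 0:
        raise ValueError("target executions per instance must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (event_source_length + target_executions_per_instance - 1) // target_executions_per_instance


# 1,000 queued messages with a target of 16 executions per instance.
print(desired_instances(1000, 16))  # prints 63
```

Under these assumptions, halving the target to 8 would double the result to 125 instances, which is why a lower target makes scaling more aggressive.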

Considerations

The following considerations apply when using target-based scaling:

  • Target-based scaling is enabled by default for function apps on the Consumption plan or Premium plans, but you can opt out. Event-driven scaling isn't supported when running on Dedicated (App Service) plans.
  • Target-based scaling is enabled by default on function app runtime version 4.19.0 or later.
  • When using target-based scaling, the functionAppScaleLimit site setting is still honored. For more information, see Limit scale out.
  • To achieve the most accurate scaling based on metrics, use only one target-based triggered function per function app.
  • When multiple functions in the same function app are all requesting to scale out at the same time, a sum across those functions is used to determine the change in desired instances. Functions requesting to scale out override functions requesting to scale in.
  • When there are scale-in requests without any scale-out requests, the maximum scale-in value is used.

Opting out

Target-based scaling is enabled by default for function apps hosted on a Consumption plan or a Premium plan. To disable target-based scaling and fall back to incremental scaling, add the following app setting to your function app:

| App Setting | Value |
| --- | --- |
| TARGET_BASED_SCALING_ENABLED | 0 |

Customizing target-based scaling

You can make the scaling behavior more or less aggressive based on your app's workload by adjusting target executions per instance. Each extension has different settings that you can use to set target executions per instance.

This table summarizes the host.json values that are used for the target executions per instance values and the defaults:

| Extension | host.json values | Default Value |
| --- | --- | --- |
| Event Hubs (Extension v5.x+) | extensions.eventHubs.maxEventBatchSize | 100* |
| Event Hubs (Extension v3.x+) | extensions.eventHubs.eventProcessorOptions.maxBatchSize | 10 |
| Event Hubs (if defined) | extensions.eventHubs.targetUnprocessedEventThreshold | n/a |
| Service Bus (Extension v5.x+, Single Dispatch) | extensions.serviceBus.maxConcurrentCalls | 16 |
| Service Bus (Extension v5.x+, Single Dispatch Sessions Based) | extensions.serviceBus.maxConcurrentSessions | 8 |
| Service Bus (Extension v5.x+, Batch Processing) | extensions.serviceBus.maxMessageBatchSize | 1000 |
| Service Bus (Functions v2.x+, Single Dispatch) | extensions.serviceBus.messageHandlerOptions.maxConcurrentCalls | 16 |
| Service Bus (Functions v2.x+, Single Dispatch Sessions Based) | extensions.serviceBus.sessionHandlerOptions.maxConcurrentSessions | 2000 |
| Service Bus (Functions v2.x+, Batch Processing) | extensions.serviceBus.batchOptions.maxMessageCount | 1000 |
| Storage Queue | extensions.queues.batchSize | 16 |

* The default maxEventBatchSize changed in v6.0.0 of the Microsoft.Azure.WebJobs.Extensions.EventHubs package. In earlier versions, this value was 10.

For some binding extensions, target executions per instance is set using a function attribute:

| Extension | Function trigger setting | Default Value |
| --- | --- | --- |
| Apache Kafka | lagThreshold | 1000 |
| Azure Cosmos DB | maxItemsPerInvocation | 100 |

To learn more, see the example configurations for the supported extensions.

Premium plan with runtime scale monitoring enabled

When runtime scale monitoring is enabled, the extensions themselves handle dynamic scaling. This is because the scale controller doesn't have access to services secured by a virtual network. After you enable runtime scale monitoring, you'll need to upgrade your extension packages to these minimum versions to unlock the extra target-based scaling functionality:

| Extension Name | Minimum Version Needed |
| --- | --- |
| Apache Kafka | 3.9.0 |
| Azure Cosmos DB | 4.1.0 |
| Event Hubs | 5.2.0 |
| Service Bus | 5.9.0 |
| Storage Queue | 5.1.0 |

Dynamic concurrency support

Target-based scaling introduces faster scaling, and uses defaults for target executions per instance. When using Service Bus, Storage queues, or Kafka, you can also enable dynamic concurrency. In this configuration, the target executions per instance value is determined automatically by the dynamic concurrency feature. It starts with limited concurrency and identifies the best setting over time.

Supported extensions

The way in which you configure target-based scaling in your host.json file depends on the specific extension type. This section provides the configuration details for the extensions that currently support target-based scaling.

Service Bus queues and topics

The Service Bus extension supports three execution models, determined by the IsBatched and IsSessionsEnabled attributes of your Service Bus trigger. Both IsBatched and IsSessionsEnabled default to false.

| Execution Model | IsBatched | IsSessionsEnabled | Setting Used for target executions per instance |
| --- | --- | --- | --- |
| Single dispatch processing | false | false | maxConcurrentCalls |
| Single dispatch processing (session-based) | false | true | maxConcurrentSessions |
| Batch processing | true | false | maxMessageBatchSize or maxMessageCount |
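
The mapping in this table can be sketched as a small lookup. The following Python function is purely illustrative: the name `service_bus_target_setting` is hypothetical, it returns the extension v5.x+ setting names only, and it refuses to guess for the one combination the table doesn't list.

```python
def service_bus_target_setting(is_batched: bool = False,
                               is_sessions_enabled: bool = False) -> str:
    """Return the v5.x+ host.json setting used as target executions per instance.

    Both flags default to False, mirroring the trigger attribute defaults.
    """
    if is_batched and is_sessions_enabled:
        # This combination isn't listed in the table, so the sketch doesn't guess.
        raise ValueError("combination not covered by the execution model table")
    if is_batched:
        return "maxMessageBatchSize"
    if is_sessions_enabled:
        return "maxConcurrentSessions"
    return "maxConcurrentCalls"


print(service_bus_target_setting())                          # maxConcurrentCalls
print(service_bus_target_setting(is_sessions_enabled=True))  # maxConcurrentSessions
print(service_bus_target_setting(is_batched=True))           # maxMessageBatchSize
```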

Note

Scale efficiency: For the Service Bus extension, use Manage rights on resources for the most efficient scaling. With Listen rights, scaling reverts to incremental scale because the queue or topic length can't be used to inform scaling decisions. To learn more about setting rights in Service Bus access policies, see Shared Access Authorization Policy.

Single dispatch processing

In this model, each invocation of your function processes a single message. The maxConcurrentCalls setting governs target executions per instance. The specific setting depends on the version of the Service Bus extension: extension v5.x+ uses maxConcurrentCalls, and Functions v2.x+ uses messageHandlerOptions.maxConcurrentCalls.

For extension v5.x+, modify the host.json setting maxConcurrentCalls, as in the following example:

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "maxConcurrentCalls": 16
        }
    }
}

Single dispatch processing (session-based)

In this model, each invocation of your function processes a single message. However, depending on the number of active sessions for your Service Bus topic or queue, each instance leases one or more sessions. The specific setting depends on the version of the Service Bus extension: extension v5.x+ uses maxConcurrentSessions, and Functions v2.x+ uses sessionHandlerOptions.maxConcurrentSessions.

For extension v5.x+, modify the host.json setting maxConcurrentSessions to set target executions per instance, as in the following example:

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "maxConcurrentSessions": 8
        }
    }
}

Batch processing

In this model, each invocation of your function processes a batch of messages. The specific setting depends on the version of the Service Bus extension: extension v5.x+ uses maxMessageBatchSize, and Functions v2.x+ uses batchOptions.maxMessageCount.

For extension v5.x+, modify the host.json setting maxMessageBatchSize to set target executions per instance, as in the following example:

{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "maxMessageBatchSize": 1000
        }
    }
}

Event Hubs

For Azure Event Hubs, Azure Functions scales based on the number of unprocessed events distributed across all the partitions in the event hub. By default, the host.json attributes used for target executions per instance are maxEventBatchSize and maxBatchSize. However, to fine-tune target-based scaling, you can define a separate parameter, targetUnprocessedEventThreshold, which overrides the batch settings as the target executions per instance without changing them. If targetUnprocessedEventThreshold is set, the total unprocessed event count is divided by this value to determine the number of instances, which is then rounded up to a worker instance count that creates a balanced partition distribution.

Note

Since Event Hubs is a partitioned workload, the target instance count for Event Hubs is capped by the number of partitions in your event hub.
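
To make the division and the partition cap concrete, here is a minimal Python sketch. The function name `event_hubs_desired_instances` and the sample numbers are hypothetical. The sketch only rounds up to a whole instance and applies the cap; it doesn't model the further rounding toward a balanced partition distribution.

```python
def event_hubs_desired_instances(unprocessed_events: int,
                                 target_per_instance: int,
                                 partition_count: int) -> int:
    """Illustrative only: unprocessed events / target, capped by partition count."""
    if target_per_instance <= 0 or partition_count <= 0:
        raise ValueError("target and partition count must be positive")
    if unprocessed_events < 0:
        raise ValueError("unprocessed event count can't be negative")
    # Round up to a whole instance (integer ceiling division).
    uncapped = (unprocessed_events + target_per_instance - 1) // target_per_instance
    # The instance count can't exceed the number of partitions.
    return min(uncapped, partition_count)


# Assumed values: targetUnprocessedEventThreshold of 153, event hub with 32 partitions.
print(event_hubs_desired_instances(1000, 153, 32))  # prints 7
print(event_hubs_desired_instances(5000, 153, 32))  # prints 32 (33 before the cap)
```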

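The specific setting depends on the version of the Event Hubs extension: extension v5.x+ uses maxEventBatchSize, and extension v3.x+ uses eventProcessorOptions.maxBatchSize.

For extension v5.x+, modify the host.json setting maxEventBatchSize to set target executions per instance, as in the following example: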

{
    "version": "2.0",
    "extensions": {
        "eventHubs": {
            "maxEventBatchSize": 100
        }
    }
}

When defined in host.json, targetUnprocessedEventThreshold is used as target executions per instance instead of maxEventBatchSize, as in the following example:

{
    "version": "2.0",
    "extensions": {
        "eventHubs": {
            "targetUnprocessedEventThreshold": 153
        }
    }
}

Storage Queues

For v2.x+ of the Storage extension, modify the host.json setting batchSize to set target executions per instance:

{
    "version": "2.0",
    "extensions": {
        "queues": {
            "batchSize": 16
        }
    }
}

Note

Scale efficiency: For the Storage Queue extension, messages with visibilityTimeout are still counted in event source length by the Storage Queue APIs. This can cause overscaling of your function app. Consider using Service Bus queues with scheduled messages, limiting scale out, or not using visibilityTimeout for your solution.

Azure Cosmos DB

Azure Cosmos DB uses a function-level attribute, MaxItemsPerInvocation. The way you set this function-level attribute depends on your function language.

For a compiled C# function, set MaxItemsPerInvocation in your trigger definition, as shown in the following example for an in-process C# function:

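using System.Collections.Generic;
using Microsoft.Azure.Documents;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

namespace CosmosDBSamplesV2
{
    public static class CosmosTrigger
    {
        [FunctionName("CosmosTrigger")]
        public static void Run([CosmosDBTrigger(
            databaseName: "ToDoItems",
            collectionName: "Items",
            MaxItemsPerInvocation = 100,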
            ConnectionStringSetting = "CosmosDBConnection",
            LeaseCollectionName = "leases",
            CreateLeaseCollectionIfNotExists = true)]IReadOnlyList<Document> documents,
            ILogger log)
        {
            if (documents != null && documents.Count > 0)
            {
                log.LogInformation($"Documents modified: {documents.Count}");
                log.LogInformation($"First document Id: {documents[0].Id}");
            }
        }
    }
}

Note

Since Azure Cosmos DB is a partitioned workload, the target instance count for the database is capped by the number of physical partitions in your container. To learn more about Azure Cosmos DB scaling, see physical partitions and lease ownership.

Apache Kafka

The Apache Kafka extension uses a function-level attribute, LagThreshold. For Kafka, the number of desired instances is calculated based on the total consumer lag divided by the LagThreshold setting. For a given lag, reducing the lag threshold increases the number of desired instances.

The way you set this function-level attribute depends on your function language. The following example sets the threshold to 100.

For a compiled C# function, set LagThreshold in your trigger definition, as shown in the following example of an in-process C# function with a Kafka trigger on Event Hubs:

[FunctionName("KafkaTrigger")]
public static void Run(
    [KafkaTrigger("BrokerList",
                  "topic",
                  Username = "$ConnectionString",
                  Password = "%EventHubConnectionString%",
                  Protocol = BrokerProtocol.SaslSsl,
                  AuthenticationMode = BrokerAuthenticationMode.Plain,
                  ConsumerGroup = "$Default",
                  LagThreshold = 100)] KafkaEventData<string> kevent, ILogger log)
{
    log.LogInformation($"C# Kafka trigger function processed a message: {kevent.Value}");
}
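
The effect of LagThreshold on the desired instance count can be illustrated with a few numbers. In this Python sketch, the function name `kafka_desired_instances` and the lag value are hypothetical, and rounding up to a whole instance is an assumption; only the division of total consumer lag by the threshold comes from the description above.

```python
def kafka_desired_instances(total_consumer_lag: int, lag_threshold: int) -> int:
    """Illustrative only: total consumer lag / lag threshold, rounded up."""
    if total_consumer_lag < 0:
        raise ValueError("total consumer lag can't be negative")
    if lag_threshold <= 0:
        raise ValueError("lag threshold must be positive")
    return (total_consumer_lag + lag_threshold - 1) // lag_threshold


# Same lag, progressively smaller thresholds: the desired count grows.
total_lag = 10_000
for threshold in (1000, 500, 100):
    print(threshold, kafka_desired_instances(total_lag, threshold))
# prints:
# 1000 10
# 500 20
# 100 100
```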
