Jobs Getting Suspended in Azure Container Apps (KEDA Queue-based Trigger)

Arthur Kerst 5 Reputation points
2025-03-04T12:51:51.4533333+00:00

Some of our Azure Container App Jobs are being suspended unexpectedly. The logs show "Suspending Scale Job: jobname". We suspect this is related to the scaling of the job executions. The jobs are triggered via KEDA based on messages in a Service Bus queue. However, some jobs are suspended/stopped even though the queue still has messages, and no new job executions are started.

Additionally, the following messages appear intermittently:

0/3 nodes are available: 1 node(s) had untolerated taint {virtual-kubelet.io/provider: legion}, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

We need help understanding why jobs are suspended and why these node-related log messages appear.

Relevant Configuration Details:

The job and the workload profile it runs on are specified as:

resource containerAppJob 'Microsoft.App/jobs@2024-08-02-preview' = {
  name: jobname
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    environmentId: managedEnvironment.id
    workloadProfileName: 'pt-D8-8-32'
    template: {
      containers: [
        {
          name: 'containername'
          image: 'name.azurecr.io/image:latest'
          imageType: 'ContainerImage'
          command: [
            'python'
          ]
          args: [
            '-m'
            'src.core_processing.process'
          ]
          resources: {
            cpu: json('8')
            memory: '32Gi'
          }
        }
      ]
    }
    configuration: {
      registries: [
        {
          server: 'name.azurecr.io'
          identity: 'system'
        }
      ]
      triggerType: 'Event'
      replicaTimeout: 3600
      replicaRetryLimit: 0
      eventTriggerConfig: {
        replicaCompletionCount: 1
        parallelism: 1
        scale: {
          minExecutions: 0
          maxExecutions: 2
          pollingInterval: 5
          rules: [
            {
              name: 'message-start-job'
              type: 'azure-servicebus'
              metadata: {
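                // messageCount is the target queue length per execution (the KEDA scaling target);
                // activationMessageCount is the queue depth above which scaling from zero is activated.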
                activationMessageCount: '0'
                messageCount: '1'
                namespace: serviceBusNamespaceName
                queueName: serviceBusQueueProcess
              } 
              auth: []
              identity: 'system'
            }
          ]
        }
      }
    } 
  }
}

The workload profile entry on the managed environment (in its workloadProfiles array) is:

  {
    workloadProfileType: 'D8'
    name: 'pt-D8-8-32'
    enableFips: false
    minimumCount: 0
    maximumCount: 4
  }

What I Need Help With:

  • Why are jobs getting suspended despite messages in the queue?
  • Are the node-related errors causing job suspensions?
  • Is there any configuration adjustment needed to prevent suspensions and improve scaling?

2 answers

  1. Khadeer Ali 5,990 Reputation points Microsoft External Staff Moderator
    2025-03-04T15:25:46.98+00:00

    @Arthur Kerst ,

    Thanks for reaching out. Jobs in Azure Container Apps can get suspended for various reasons, especially when using KEDA for scaling based on Service Bus Queue messages. Here are some possible reasons and related log messages:

    1. Insufficient Resources: If there aren't enough resources (CPU, memory) to schedule a new execution, the job pod cannot be placed and the execution may not start. The 0/3 nodes are available message means none of the environment's nodes can currently take the pod: one node carries a taint the pod doesn't tolerate, and the other two don't match the pod's node affinity/selector.
    2. Scaling Configuration: If your scaling settings aren't optimal, KEDA may scale the job down to zero even though pending messages haven't been processed yet.
    3. Job Execution Limits: With maxExecutions: 2 and one message per execution, new executions only start while fewer than two are running; messages beyond that simply wait in the queue.
    4. Node Pool Configuration: In Container Apps this corresponds to the workload profile's minimumCount/maximumCount. If the profile cannot provision a node quickly enough, executions may fail to schedule; see the sketch after this list.
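
    For the node pool point in item 4, one illustrative adjustment is raising the workload profile's minimum node count so a D8 node stays warm and new executions don't wait on provisioning. This is a minimal sketch only, assuming your environment resource is declared roughly like this; the symbolic name, parameters, and API version are assumptions, and minimumCount: 1 means paying for an always-on node:

    resource managedEnvironment 'Microsoft.App/managedEnvironments@2024-08-02-preview' = {
      name: environmentName
      location: location
      properties: {
        workloadProfiles: [
          {
            workloadProfileType: 'D8'
            name: 'pt-D8-8-32'
            enableFips: false
            minimumCount: 1 // keep one node warm so executions can schedule immediately (assumes the extra cost is acceptable)
            maximumCount: 4
          }
        ]
      }
    }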

    To avoid suspensions and improve scaling, consider these tweaks (an illustrative Bicep sketch follows the list):

    • Increase the minimum count (of the workload profile, or minExecutions on the job) so capacity is always ready to process messages.
    • Check the resource requests and limits for your jobs to ensure they're suitable for the workload.
    • Monitor the health and capacity of the environment's nodes to ensure they can handle the load.
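
    A minimal sketch of the job's scale block with those tweaks applied, slotting into the eventTriggerConfig you already have; the values are illustrative starting points rather than a confirmed fix:

        scale: {
          minExecutions: 0
          maxExecutions: 4 // allow more concurrent executions while the workload profile has spare nodes
          pollingInterval: 30 // poll the queue less aggressively than every 5 seconds
          rules: [
            {
              name: 'message-start-job'
              type: 'azure-servicebus'
              metadata: {
                activationMessageCount: '0' // any message activates scaling from zero
                messageCount: '1' // target one execution per queued message
                namespace: serviceBusNamespaceName
                queueName: serviceBusQueueProcess
              }
              auth: []
              identity: 'system'
            }
          ]
        }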

  2. Arslan Amanat 0 Reputation points
    2025-04-14T19:54:26.2166667+00:00

    Hey, I am having the same issue using GitHub Actions with the Azure CLI. I have observed that when I create the Azure Container App Job via the portal (GUI), it creates the connection successfully and listens to the Azure Service Bus queue: whenever there is a message in the queue, it starts a job execution. But when I update the container with a new image using GitHub Actions, the queueLength=1 setting somehow stops working, and the job execution only starts once there are 5 messages in the queue.
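
    A note on the "5 messages" behaviour: 5 is the default messageCount of KEDA's azure-servicebus scaler, so this looks as if the scale rule metadata is being dropped or reset by the update path (an assumption, not verified here). One way to rule that out is to redeploy the full job definition with the count pinned explicitly instead of only patching the image; a minimal sketch of the rule, reusing the names from the question's Bicep:

            {
              name: 'message-start-job'
              type: 'azure-servicebus'
              metadata: {
                activationMessageCount: '0' // any message activates scaling from zero
                messageCount: '1' // set explicitly so the KEDA default of 5 never applies
                namespace: serviceBusNamespaceName
                queueName: serviceBusQueueProcess
              }
              auth: []
              identity: 'system'
            }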

