Azure Container Apps (ACA) revision swap: SIGTERM not delivered immediately on deprovision

최 선호 25 Reputation points
2025-12-22T08:14:32.0233333+00:00

Hello,

I’m running a Celery worker container deployed on Azure Container Apps (ACA).

When ACA replaces a revision (a new revision becomes active, then the old revision starts deprovisioning), I’m seeing unexpected shutdown signal behavior:

  1. SIGTERM is not delivered immediately to the application when the old revision begins deprovisioning.
    • Instead, SIGTERM is received ~30 seconds later.
  2. Even if I handle SIGTERM in the application (to log it and keep the process alive / perform a graceful shutdown), the container still disappears in the Azure Portal.
  3. I set terminationGracePeriodSeconds: 60, but:
    • even after that time passes, I do not observe a SIGKILL being sent to the process, and
    • the behavior does not match the expected “SIGTERM → wait for grace period → SIGKILL” sequence.

At the moment, the biggest issue is that SIGTERM is not triggered immediately when deprovisioning starts, which makes it difficult to gracefully stop Celery workers (e.g., stop consuming new messages and finish in-flight tasks).

Any guidance, documentation links, or recommended best practices for graceful shutdown of message consumers (Celery workers) on ACA revision swaps would be appreciated.

Azure Container Apps

An Azure service that provides a general-purpose, serverless container platform.


4 answers

  1. 최 선호 25 Reputation points
    2026-01-08T07:01:33+00:00

    Hello,

    I’m posting an update with additional findings.

    First, I’d like to correct my earlier statement about SIGTERM. I initially thought SIGTERM was not being delivered, but the real reason it looked that way was that Celery did not perform a warm shutdown.

    We are running Celery with the prefork pool. In our container, there is one main process (PID 1) and two child worker processes.

    After further investigation, we found that the platform sends SIGTERM to every child process as well, whereas we expected it to be delivered only to the main process (PID 1). This SIGTERM was not propagated by the main process; it appears to be delivered directly by the platform to each child process.

    Because the child processes are terminated by SIGTERM, we believe the main Celery process cannot complete a proper warm shutdown and errors occur (e.g., worker lost errors).

    Could you help explain why SIGTERM is delivered to all processes (not just PID 1) in this environment?
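One way to check where a SIGTERM comes from is to read the sender PID that the kernel attaches to the signal. The following is only a minimal diagnostic sketch, assuming Linux and a throwaway test process rather than a real Celery worker; `wait_for_sigterm` is a hypothetical helper name, not part of Celery or ACA. A sender PID equal to the parent PID suggests the signal was propagated by the main process, while a sender outside the container's PID namespace is typically reported as 0.

```python
import os
import signal


def wait_for_sigterm(timeout_seconds):
    """Block SIGTERM, wait for it, and report which PID sent it (Linux only)."""
    # Blocking the signal keeps it pending so sigtimedwait() can collect it
    # together with its siginfo instead of running a handler.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    try:
        info = signal.sigtimedwait({signal.SIGTERM}, timeout_seconds)
    finally:
        # Restore whatever signal mask the caller had before.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    if info is None:
        return None  # no SIGTERM arrived within the timeout
    return {
        "receiver_pid": os.getpid(),
        "parent_pid": os.getppid(),
        "sender_pid": info.si_pid,
    }
```

Calling `wait_for_sigterm(120)` in both a parent and a forked child inside a test container, then triggering a revision swap, would show whether each process is signaled independently.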

    For reference, this is in the Korea Central region on the Consumption profile.

    Thank you.



  2. 최 선호 25 Reputation points
    2025-12-31T01:44:07.4866667+00:00

    On the application side, Celery already handles SIGTERM properly for graceful (warm) shutdown, and we confirmed it works as expected in a typical Kubernetes environment.


    However, on the Azure Container Apps (ACA) Dedicated profile, when a revision starts being deprovisioned, the app stays in the Deprovisioning state for about 30 seconds, and SIGTERM is not delivered during this period.

    That means no shutdown signal is observed by the app, so it continues to receive and process incoming tasks.

    After those ~30 seconds, SIGTERM and container removal happen at the same time. If there are still running tasks, Celery attempts a warm shutdown at that point.

    But by then, the ACA container is already gone when viewed from the Portal/CLI. The log stream may still connect, but it looks like the container’s network configuration has already been torn down—network communication no longer works.

    As a result, even if we handle SIGTERM, the ACA container becomes unusable during this termination flow.

    Could you please investigate/confirm whether this behavior is expected in ACA, and whether there is a supported way to ensure SIGTERM is delivered promptly (and traffic/tasks are drained) before the container/network is removed?


  3. Siddhesh Desai 7,080 Reputation points Microsoft External Staff Moderator
    2025-12-22T15:02:37.15+00:00

    Hi @최 선호

    Thank you for reaching out to Microsoft Q&A.

    Apologies for the delayed response.

    ACA does not implement Kubernetes-style lifecycle hooks or PID 1-only signaling. Instead, it uses a “kill entire container” approach for simplicity and consistency across profiles.

    When a revision is deactivated, ACA sends SIGTERM to all processes in the container to ensure the container stops promptly. This behavior is consistent whether child processes are forked, spawned, or placed in separate process groups.

    Even if you use setsid or isolate process groups, ACA does not respect those boundaries: it enumerates all PIDs in the container and signals them directly.

    This design choice means that frameworks like Celery, when using the prefork pool, cannot rely on the master process to propagate signals, because the children receive SIGTERM independently and exit before the master can coordinate a warm shutdown. Celery expects only the main process to receive SIGTERM and then gracefully stop the workers. ACA breaks this assumption because the child workers terminate immediately, causing WorkerLostError and incomplete task handling.
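The effect described above can be reproduced outside ACA with plain processes. The sketch below is a generic illustration using only the standard library, not Celery's internals: `child_main` and `signal_child` are hypothetical names, and the direct `os.kill` stands in for the platform signaling a child. It shows that a child with the default SIGTERM disposition dies as soon as it is signaled, while a child that ignores SIGTERM survives until the parent tells it to stop. Whether Celery pool processes can safely ignore SIGTERM on ACA is an assumption that would need testing.

```python
import multiprocessing as mp
import os
import signal
import time


def child_main(conn, ignore_sigterm):
    # Hypothetical worker child: either ignore SIGTERM or keep the default action.
    disposition = signal.SIG_IGN if ignore_sigterm else signal.SIG_DFL
    signal.signal(signal.SIGTERM, disposition)
    conn.send("ready")
    conn.poll(10)  # simulate in-flight work until the parent says stop


def signal_child(ignore_sigterm):
    """Return (survived_sigterm, exit_code) for a directly signaled child."""
    ctx = mp.get_context("fork")  # POSIX only
    parent_conn, child_conn = ctx.Pipe()
    child = ctx.Process(target=child_main, args=(child_conn, ignore_sigterm))
    child.start()
    parent_conn.recv()                  # wait until the disposition is set
    os.kill(child.pid, signal.SIGTERM)  # stand-in for the platform's signal
    time.sleep(0.5)
    survived = child.is_alive()
    parent_conn.send("stop")            # parent-coordinated shutdown
    child.join(5)
    return survived, child.exitcode


if __name__ == "__main__":
    print("ignoring SIGTERM:", signal_child(ignore_sigterm=True))
    print("default SIGTERM: ", signal_child(ignore_sigterm=False))
```

In the default case the child exits with a negative exit code (killed by the signal) before the parent can coordinate anything, which mirrors the WorkerLostError scenario.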

    To avoid this, try the steps outlined below:

    Avoid Prefork Pool

    Use --pool=solo or --pool=threads to keep everything in one process.

    Configure Grace Period

    Set terminationGracePeriodSeconds (up to 600s) in your ACA template:

    JSON

    "template": {
      "terminationGracePeriodSeconds": 60,
      "containers": [
        {
          "image": "<your-image>",
          "name": "celery-worker"
        }
      ]
    }
    

    Trap SIGTERM in the Entrypoint

    Implement a signal handler to stop accepting new tasks and exit cleanly.

    Sample code in Python:

    Python

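    import signal
    import sys

    from celery import Celery

    app = Celery('tasks')

    def handle_sigterm(signum, frame):
        print("SIGTERM received, stopping consumers...", flush=True)
        # cancel_consumer() needs a queue name; 'celery' is the default queue.
        # Without destination=[...], the command is broadcast to every worker.
        app.control.cancel_consumer('celery')
        sys.exit(0)

    # Register the handler in the process that should react to SIGTERM.
    signal.signal(signal.SIGTERM, handle_sigterm)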
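As an alternative to exiting from inside the handler, the handler can just set a flag and let the consume loop stop on its own, so that the task currently running is never interrupted. The following is a minimal standard-library sketch of that pattern; `GracefulConsumer` is a hypothetical stand-in for a message consumer, not Celery's implementation.

```python
import os
import queue
import signal
import threading
import time


class GracefulConsumer:
    """Hypothetical consumer loop that drains in-flight work on SIGTERM."""

    def __init__(self, tasks):
        self.tasks = tasks                 # queue.Queue of zero-argument callables
        self.stopping = threading.Event()  # set once SIGTERM has been seen
        self.sigterm_at = None             # time.monotonic() when SIGTERM arrived

    def handle_sigterm(self, signum, frame):
        # Only record the request; the loop in run() performs the shutdown.
        self.sigterm_at = time.monotonic()
        self.stopping.set()

    def run(self):
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGTERM, self.handle_sigterm)
        completed = 0
        while not self.stopping.is_set():
            try:
                task = self.tasks.get(timeout=0.1)
            except queue.Empty:
                continue
            task()          # in-flight work runs to completion
            completed += 1
        return completed


if __name__ == "__main__":
    def self_terminate():
        os.kill(os.getpid(), signal.SIGTERM)  # simulate the platform's SIGTERM
        time.sleep(0.1)                       # work that continues after the signal

    q = queue.Queue()
    for job in (self_terminate, lambda: None, lambda: None):
        q.put(job)
    done = GracefulConsumer(q).run()
    print(f"completed={done} left_in_queue={q.qsize()}")
```

The task that was running when SIGTERM arrived finishes, and the remaining items stay in the queue for another consumer. The whole drain still has to fit inside terminationGracePeriodSeconds.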
    

    Design for Ephemeral Compute

    Persist task state externally (e.g., Azure Service Bus).

    Make tasks idempotent.

    Keep tasks short so abrupt termination has minimal impact.
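To make the idempotency point concrete, here is a minimal sketch that records a result per task ID so a redelivered task is not executed twice. `run_idempotent` is a hypothetical helper, and the plain dict only stands in for an external store that survives container termination.

```python
def run_idempotent(store, task_id, fn):
    """Run fn() at most once per task_id, reusing the recorded result afterwards."""
    if task_id in store:
        # The task already completed (possibly in a container that has since
        # been terminated), so skip the work and return the stored result.
        return store[task_id]
    result = fn()
    store[task_id] = result  # record completion only after the work succeeded
    return result
```

If the container is killed after `fn()` runs but before the result is recorded, the task will still run again on redelivery, so the side effects inside `fn` also need to tolerate being repeated.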



  4. AI answer

    2025-12-22T08:14:42.95+00:00

    In Azure Container Apps (ACA), when a revision is deprovisioned, the expected behavior is that the container receives a SIGTERM signal to allow for graceful shutdown. However, it seems you are experiencing a delay in receiving the SIGTERM signal, which is impacting your ability to gracefully stop your Celery workers.

    Here are some points to consider based on the context:

    1. SIGTERM Handling: The application should be able to handle the SIGTERM signal to perform any necessary cleanup. However, if the signal is not received immediately, it can hinder the shutdown process.
    2. Grace Period: You mentioned setting terminationGracePeriodSeconds: 60. This setting is intended to give your application time to shut down gracefully after receiving SIGTERM. If SIGKILL is not observed after this period, it may indicate that the shutdown process is not being handled as expected.
    3. Container Lifecycle: Containers in ACA can restart regularly, and it’s important to ensure that your application can handle these lifecycle events. You should not expect state to persist inside a container, so using external storage for in-flight tasks can help manage state better during shutdowns.
    4. Logging and Monitoring: Implement logging within your signal handling code to track when SIGTERM is received and how long it takes for your application to respond. This can provide insights into the shutdown process and help identify any delays.
    5. Documentation and Best Practices: For further guidance, you can refer to the Azure documentation on application lifecycle management and troubleshooting container start failures. These resources can provide additional insights into managing your application’s lifecycle effectively.

    If the behavior persists, consider reaching out to Azure support for more tailored assistance regarding the specific timing of SIGTERM signals during revision swaps.


