Azure Container Apps (ACA) revision swap: SIGTERM not delivered immediately on deprovision

Question

최 선호 25

Hello,

I’m running a Celery worker container deployed on Azure Container Apps (ACA).

When ACA replaces a revision (a new revision becomes active, then the old revision starts deprovisioning), I’m seeing unexpected shutdown signal behavior:

- SIGTERM is not delivered immediately to the application when the old revision begins deprovisioning.
  - Instead, SIGTERM is received ~30 seconds later.
- Even if I handle SIGTERM in the application (to log it and keep the process alive / perform a graceful shutdown), the container still disappears in the Azure Portal.
- I set terminationGracePeriodSeconds: 60, but:
  - even after that time passes, I do not observe a SIGKILL being sent to the process, and
  - the behavior does not match the expected “SIGTERM → wait for grace period → SIGKILL” sequence.
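
For comparison, the expected sequence can be sketched locally with a small supervisor script. This is only an illustration of the generic POSIX pattern, not of how ACA implements it; `STUBBORN_CHILD`, `stop_with_grace`, and `run_demo` are hypothetical names introduced here.

```python
import signal
import subprocess
import sys

# Child program that ignores SIGTERM, so the supervisor has to escalate.
STUBBORN_CHILD = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def stop_with_grace(proc: subprocess.Popen, grace_seconds: float) -> int:
    """Send SIGTERM, wait up to grace_seconds, then send SIGKILL. Returns the exit code."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


def run_demo(grace_seconds: float = 1.0) -> int:
    """Start the stubborn child and stop it with the SIGTERM -> grace -> SIGKILL sequence."""
    proc = subprocess.Popen(
        [sys.executable, "-c", STUBBORN_CHILD],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the child has installed its SIGTERM handling
    code = stop_with_grace(proc, grace_seconds)
    proc.stdout.close()
    return code
```

On Linux, `run_demo(1.0)` returns `-signal.SIGKILL`, because the child ignores SIGTERM and is killed once the grace period runs out. A child that exits on SIGTERM would return before the timeout instead.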

At the moment, the biggest issue is that SIGTERM is not triggered immediately when deprovisioning starts, which makes it difficult to gracefully stop Celery workers (e.g., stop consuming new messages and finish in-flight tasks).

Any guidance, documentation links, or recommended best practices for graceful shutdown of message consumers (Celery workers) on ACA revision swaps would be appreciated.

최 선호 25 Reputation points

2025-12-29T08:01:15.5566667+00:00

On the application side, Celery already handles SIGTERM properly for graceful (warm) shutdown, and we confirmed it works as expected in a typical Kubernetes environment.

However, in Azure Container Apps (ACA) Dedicated profile, when a revision starts being deprovisioned, the app stays in the Deprovisioning state for about 30 seconds, and SIGTERM is not delivered during this period.

That means no shutdown signal is observed by the app, so it continues to receive and process incoming tasks.

After those ~30 seconds, SIGTERM and container removal happen at the same time. If there are still running tasks, Celery attempts a warm shutdown at that point.

But by then, the ACA container is already gone when viewed from the Portal/CLI. The log stream may still connect, but it looks like the container’s network configuration has already been torn down—network communication no longer works.

As a result, even if we handle SIGTERM, the ACA container becomes unusable during this termination flow.

Could you please investigate/confirm whether this behavior is expected in ACA, and whether there is a supported way to ensure SIGTERM is delivered promptly (and traffic/tasks are drained) before the container/network is removed?

Answer 1

최 선호 25

Hello,

I’m posting an update with additional findings.

First, I’d like to correct my earlier statement about SIGTERM. I initially thought SIGTERM was not being delivered, but the real reason it looked that way was that Celery did not perform a warm shutdown.

We are running Celery with the prefork pool. In our container, there is one main process (PID 1) and two child worker processes.

After further investigation, we confirmed that SIGTERM should ideally be delivered only to the main process (PID 1), but in our case the platform sends SIGTERM to all child processes as well. Also, this SIGTERM was not propagated from the main process—it appears to be delivered directly by the platform to each child process.

Because the child processes are terminated by SIGTERM, we believe the main Celery process cannot complete a proper warm shutdown and errors occur (e.g., worker lost errors).

Could you help explain why SIGTERM is delivered to all processes (not just PID 1) in this environment?

For reference, this is in the Korea Central region on the Consumption profile.

Thank you.

최 선호 25 Reputation points

2026-01-08T07:57:06.3033333+00:00
We reproduced the behavior using a minimal Python app (non-Celery). On revision deactivation/termination, ACA delivers SIGTERM not only to PID 1 but also to child processes.

This happens regardless of how child processes are created (fork or spawn/exec), and even when children are placed into separate sessions/process groups (setsid), children still receive SIGTERM.

Therefore, the issue is not specific to Celery or identical CMD lines; it appears that ACA sends SIGTERM to all processes inside the container, which breaks Celery prefork warm shutdown and leads to WorkerLostError.
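
A reproduction along these lines can be sketched as follows. This is not the original test app, only a minimal approximation under stated assumptions: it runs on Linux as the container's entrypoint, and `report_sigterm`, `start_child`, and `main` are hypothetical names. Each process that is signalled directly writes one line to stderr, so the log shows whether only PID 1 or every process received SIGTERM.

```python
import os
import signal
import time


def report_sigterm(signum, frame):
    """Log which process received SIGTERM, then exit immediately."""
    line = f"SIGTERM pid={os.getpid()} ppid={os.getppid()} sid={os.getsid(0)}\n"
    os.write(2, line.encode())
    os._exit(0)


def start_child(new_session: bool) -> int:
    """Fork a child that idles until signalled; optionally detach it with setsid()."""
    pid = os.fork()
    if pid == 0:
        if new_session:
            os.setsid()  # new session and new process group
        while True:
            time.sleep(0.5)
    return pid


def main() -> None:
    # Install the handler before forking so the children inherit it.
    signal.signal(signal.SIGTERM, report_sigterm)
    pids = [start_child(new_session=False), start_child(new_session=True)]
    os.write(2, f"parent pid={os.getpid()} children={pids}\n".encode())
    while True:
        time.sleep(0.5)
```

Calling `main()` from the entrypoint and then deactivating the revision should produce one `SIGTERM pid=...` line per process that was signalled. If only the parent had been signalled, the children would stay silent, because the parent does not forward the signal.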

Answer 3

Hi @최 선호

Thank you for reaching out to Microsoft Q&A.

Apologies for the delayed response.

ACA does not implement Kubernetes-style lifecycle hooks or PID 1-only signaling. Instead, it uses a “kill entire container” approach for simplicity and consistency across profiles.

When a revision is deactivated, ACA sends SIGTERM to all processes in the container to ensure the container stops promptly. This behavior is consistent whether child processes are forked, spawned, or placed in separate process groups.

Even if you use setsid or isolate process groups, ACA does not respect those boundaries; it enumerates all PIDs in the container and signals them directly.

This design choice means frameworks like Celery prefork pool cannot rely on the master process to propagate signals because the children receive SIGTERM independently and exit before the master can coordinate a warm shutdown. Celery expects only the main process to receive SIGTERM and then gracefully stop workers. ACA breaks this assumption because child workers terminate immediately, causing WorkerLostError and incomplete task handling.

To avoid this, try the steps outlined below:

Avoid Prefork Pool

Use --pool=solo or --pool=threads to keep everything in one process.

Configure Grace Period

Set terminationGracePeriodSeconds (up to 600s) in your ACA template:

JSON

"template": {
  "terminationGracePeriodSeconds": 60,
  "containers": [
    {
      "image": "<your-image>",
      "name": "celery-worker"
    }
  ]
}

Try to trap SIGTERM in Entrypoint

Implement a signal handler to stop accepting new tasks and exit cleanly.

Sample code in Python:

Python

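import signal
import sys

from celery import Celery

app = Celery('tasks')

def handle_sigterm(signum, frame):
    print("SIGTERM received, stopping consumers...", flush=True)
    # cancel_consumer() requires a queue name; 'celery' is Celery's default queue
    # (an assumption here). Without destination=..., it is broadcast to all workers.
    app.control.cancel_consumer('celery')
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)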
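
The same "stop accepting new work, finish what is in flight" idea can also be shown without any broker call. The following is a minimal standard-library sketch, not Celery's own shutdown logic; `stop_requested`, `request_stop`, and `consume` are hypothetical names, and an in-memory `queue.Queue` stands in for the real message source.

```python
import queue
import signal
import threading

# Set by the signal handler, polled by the consumer loop.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: only flip a flag; the main loop does the actual shutdown."""
    stop_requested.set()


def consume(tasks: queue.Queue, handle) -> int:
    """Process tasks until a stop is requested. A task that has started always finishes."""
    done = 0
    while not stop_requested.is_set():
        try:
            item = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        handle(item)  # in-flight work runs to completion before the flag is re-checked
        done += 1
    return done


signal.signal(signal.SIGTERM, request_stop)
```

Keeping the handler down to a single flag avoids doing network I/O inside a signal handler. The loop then has the whole grace period to finish the current task and return.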

Design for Ephemeral Compute

Persist task state externally (e.g., Azure Service Bus).

Make tasks idempotent.

Keep tasks short so abrupt termination has minimal impact.
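
As a rough illustration of the idempotency point, a task can record its result under a stable task ID and skip the work if that ID has already completed. This is a sketch under assumptions: `ResultStore` and `run_once` are hypothetical names, and the in-memory dictionary only stands in for a real external store that survives container restarts.

```python
from typing import Callable, Dict, Optional


class ResultStore:
    """In-memory stand-in for an external store; a real one must outlive the container."""

    def __init__(self) -> None:
        self._results: Dict[str, int] = {}

    def get(self, key: str) -> Optional[int]:
        return self._results.get(key)

    def put(self, key: str, value: int) -> None:
        self._results[key] = value


def run_once(store: ResultStore, task_id: str, work: Callable[[], int]) -> int:
    """Return the recorded result if task_id already completed; otherwise run and record it."""
    cached = store.get(task_id)
    if cached is not None:
        return cached
    result = work()
    # Note: get-then-put is not atomic; a real store would need a conditional write.
    store.put(task_id, result)
    return result
```

With this shape, a task that is redelivered after an abrupt termination repeats the work only if the previous attempt never recorded a result.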

Siddhesh Desai 7,080 Reputation points Microsoft External Staff Moderator

2026-01-15T18:42:48.7033333+00:00

Hi @최 선호

Could you refer to my updated answer above and see if it helps?
최 선호 25 Reputation points

2026-01-16T05:02:28.5766667+00:00

Hi,

Thank you for your response.

In the Consumption profile, as you described, we observed that SIGTERM is delivered to all processes/containers.

However, in the Dedicated profile, we confirmed that SIGTERM is delivered only to the PID 1 process (and not to child processes).

Does this match the intended behavior of the Azure Container Apps runtime? In other words, is the difference in SIGTERM delivery between Consumption vs. Dedicated consistent with how ACA is designed to operate?

Answer 4

In Azure Container Apps (ACA), when a revision is deprovisioned, the expected behavior is that the container receives a SIGTERM signal to allow for graceful shutdown. However, it seems you are experiencing a delay in receiving the SIGTERM signal, which is impacting your ability to gracefully stop your Celery workers.

Here are some points to consider based on the context:

- SIGTERM Handling: The application should be able to handle the SIGTERM signal to perform any necessary cleanup. However, if the signal is not received immediately, it can hinder the shutdown process.
- Grace Period: You mentioned setting terminationGracePeriodSeconds: 60. This setting is intended to give your application time to shut down gracefully after receiving SIGTERM. If SIGKILL is not observed after this period, it may indicate that the shutdown process is not being handled as expected.
- Container Lifecycle: Containers in ACA can restart regularly, and it’s important to ensure that your application can handle these lifecycle events. You should not expect state to persist inside a container, so using external storage for in-flight tasks can help manage state better during shutdowns.
- Logging and Monitoring: Implement logging within your signal handling code to track when SIGTERM is received and how long it takes for your application to respond. This can provide insights into the shutdown process and help identify any delays.
- Documentation and Best Practices: For further guidance, you can refer to the Azure documentation on application lifecycle management and troubleshooting container start failures. These resources can provide additional insights into managing your application’s lifecycle effectively.
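
The logging point can be sketched as a small helper that timestamps the first SIGTERM, so the shutdown duration can be logged and compared against the configured grace period. This is an illustrative sketch only; `ShutdownClock` and `clock` are hypothetical names.

```python
import signal
import time
from typing import Optional


class ShutdownClock:
    """Records when SIGTERM arrived so the shutdown duration can be logged."""

    def __init__(self) -> None:
        self.received_at: Optional[float] = None

    def on_sigterm(self, signum, frame) -> None:
        # Keep the first timestamp if the signal is delivered more than once.
        if self.received_at is None:
            self.received_at = time.monotonic()

    def elapsed(self) -> Optional[float]:
        """Seconds since SIGTERM, or None if it has not been received yet."""
        if self.received_at is None:
            return None
        return time.monotonic() - self.received_at


clock = ShutdownClock()
signal.signal(signal.SIGTERM, clock.on_sigterm)
```

Logging `clock.elapsed()` just before the process exits shows how much of the grace period the shutdown actually used.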

If the behavior persists, consider reaching out to Azure support for more tailored assistance regarding the specific timing of SIGTERM signals during revision swaps.
