Azure Container Instances - Over two hours before starting

Casper Lindberg 21 Reputation points
2023-05-05T06:28:07.9+00:00

Hi,

Lately, our Azure Container Instances have been taking a very long time to start. They are triggered via Airflow using the AzureContainerInstancesOperator. Sometimes it takes more than 2 hours before the container instance even starts. Other times it starts but then sits in the pending, unhealthy, or repairing state. In general, I expect container instances to take less than 5 minutes to start, but in the last two days we have had at least 50 occurrences where it took more than 20 minutes, which is not acceptable.

  • What can be done to mitigate this issue?
  • Why does it take so long to start?
  • Are there any workarounds?
Azure Container Instances

Accepted answer
  1. Andrei Barbu 2,576 Reputation points Microsoft Employee
    2023-05-05T07:27:42.7566667+00:00

    Hello Casper Lindberg

    As per https://learn.microsoft.com/en-us/azure/container-instances/container-state#create-start-and-restart-operations:

    • Unhealthy: The container group is unhealthy. For an unexpected state, such as if a node is down, a job is automatically triggered to repair the container group by moving it.

    • Repairing: The container group is getting moved in order to repair an unhealthy state.
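To avoid waiting hours on a group that is stuck in one of these states, the caller can enforce its own start deadline. The following is only a minimal sketch: `get_state` is a hypothetical callable that you would implement to return the current container group state, and the state strings are assumed to match the names used in the documentation quoted above.

```python
import time
from typing import Callable

# States in which the group has not started yet (names assumed from the docs quoted above).
NOT_READY = {"Pending", "Unhealthy", "Repairing"}


def wait_until_running(
    get_state: Callable[[], str],  # hypothetical: returns the current group state
    timeout_s: float,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once the group reports Running, False on timeout or a terminal state."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "Running":
            return True
        if state not in NOT_READY:
            # Any other state (for example Failed or Stopped) will not become Running by waiting.
            return False
        if clock() >= deadline:
            # Still Pending/Unhealthy/Repairing after the deadline: give up.
            return False
        sleep(poll_s)
```

If this returns False, the caller can delete the group and redeploy instead of waiting for the repair job to finish.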

    If you are sure the application itself has no issues, one possible explanation is that the containers landed on a node with issues. As a workaround, you may want to try deploying in a different region.

    As per https://learn.microsoft.com/en-us/azure/container-instances/container-instances-region-availability:
    "Container groups created within these resource limits are subject to availability within the deployment region. When a region is under heavy load, you may experience a failure when deploying instances. To mitigate such a deployment failure, try deploying instances with lower resource settings, or try your deployment at a later time or in a different region with available resources."
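The mitigation described in that quote (a different region, or lower resource settings) can be automated as an ordered list of fallbacks. This is a rough sketch under assumptions, not an Azure or Airflow API: `deploy` is a hypothetical callable that you would write around your own deployment step and that returns True when the group started in time, and the regions and sizes in the usage example are placeholders.

```python
from typing import Callable, Iterable, Optional, Tuple

# One deployment attempt: (region, CPU cores, memory in GB).
Attempt = Tuple[str, float, float]


def deploy_with_fallback(
    deploy: Callable[[str, float, float], bool],  # hypothetical wrapper around your deployment
    attempts: Iterable[Attempt],
) -> Optional[Attempt]:
    """Try each attempt in order; return the first one that succeeds, or None."""
    for region, cpu, memory_gb in attempts:
        if deploy(region, cpu, memory_gb):
            return (region, cpu, memory_gb)
    return None


# Placeholder fallback order: same region with lower resources, then another region.
FALLBACKS = [
    ("westeurope", 2.0, 4.0),
    ("westeurope", 1.0, 2.0),
    ("northeurope", 2.0, 4.0),
]
```

Calling `deploy_with_fallback(my_deploy, FALLBACKS)` returns the settings that worked, which is also useful to log so you can see which regions are under load.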

    Could you please let me know which region you are using, and whether you get an error or the deployment eventually succeeds without any error, just after a long time?

