Azure Container Instances - Over two hours before starting

Question

Casper Lindberg 21

Hi,

Lately, our Azure Container Instances take a very long time to start. They are triggered via Airflow using the AzureContainerInstancesOperator. Sometimes it takes more than 2 hours until the container instance even starts. Other times it starts but its state is Pending, Unhealthy, or Repairing. In general, I expect container instances to take less than 5 minutes to start, but we have had at least 50 occurrences in the last two days where it took more than 20 minutes, which is not acceptable.

What can be done to mitigate this issue?
Why does it take so long to start?
Are there any workarounds?

Accepted answer

Answer 1

Andrei Barbu 2,596 Microsoft Employee

Hello Casper Lindberg

As per https://learn.microsoft.com/en-us/azure/container-instances/container-state#create-start-and-restart-operations:

Unhealthy: The container group is unhealthy. For an unexpected state, such as if a node is down, a job is automatically triggered to repair the container group by moving it.

Repairing: The container group is getting moved in order to repair an unhealthy state.

If you are sure the application itself doesn't have any issue, one theory is that the containers landed on a node with issues. As a workaround, you may want to try deploying in a different region.

As per https://learn.microsoft.com/en-us/azure/container-instances/container-instances-region-availability:
"Container groups created within these resource limits are subject to availability within the deployment region. When a region is under heavy load, you may experience a failure when deploying instances. To mitigate such a deployment failure, try deploying instances with lower resource settings, or try your deployment at a later time or in a different region with available resources."

Could you please let me know which region you are using, and whether you get an error or the deployment eventually succeeds without any error, just after a long time?
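The mitigations quoted from the documentation above (lower resource settings, retrying later, a different region) could be combined into a fallback loop. The following is only a minimal sketch: `Attempt`, `CapacityError`, and `deploy_with_fallback` are hypothetical names, and `deploy` stands in for whatever call actually creates the container group (for example, the code behind the Airflow operator). It does not use any Azure SDK API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Attempt:
    """One candidate deployment configuration (hypothetical helper type)."""
    region: str
    memory_gb: float
    cpu: float


class CapacityError(Exception):
    """Hypothetical error: deploy() raises it when the group cannot be placed."""


def deploy_with_fallback(
    deploy: Callable[[Attempt], None],
    attempts: Iterable[Attempt],
    retries_per_attempt: int = 2,
    backoff_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Attempt]:
    """Try each configuration in order and return the first that deploys.

    Each configuration is retried with exponential backoff ("try again
    later") before moving on to the next one ("lower resource settings"
    or "a different region"). Returns None if every attempt fails.
    """
    for attempt in attempts:
        for i in range(retries_per_attempt):
            try:
                deploy(attempt)
                return attempt
            except CapacityError:
                sleep(backoff_s * 2 ** i)
    return None
```

A caller would order the list by preference, for example the original 14 GB configuration first, then a smaller memory request in the same region, and only then another region if networking allows it.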

Casper Lindberg 21 Reputation points

2023-05-05T08:40:09.9733333+00:00

Thank you for the answer!

This has worked well for about a year (and always works well during the night), so there is nothing wrong with the application. We are in West Europe and cannot easily change region due to networking limitations. Our container instances require 14 GB of RAM and 1 CPU, so we will try to reduce the RAM requirement for some of them.

The most common error we get is that the container group is started and then gets stuck at Waiting or Pending, or does not give any status update at all.
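One way to catch a group that is stuck like this is a polling watchdog with a deadline. This is a rough sketch only: `wait_until_running` is a hypothetical helper, and `get_state` is assumed to be a caller-supplied function that returns the current state string (or None when no status is reported). The 300-second default mirrors the 5-minute start time expected above.

```python
import time
from typing import Callable, Optional


def wait_until_running(
    get_state: Callable[[], Optional[str]],
    timeout_s: float = 300.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_state() until it reports "Running" or the deadline passes.

    Any other value (for example "Pending", "Waiting", or None for a
    missing status update) counts as "not started yet". Returns True if
    the group reached "Running" in time, False otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Running":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

When this returns False, the calling code could delete the container group and resubmit it rather than waiting for hours.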

Is it possible to pay more to get priority for the container resources or something similar?
Andrei Barbu 2,596 Reputation points Microsoft Employee

2023-05-05T08:55:32.53+00:00

My pleasure, Casper! You may consider using Dedicated Hosts. Their goal is more related to security than to ensuring resource availability. However, I'll try to clarify whether dedicated hosts can help in this scenario as well. At this moment, there is no option to pay more to get priority, but I'll keep researching to find out if there is any alternative.

I suggest opening a support request to get this specific issue investigated and addressed.

Please "Accept the answer" if the information helped you. This will help us and others in the community as well. Feel free to reply with any other questions or concerns.

Hope this helps!
Casper Lindberg 21 Reputation points

2023-05-05T09:07:48.1666667+00:00

Since we spin up Azure Container Instances on the go in our code, I don't think using a Dedicated Host would help, but thank you for mentioning it. I will open a support request and mark your answer as accepted in the meantime. However, is it the case that West Europe runs at capacity for Azure Container Instances from time to time?
Andrei Barbu 2,596 Reputation points Microsoft Employee

2023-05-05T13:43:02.72+00:00

Thank you, Casper!

As per the link I shared above, running at capacity can happen in any region. There were a few issues on the West Europe side that should have been resolved yesterday; if you are still facing the issue today, a support request is the proper way to get your specific issue investigated.
