Azure Container Instance takes very long to start. If it even starts

Question

Duv, Samir 0

Hello,

I have an App Service API that takes a request and starts a container group as a background job. The issue is that the container instance takes more than half an hour to start, and sometimes it never seems to start at all. The Docker image I use is rather large (~2.7 GB), but once the instance finally starts, pulling the image seems to be rather quick. Both the instance and the registry are in the East US region. The hardware is one K80 GPU and one vCPU with 6 GB of memory.

My questions are:

1. Is this expected behavior?
2. If it is, what would be an alternative for my use case? I need short bursts of GPU hardware to read a video from storage, analyze it, and then store it back in memory.
3. I cannot access any logs until shortly before the instance actually starts. I am using the commands described at https://learn.microsoft.com/en-us/azure/container-instances/container-instances-get-logs. Are there any other diagnostics available to see what the issue might be?

vipullag-MSFT 26,487 Reputation points Moderator

2023-06-27T06:47:07.29+00:00

Hello Duv, Samir

Any update on the issue?

Just checking in to see if you got a chance to look at the previous response.

If the suggested response helped you resolve your issue, please 'Accept as answer' so that it can help others in the community looking for help on similar topics.

1 answer

Answer 1

Hi Duv,

Thank you for your questions.

From what you describe, the delay seems to be related to the availability of the underlying infrastructure, as workloads that require GPU resources usually take longer to start.
Because the K80 GPUs are being retired soon, this might be aggravating the issue.
https://learn.microsoft.com/en-us/azure/container-instances/container-instances-resource-and-quota-limits#gpu-resources-preview

If this is the issue, and your workloads are pending while the infrastructure is being provisioned, there are no other logs (besides the ones you mention) that you can gather.
If you want to follow up on this, I encourage you to open a Support Request for a more detailed investigation of one of those deployments.
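
Since no logs are available while the deployment is pending, one option on the App Service side is to stop waiting after a deadline instead of letting the background job hang. The following is only a minimal sketch of that idea, not an official pattern. `get_state` is a hypothetical callable that you would implement yourself (for example by wrapping whatever SDK or CLI call you already use to read the container group state), and the state names "Running" and "Failed" are assumptions that you should check against what your deployments actually report.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],  # hypothetical: returns the current state string
    ready_states: Iterable[str] = ("Running",),  # assumed state name
    failed_states: Iterable[str] = ("Failed",),  # assumed state name
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until a ready state is seen, or give up.

    Raises RuntimeError if a failed state is reported and TimeoutError
    if no ready state is reached before the deadline.
    """
    ready = set(ready_states)
    failed = set(failed_states)
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in ready:
            return state
        if state in failed:
            raise RuntimeError(f"container group entered state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} s")
        sleep(poll_interval_s)
```

On a timeout, the caller could delete the container group and retry, or record the deployment name so that it can be attached to a Support Request.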

As an alternative, as the documentation mentions, you can use AKS to run these jobs. I understand you only need them in short bursts, so consider a node pool with GPUs that autoscales with the demand of your workloads, to optimize your infrastructure costs.
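
To make the AKS suggestion more concrete, here is a rough sketch of how each video analysis could be submitted as a Kubernetes Job that requests one GPU. A pending pod with a GPU request is the kind of signal the cluster autoscaler uses to add a node to the GPU node pool. Everything named here is illustrative: `build_gpu_job`, the container name `analyzer`, and the `VIDEO_URI` environment variable are hypothetical, the `nvidia.com/gpu` resource assumes the NVIDIA device plugin is running on the GPU nodes, and the `sku=gpu` toleration only applies if you tainted the node pool that way.

```python
import json
from typing import Any, Dict


def build_gpu_job(name: str, image: str, video_uri: str) -> Dict[str, Any]:
    """Build a batch/v1 Job manifest that requests a single GPU."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,               # do not retry a failed analysis
            "ttlSecondsAfterFinished": 300,  # clean up the finished Job
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "analyzer",  # hypothetical name
                            "image": image,
                            # hypothetical variable read by your own code
                            "env": [{"name": "VIDEO_URI", "value": video_uri}],
                            # assumes the NVIDIA device plugin is installed
                            "resources": {"limits": {"nvidia.com/gpu": 1}},
                        }
                    ],
                    # assumption: the GPU node pool is tainted sku=gpu:NoSchedule
                    "tolerations": [
                        {
                            "key": "sku",
                            "operator": "Equal",
                            "value": "gpu",
                            "effect": "NoSchedule",
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values; kubectl accepts JSON as well as YAML.
    job = build_gpu_job(
        "analyze-video-1",
        "myregistry.azurecr.io/analyzer:latest",
        "https://example.invalid/videos/input.mp4",
    )
    print(json.dumps(job, indent=2))
```

The printed manifest can be piped to `kubectl apply -f -`. If the node pool's autoscaler minimum is set to zero, GPU nodes should only exist while jobs are pending or running, although adding a node still takes some time.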
