
Azure Batch compute nodes getting stuck in the starting state for a long time

AmishG 1 Reputation point
2020-11-27T04:59:28.65+00:00

Hello,

We have been using Azure Batch for the last 8 to 9 months, and we use low-priority nodes because our project is still in its development and testing phase.

We are noticing two problems:

1) Over the last couple of weeks, starting from 15 Nov 2020, we have noticed a lot of instability in provisioning low-priority nodes. Earlier this issue was not observed so frequently. At present we can allocate at most 10 nodes.

2) Resizing the pool, especially from 4 to 5 nodes, takes a lot of time and sometimes times out.

Our pool information:

1) Region - Central India

2) VM size - standard_a2

3) Operating system: Shared Image Gallery (with 5 replicas)

4) sku - 2016-datacenter-smalldisk

5) offer - WindowsServer

6) version - latest

We also tried to collect logs for the nodes stuck in the starting state, but Azure returned an error and could not dump any logs for those nodes. Please let me know if there are other ways to get the logs in order to better understand the issue.

We have implemented our own auto-scaling algorithm and are also thinking of removing nodes that stay stuck in one state (say, starting) for much longer than the normal boot-up time of the machine. Unfortunately, since nodes are charged even while they are starting up, more and more nodes getting stuck would result in unnecessary cost. We need to sort this out to convince our management to use Azure Batch in production for our customers, where such instability and cost implications would be critical.
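The stuck-node check described above could be sketched roughly as follows. This is only an illustration, not the actual auto-scaling code: `NodeSnapshot` and `find_stuck_nodes` are hypothetical names, and the 20-minute threshold is an assumed placeholder for "normal boot-up time". It also assumes each node exposes its current state and the time it entered that state (in the `azure-batch` Python SDK this should correspond to a compute node's `state` and `state_transition_time`, but that mapping is left out so the sketch stays SDK-independent).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class NodeSnapshot:
    """Hypothetical, SDK-independent view of one compute node."""
    node_id: str
    state: str                       # e.g. "starting", "idle", "running"
    state_transition_time: datetime  # when the node entered `state` (UTC)


def find_stuck_nodes(
    nodes: Iterable[NodeSnapshot],
    stuck_state: str = "starting",
    max_time_in_state: timedelta = timedelta(minutes=20),  # assumed threshold
    now: Optional[datetime] = None,
) -> List[str]:
    """Return IDs of nodes that have been in `stuck_state` longer than allowed."""
    now = now or datetime.now(timezone.utc)
    return [
        n.node_id
        for n in nodes
        if n.state == stuck_state
        and now - n.state_transition_time > max_time_in_state
    ]


# Example with a fixed clock so the result is reproducible.
now = datetime(2020, 11, 27, 5, 0, tzinfo=timezone.utc)
nodes = [
    NodeSnapshot("node-a", "starting", now - timedelta(minutes=45)),
    NodeSnapshot("node-b", "starting", now - timedelta(minutes=5)),
    NodeSnapshot("node-c", "idle", now - timedelta(hours=3)),
]
print(find_stuck_nodes(nodes, now=now))  # ['node-a']
```

The returned IDs would then be handed to whatever node-removal call the scaling logic already uses. Keeping the check as a pure function makes the threshold easy to tune and test separately from the Batch service calls.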

Kindly help us with any details that can be provided to prevent nodes from getting stuck in a single state for a long time.

Thanks,

Amish

Azure Batch

An Azure service that provides cloud-scale job scheduling and compute management.


2 answers

  1. Neharika Singh 16 Reputation points
    2022-07-19T23:24:57.123+00:00

    I have a similar issue. The machine is stuck in the starting phase for a long time.


  2. AmishG 1 Reputation point
    2020-12-17T07:25:14.857+00:00

    @prmanhas-MSFT - I have already engaged with Azure support and created ticket 2012040060002634.

    Thanks for your enquiry.

