Batch node in unusable state without errors

Michael Mehrtens (Credera) 6 Reputation points
2021-09-13T17:25:39+00:00

When creating a Batch pool, a subset of the nodes becomes unusable without any errors. Sometimes Batch reschedules these nodes and successfully starts them; other times they remain unusable. I'm not sure what could be going wrong here. We have a pretty simple setup:

vm_sku=batch.node.ubuntu 20.04
vm_image=microsoft-azure-batch:ubuntu-server-container:20-04-lts
vm_size=Standard_D2_v3
dedicated node count=1
low priority node count=79
task_slots_per_node=7
node start task=None
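
Written out as a pool specification, the setup above might look roughly like the sketch below. The values are taken from the list; mapping them onto Batch REST-style field names (vmSize, imageReference, nodeAgentSKUId, and so on) is an assumption about how they correspond, and the capacity helper is purely illustrative.

```python
# Sketch only: values come from the list above. Mapping them onto
# Batch REST-style field names is an assumption, not the actual pool code.
pool_spec = {
    "vmSize": "Standard_D2_v3",
    "virtualMachineConfiguration": {
        "imageReference": {
            "publisher": "microsoft-azure-batch",
            "offer": "ubuntu-server-container",
            "sku": "20-04-lts",
        },
        "nodeAgentSKUId": "batch.node.ubuntu 20.04",
    },
    "targetDedicatedNodes": 1,
    "targetLowPriorityNodes": 79,
    "taskSlotsPerNode": 7,
    # No start task is configured, so that key is simply omitted.
}


def capacity(spec):
    """Return (total nodes, total task slots) for a pool spec dict (hypothetical helper)."""
    nodes = spec["targetDedicatedNodes"] + spec["targetLowPriorityNodes"]
    return nodes, nodes * spec["taskSlotsPerNode"]


total_nodes, total_slots = capacity(pool_spec)
print(f"{total_nodes} nodes, {total_slots} task slots at full allocation")
```

At full allocation this comes to 80 nodes and 560 task slots, with 79 of the 80 nodes being low priority.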

We're using a custom Docker image to deploy our code, which works well and hasn't caused node startup issues before. Similar posts have been made about unusable nodes, but those are generally due to application package or VM image issues, which aren't at play here.

I'm not sure where to begin troubleshooting, so any help or suggestions would be appreciated!

Azure Batch
An Azure service that provides cloud-scale job scheduling and compute management.