
CycleCloud: cloud-init package failure

Mydeenbabu Abdul Kaboor 20 Reputation points Microsoft External Staff
2026-03-02T22:48:13.9066667+00:00

One of the packages on a CycleCloud cluster compute node is not working as expected: it is either missing or has crashed.

Azure CycleCloud

A Microsoft tool for creating, managing, operating, and optimizing high-performance computing (HPC) and big compute clusters in Azure.


Answer accepted by question author
  1. Jilakara Hemalatha 9,965 Reputation points Microsoft External Staff Moderator
    2026-03-03T00:08:03.2+00:00

    Thank you for your patience. As discussed offline, we reviewed this behavior internally.

    Cloud-init marks provisioning as failed only when it exits with a non-zero status. However, package installation errors do not always cause cloud-init to terminate in a failure state. In certain scenarios, cloud-init can log module-level errors (including package installation issues) but still complete overall execution successfully.
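A rough way to surface module-level errors even when the overall run looks successful is to read cloud-init's per-stage status file. The Python sketch below assumes the `v1` layout of `/run/cloud-init/status.json`, in which each stage (for example `init` or `modules-final`) carries its own `errors` list. Key names can differ between cloud-init versions, so treat them as assumptions. `collect_stage_errors` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any

# Assumption: status.json looks like {"v1": {"<stage>": {"errors": [...]}, ...}}
# with a few scalar fields (e.g. "stage", "datasource") mixed in.
STATUS_PATH = Path("/run/cloud-init/status.json")


def collect_stage_errors(status: dict[str, Any]) -> dict[str, list[str]]:
    """Return {stage_name: [error, ...]} for every stage that recorded errors."""
    found: dict[str, list[str]] = {}
    for stage, details in status.get("v1", {}).items():
        if not isinstance(details, dict):
            continue  # skip scalar fields such as "stage" or "datasource"
        errors = details.get("errors") or []
        if errors:
            found[stage] = [str(err) for err in errors]
    return found


def load_status(path: Path = STATUS_PATH) -> dict[str, Any]:
    """Parse the status file; only meaningful on a node where cloud-init ran."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    if STATUS_PATH.exists():
        for stage, errors in collect_stage_errors(load_status()).items():
            for err in errors:
                print(f"{stage}: {err}")
    else:
        print(f"{STATUS_PATH} not found; is this a cloud-init node?")
```

An empty result from this check does not prove every dependency works at runtime. It only shows that cloud-init itself recorded no stage errors.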

    Because Azure CycleCloud/Jetpack determines node readiness from the completion status of cloud-init, a node can be marked healthy whenever cloud-init exits successfully, even if a specific dependency later fails at runtime (for example, when a Python module fails to import).

    To address this, you added an explicit validation step to the runcmd section that verifies the tqdm module can be imported. If the import fails, the provisioning workflow now exits with a non-zero status and logs an error through Jetpack, which keeps the node from entering service in an inconsistent state. Since this change was made, the issue has not recurred.
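The validation step described above might look roughly like the following Python sketch, intended to be called from runcmd with the module names to check. The thread does not show the actual command, so the script name `check_imports.py` and its command-line interface are hypothetical. The script tries each import and returns a non-zero exit status if any of them fails. The Jetpack logging call from the real fix is omitted because its exact form isn't given here.

```python
import importlib
import sys

# Hypothetical runcmd validation helper: import each named module and
# report the ones that fail (tqdm is the module from this thread).


def failed_imports(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be imported."""
    failed: list[str] = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or an error raised while the module initializes
            print(f"ERROR: import {name!r} failed: {exc}", file=sys.stderr)
            failed.append(name)
    return failed


def main(names: list[str]) -> int:
    """Return a process exit status: 0 if every import works, 1 otherwise."""
    missing = failed_imports(names)
    if missing:
        # The actual fix also logs through Jetpack at this point; not shown.
        print(f"ERROR: missing modules: {', '.join(missing)}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    # Usage: python3 check_imports.py tqdm [other modules...]
    status = main(sys.argv[1:])
    if status:
        sys.exit(status)
```

A runcmd entry along the lines of `python3 /path/to/check_imports.py tqdm || exit 1` (the path is a placeholder) would then fail the step when `tqdm` cannot be imported. Two caveats apply. First, runcmd entries are typically rendered into a single `/bin/sh` script, so an explicit `exit` helps ensure that a failure in a non-final entry is not masked by later commands. Second, the check should run under the same Python interpreter or virtual environment as the workload, because a module installed for one interpreter is not visible to another.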

    The originally affected node is no longer available, and because the issue has not been reproduced, we are unable to review the cloud-init status artifacts and logs from the failing node to determine the exact root cause.

    If the issue occurs again, collecting the cloud-init status output (such as the JSON summary) and relevant cloud-init logs from the affected node will allow for deeper analysis and confirmation of the specific failure condition.
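If the problem recurs, a small helper like the following could copy the relevant files off the node before it is deallocated. It assumes the default cloud-init locations (`/var/log/cloud-init.log`, `/var/log/cloud-init-output.log`, and `status.json`/`result.json` under `/run/cloud-init/`), which may differ by image. `collect_artifacts` is a hypothetical name. cloud-init also provides a `cloud-init collect-logs` command that produces a tarball, and recent versions can print the JSON status summary with `cloud-init status --format json`. This sketch only copies the raw files, and it generally needs root to read the logs.

```python
import shutil
import tempfile
from collections.abc import Sequence
from pathlib import Path

# Default cloud-init file locations on common Linux images (assumption:
# verify on your image). /run is usually a tmpfs, so the JSON files there
# do not survive a reboot.
CLOUD_INIT_ARTIFACTS = (
    Path("/var/log/cloud-init.log"),
    Path("/var/log/cloud-init-output.log"),
    Path("/run/cloud-init/status.json"),
    Path("/run/cloud-init/result.json"),
)


def collect_artifacts(
    dest: Path, sources: Sequence[Path] = CLOUD_INIT_ARTIFACTS
) -> tuple[list[Path], list[Path]]:
    """Copy each readable source file into dest.

    Returns (copied_targets, skipped_sources). A source is skipped when it
    does not exist or cannot be read (for example, without root).
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    skipped: list[Path] = []
    for src in sources:
        target = dest / src.name
        try:
            shutil.copy2(src, target)  # preserves timestamps for later correlation
        except OSError:
            skipped.append(src)
            continue
        copied.append(target)
    return copied, skipped


if __name__ == "__main__":
    bundle_dir = Path(tempfile.mkdtemp(prefix="cloud-init-artifacts-"))
    copied, skipped = collect_artifacts(bundle_dir)
    print(f"copied {len(copied)} file(s) to {bundle_dir}")
    for path in skipped:
        print(f"skipped (missing or unreadable): {path}")
```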

