Error creating a CycleCloud cluster (No nodes found for nodearray hpc)

De Matteis Tiziano 26 Reputation points
2022-03-13T17:46:55.11+00:00

Hello,

I'm trying to set up my first CycleCloud cluster, but I keep getting an error in the initialization phase.

In particular, it complains about not finding nodes for "nodearray hpc".

The full error message:

CycleCloud Version: 8.2.0-1616
Cluster: Test2 (version 8.2.x)
==============================

Status: Error [Software Configuration] (retrying)
Start Time: 2022-03-13T17:31:29.377Z

Description: Unable to execute command `"bash"  "/tmp/chef-script20220313-15831-14pjjqq"` (exit code 1)

Detail: 
STDOUT: 
STDERR: Upgrade not required!
Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
/opt/cycle/slurm/cyclecloud_slurm.py:571: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
  logging.warn("No nodes were created for nodearray %s using name format %s and offset %s: %s", request_set.nodearray, request_set.name_format,
No nodes were created for nodearray hpc using name format hpc-pg0-%d and offset 1: Limited by 200 total cores (10 of Standard_D4_v2) quota in eastus
Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
Unhandled failure.
Traceback (most recent call last):
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
    main()
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
    args.func(**kwargs)
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
    _generate_slurm_conf(partitions, writer, subprocess)
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
    raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
Traceback (most recent call last):
  File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
    main()
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
    args.func(**kwargs)
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
    _generate_slurm_conf(partitions, writer, subprocess)
  File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
    raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
EXCEPTION: bash[Create cyclecloud.conf] (slurm::scheduler line 156) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'


Affected Nodes (1):
---
Node Name: scheduler
Hostname: ip-0A000005
IP Address: 10.0.0.5
Azure Resource ID: /subscriptions/8e7cef8b-5b7e-469a-8dd8-4cb4835f1727/resourceGroups/Test2-MEYTCMLBMU2GMLJQGU4DCLJUGA/providers/Microsoft.Compute/virtualMachines/scheduler-MIZGGNZQGRQWGLJVGUYGCLJUGB
Azure VM ID: df70aba9-e8a0-4f15-8d86-764423838920
Cluster-Init: slurm:default:2.4.7, slurm:scheduler:2.4.7
Node ID: be007f51-9d51-4a42-ae4b-0eab606c563a

Any suggestion on what could be the cause?

My cluster configuration is the following:

  • CycleCloud 8.2
  • Slurm as the cluster manager
  • Standard_D12_v2 as the instance for the scheduler, Standard_D4_v2 for HPC instances, and Standard_F2s_v2 for HTC instances
  • I've selected 16 as the maximum number of HPC cores (I wanted just two machines)
  • the selected image is CentOS 7 for all instances
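
As a sanity check on the core arithmetic in the list above, here is a minimal sketch of how a maximum-core setting maps to a node count. The vCPU figures are assumptions taken from Azure's published size tables, not values reported by CycleCloud, and `max_nodes` is a hypothetical helper name.

```python
# vCPUs per VM size. Assumed values from Azure's published size tables;
# check them against the current Azure documentation before relying on them.
VCPUS_PER_SIZE = {
    "Standard_D12_v2": 4,
    "Standard_D4_v2": 8,
    "Standard_F2s_v2": 2,
}


def max_nodes(max_cores: int, vm_size: str) -> int:
    """Return how many whole VMs of vm_size fit under a core limit."""
    return max_cores // VCPUS_PER_SIZE[vm_size]


# 16 HPC cores with Standard_D4_v2 (8 vCPUs each) gives two machines.
print(max_nodes(16, "Standard_D4_v2"))
```

Under these assumptions, the 16-core setting corresponds to the two Standard_D4_v2 machines intended.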

Thank you for your support!

Azure CycleCloud

Accepted answer
  1. vipullag-MSFT 26,487 Reputation points Moderator
    2022-03-14T11:26:05.83+00:00

    @De Matteis Tiziano

    Thanks for reaching out on Microsoft Q&A Platform.

    You can test the configuration with a different SKU. Also, the D-series v2 sizes are quite outdated, and v3 and v4 are available. Can you try a more recent D-series size?
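
When comparing SKUs, it may help to pull the quota line out of the STDERR in the question, which reports that no `hpc` nodes were created because of a core limit in `eastus`. The following is a minimal sketch, assuming the message keeps the exact wording shown in the question. The pattern and the `find_quota_limits` helper are hypothetical and not part of CycleCloud.

```python
import re
from typing import Dict, List

# Pattern modeled on the one quota line in the question's error output.
# Other CycleCloud or Slurm integration versions may word it differently.
QUOTA_RE = re.compile(
    r"No nodes were created for nodearray (?P<nodearray>\S+) .*?: "
    r"Limited by (?P<cores>\d+) total cores "
    r"\((?P<count>\d+) of (?P<size>\S+)\) quota in (?P<region>\S+)"
)


def find_quota_limits(stderr_text: str) -> List[Dict[str, str]]:
    """Return one dict per quota message found in the given log text."""
    return [m.groupdict() for m in QUOTA_RE.finditer(stderr_text)]


# The sample line is copied from the error message in the question.
sample = (
    "No nodes were created for nodearray hpc using name format hpc-pg0-%d "
    "and offset 1: Limited by 200 total cores (10 of Standard_D4_v2) "
    "quota in eastus"
)

for hit in find_quota_limits(sample):
    print(hit["nodearray"], hit["size"], hit["region"], hit["cores"])
```

Running it on the saved STDERR after each SKU change shows whether the same limit is still being reported for the new size.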

    Hope this helps.
    Please 'Accept as answer' if it helped, so that it can help others in the community looking for help on similar topics.
