Hello,
I'm trying to set up my first CycleCloud cluster, but I keep getting an error during the initialization phase.
In particular, it complains that no nodes were found for nodearray "hpc".
The full error message:
CycleCloud Version: 8.2.0-1616
Cluster: Test2 (version 8.2.x)
==============================
Status: Error [Software Configuration] (retrying)
Start Time: 2022-03-13T17:31:29.377Z
Description: Unable to execute command `"bash" "/tmp/chef-script20220313-15831-14pjjqq"` (exit code 1)
Detail:
STDOUT:
STDERR: Upgrade not required!
Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
/opt/cycle/slurm/cyclecloud_slurm.py:571: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
logging.warn("No nodes were created for nodearray %s using name format %s and offset %s: %s", request_set.nodearray, request_set.name_format,
No nodes were created for nodearray hpc using name format hpc-pg0-%d and offset 1: Limited by 200 total cores (10 of Standard_D4_v2) quota in eastus
Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
Unhandled failure.
Traceback (most recent call last):
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
main()
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
args.func(**kwargs)
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
_generate_slurm_conf(partitions, writer, subprocess)
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
Traceback (most recent call last):
File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
main()
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
args.func(**kwargs)
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
_generate_slurm_conf(partitions, writer, subprocess)
File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
EXCEPTION: bash[Create cyclecloud.conf] (slurm::scheduler line 156) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
Affected Nodes (1):
---
Node Name: scheduler
Hostname: ip-0A000005
IP Address: 10.0.0.5
Azure Resource ID: /subscriptions/8e7cef8b-5b7e-469a-8dd8-4cb4835f1727/resourceGroups/Test2-MEYTCMLBMU2GMLJQGU4DCLJUGA/providers/Microsoft.Compute/virtualMachines/scheduler-MIZGGNZQGRQWGLJVGUYGCLJUGB
Azure VM ID: df70aba9-e8a0-4f15-8d86-764423838920
Cluster-Init: slurm:default:2.4.7, slurm:scheduler:2.4.7
Node ID: be007f51-9d51-4a42-ae4b-0eab606c563a
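To make the relevant part easier to spot, here is a small sketch that pulls the "No nodes were created" reason out of the STDERR text above. The regular expression and the function name `creation_failures` are my own, written against the exact wording of the log line, so they are an assumption and not part of CycleCloud.

```python
import re

# Two lines copied verbatim from the STDERR above.
LOG = """\
Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
No nodes were created for nodearray hpc using name format hpc-pg0-%d and offset 1: Limited by 200 total cores (10 of Standard_D4_v2) quota in eastus
"""

# Hypothetical pattern matching the wording of the log line; not a CycleCloud API.
PATTERN = re.compile(
    r"No nodes were created for nodearray (\S+) "
    r"using name format \S+ and offset \d+: (.*)"
)


def creation_failures(log_text):
    """Return (nodearray, reason) pairs for each 'No nodes were created' line."""
    matches = (PATTERN.search(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m]


failures = creation_failures(LOG)
for nodearray, reason in failures:
    print(f"{nodearray}: {reason}")
```

For the log above, the only reason reported is the quota message for the hpc nodearray.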
Any suggestions on what could be causing this?
My cluster configuration is the following:
- CycleCloud 8.2
- Slurm as the cluster manager
- Standard_D12_v2 for the scheduler, Standard_D4_v2 for the HPC instances, and Standard_F2s_v2 for the HTC instances
- I set the maximum HPC cores to 16 (I want just two machines)
- the image selected is CentOS 7 for all instances
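For reference, this is the arithmetic behind the 16-core setting, as a rough sketch. The vCPU counts are what I understand from the Azure VM size tables (8 for Standard_D4_v2, 2 for Standard_F2s_v2); the `max_nodes` helper is only for illustration and is not how CycleCloud computes it.

```python
# vCPUs per VM size, taken from the Azure size tables (assumption, not from CycleCloud).
VCPUS = {"Standard_D4_v2": 8, "Standard_F2s_v2": 2}


def max_nodes(max_cores, vm_size):
    """Number of whole VMs of vm_size that fit within a core limit."""
    return max_cores // VCPUS[vm_size]


# 16 HPC cores with Standard_D4_v2 should allow two machines.
print(max_nodes(16, "Standard_D4_v2"))
```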
Thank you for your support!