CycleCloud - Slurm accounting persistent connection - connection refused to my scheduler

Jason Gumarang 20 Reputation points
2025-04-15T06:53:41.58+00:00

Hi Support,

This is the error I am getting from my newly created scheduler. Before the upgrade I was able to deploy new clusters without any issue, but since the upgrade to CycleCloud 8 it seems that newly created clusters have this issue whenever I enable accounting.

I am not sure whether something needs to be updated in Jetpack or elsewhere, as I currently always get this "Connection refused" error.

Actions taken: I already tried adding ports 6819 and 6817 to my network security group, but with no luck.
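
One observation on the symptom, offered as a sketch rather than a diagnosis: "Connection refused" generally means the host was reached but nothing was listening on the port, whereas traffic dropped by a network security group more often shows up as a timeout. The log below also shows slurmctld connecting to jason-test-scheduler:6819, which appears to be the scheduler node itself, so it may be that slurmdbd is simply not running there. The Python snippet below tells the two cases apart; `probe_port` is a hypothetical helper name, not part of Slurm or CycleCloud, and the 3-second timeout is an arbitrary choice.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all; typical of packets dropped by a firewall or NSG rule.
        return "timeout"
    except OSError as exc:
        # Anything else, for example a host name that does not resolve.
        return f"error: {exc}"
```

Run on the scheduler node, `probe_port("jason-test-scheduler", 6819)` returning "refused" would point at the slurmdbd service rather than at the network security group, while "timeout" would point back at network filtering.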


[2025-04-15T06:42:14.863] slurmscriptd: debug:  _slurmscriptd_mainloop: started

[2025-04-15T06:42:14.863] debug:  slurmctld: slurmscriptd fork()'d and initialized.

[2025-04-15T06:42:14.863] debug:  _slurmctld_listener_thread: started listening to slurmscriptd

[2025-04-15T06:42:14.863] slurmctld version 24.05.4 started on cluster jason-test

[2025-04-15T06:42:14.863] cred/munge: init: Munge credential signature plugin loaded

[2025-04-15T06:42:14.863] select/cons_tres: init: select/cons_tres loaded

[2025-04-15T06:42:14.864] select/linear: init: Linear node selection plugin loaded with argument 20

[2025-04-15T06:42:14.864] debug:  MPI: Loading all types

[2025-04-15T06:42:14.865] debug:  mpi/pmix_v4: init: PMIx plugin loaded

[2025-04-15T06:42:14.865] debug:  mpi/pmix_v4: init: PMIx plugin loaded

[2025-04-15T06:42:14.866] debug:  _plugrack_foreach: serializer plugin type:serializer/json path:/usr/lib64/slurm/serializer_json.so

[2025-04-15T06:42:14.866] debug:  _plugrack_foreach: serializer plugin type:serializer/url-encoded path:/usr/lib64/slurm/serializer_url_encoded.so

[2025-04-15T06:42:14.866] accounting_storage/slurmdbd: init: Accounting storage SLURMDBD plugin loaded

[2025-04-15T06:42:14.867] error: _open_persist_conn: failed to open persistent connection to host:jason-test-scheduler:6819: Connection refused

[2025-04-15T06:42:14.867] error: Sending PersistInit msg: Connection refused

[2025-04-15T06:42:14.867] accounting_storage/slurmdbd: clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd

[2025-04-15T06:42:14.867] error: Sending PersistInit msg: Connection refused

[2025-04-15T06:42:14.867] debug:  Association database appears down, reading from state file.

[2025-04-15T06:42:14.867] debug:  create_mmap_buf: Failed to open file /sched/jason-test/spool/slurmctld/last_tres, No such file or directory

[2025-04-15T06:42:14.867] debug:  create_mmap_buf: Failed to open file /sched/jason-test/spool/slurmctld/assoc_mgr_state, No such file or directory

[2025-04-15T06:42:14.867] fatal: You are running with a database but for some reason we have no TRES from it.  This should only happen if the database is down and you don't have any state files.

[2025-04-15T06:42:14.868] slurmscriptd: debug:  _slurmscriptd_mainloop: finished

Azure CycleCloud