Hi Support,
Since this thread has been merged with my other post (https://learn.microsoft.com/en-us/answers/questions/2245815/slurm-scheduler-cant-be-access-via-ssh-or-bastion?page=1#answer-2024996), I suggest we focus first on the missing /shared/home after reboot.
There was an issue running the jetpack run command; jetpack reports that no such command exists:
[root@jason-test-scheduler ~]# jetpack run --parts all
Usage: jetpack [OPTIONS] COMMAND [ARGS]...
Error: No such command "run".
According to the system, these are the only commands available:
Commands:
  autoscale        Autoscales the cluster the node is part of
  config           Gets a configuration value
  converge         Triggers a converge on the current node
  download         Download a project file from cloud storage
  keepalive        Delays the healthcheck daemon
  log              Sends log message to CycleCloud
  report_issue     Upload logs to Azure Storage.
  run_on_shutdown  Add a script to be called prior to node termination
  send             Sends a message to CycleCloud
  shutdown         terminates the node
  test             Runs python style unittests on the node
  users
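To confirm up front which subcommands this jetpack build accepts, a minimal sketch like the following could parse the usage text shown above. The function name has_subcommand is hypothetical (it is not part of jetpack), and it assumes the usage text lists one command per line under a "Commands:" heading, as in the output above.

```shell
# Hypothetical helper, not part of jetpack: reads usage text on stdin and
# succeeds if the given name is listed under the "Commands:" heading.
has_subcommand() {
    awk -v name="$1" '
        /^Commands:/ { in_cmds = 1; next }   # start of the command list
        in_cmds && $1 == name { found = 1 }  # first word on the line is the command
        END { exit found ? 0 : 1 }
    '
}
```

Assuming jetpack --help prints the same list, this could be used as: jetpack --help 2>&1 | has_subcommand converge && echo "converge is available"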
Here are some logs:
2025-04-24 08:35:15,207 INFO Getting node configuration from /clusterlink/userdata/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:35:15,246 INFO Scheduled event monitor thread started
2025-04-24 08:35:15,246 INFO Forking jetpack converge to configure VM software
2025-04-24 08:37:32,670 INFO Software configuration complete!
2025-04-24 08:37:43,182 INFO Getting latest auth configuration from /clusterlink/auth/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:38:43,195 INFO Getting latest auth configuration from /clusterlink/auth/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:39:43,207 INFO Getting latest auth configuration from /clusterlink/auth/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:40:43,220 INFO Getting latest auth configuration from /clusterlink/auth/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:41:14,285 INFO Auto-acknowledging EventID D0B8F90C-4225-4EF8-B3CA-9A13B0C9B6BB for resources ['scheduler-GRSTQYTCMI3TQLJVGUYWMLJUGA'] (Reboot)
2025-04-24 08:41:43,233 INFO Getting latest auth configuration from /clusterlink/auth/ea3a7cc839e2133c72eba8176d59cfe6
2025-04-24 08:42:02,494 ERROR Failed to get instance status from CycleCloud
Traceback (most recent call last):
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/service.py", line 23, in initialize
    config = jetpack.util.get_current_config()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/util/init.py", line 604, in get_current_config
    r = query_cyclecloud(url, params=params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/util/init.py", line 577, in query_cyclecloud
    conn.request(method, path, body=body, headers=default_headers)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1336, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1382, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1331, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1091, in _send_output
    self.send(msg)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1035, in send
    self.connect()
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1470, in connect
    super().connect()
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1001, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cycle/jetpack/system/python/lib/python3.12/socket.py", line 865, in create_connection
    raise exceptions[0]
  File "/opt/cycle/jetpack/system/python/lib/python3.12/socket.py", line 850, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
2025-04-24 08:42:02,511 ERROR Failed to get instance status from CycleCloud
Traceback (most recent call last):
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/service.py", line 23, in initialize
    config = jetpack.util.get_current_config()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/util/init.py", line 604, in get_current_config
    r = query_cyclecloud(url, params=params)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cycle/jetpack/system/python/lib/python3.12/site-packages/jetpack/util/init.py", line 577, in query_cyclecloud
    conn.request(method, path, body=body, headers=default_headers)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1336, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1382, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1331, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/cycle/jetpack/system/python/lib/python3.12/http/client.py", line 1091, in _send_output
ConnectionRefusedError: [Errno 111] Connection refused
2025-04-24 08:44:11,673 WARNING While querying instance metadata, encountered <urlopen error [Errno 110] Operation timed out> retrying in 2 seconds
2025-04-24 08:44:13,716 INFO Scheduled event monitor thread started
2025-04-24 08:44:13,720 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:45:13,723 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:46:13,727 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:47:13,730 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:48:13,733 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:49:13,737 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:50:13,740 ERROR The home directory mounted at /shared/home is not available
2025-04-24 08:51:13,744 ERROR The home directory mounted at /shared/home is not available
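
To narrow down why the home directory is reported as unavailable, it may help to check whether /shared/home is actually mounted after the reboot. The sketch below is only an assumption about how one might check this and is not taken from the CycleCloud tooling: the function name is_mounted is hypothetical, and it reads a mount table in the /proc/mounts layout, where the second whitespace-separated field is the mount point.

```shell
# Hypothetical helper: succeeds if the given path appears as a mount point
# in a mount table using the /proc/mounts layout (device, mount point, type, ...).
# The table defaults to /proc/mounts; an optional second argument overrides it.
is_mounted() {
    awk -v target="$1" '
        $2 == target { found = 1 }   # second field is the mount point
        END { exit found ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}
```

For example: is_mounted /shared/home || echo "/shared/home is not mounted". If it is not mounted, grep /shared/home /etc/fstab would show whether an entry exists that should bring it back at boot.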