Azure CycleCloud Web UI stops responding every few days

Gary Mansell 0 Reputation points
2024-04-09T13:47:32.44+00:00

Hi,

I am running a few Slurm clusters on Azure CycleCloud 8.6, using the fully updated CentOS 7.9 platform image from the Marketplace (I know it goes out of support soon).

Every few days, the CycleCloud Web UI stops responding and I seem to have to restart the CycleCloud server VM to get it to come back.

Does anyone know what might be causing this and, if not, how to debug it?

I have looked in /opt/cycle_server/logs. The cycle_server.log file gives no clues, but catalina.err contains many entries like the following, which may be relevant:

2024-04-09 06:19:56 AM INFO [org.restlet] - Couldn't find the mandatory "Host" HTTP header.
2024-04-09 06:19:56 AM WARNING [org.restlet] - Error while handling an HTTP server call:
2024-04-09 06:19:56 AM INFO [org.restlet] - Error while handling an HTTP server call
java.lang.NullPointerException

2024-04-09 06:20:01 AM WARNING [org.restlet] - Unable to parse the HTTP request
java.io.IOException: Unable to parse the request method. End of stream reached too early.
        at com.noelios.restlet.http.HttpServerCall.readRequestHead(HttpServerCall.java:347)
        at com.noelios.restlet.http.StreamServerCall.<init>(StreamServerCall.java:88)
        at com.noelios.restlet.http.StreamServerHelper$ConnectionHandler.run(StreamServerHelper.java:86)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

When I run the command below, it looks like the web server service has crashed:

[root@azu-nemo-srv logs]# systemctl | grep cycle
  opt-cycle_server.mount   loaded active mounted   /opt/cycle_server
  cycle_server.service     loaded active exited    CycleCloud

Otherwise, I am stuck as to what might be causing this.

Azure CycleCloud

1 answer

  1. vipullag-MSFT 24,206 Reputation points Microsoft Employee
    2024-04-15T04:07:09.7233333+00:00

    Hello Gary Mansell

    Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

    One possible cause is the root volume filling up.

    Please check that this is not the case.

    Hope this helps.
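
One way to check the disk-space suggestion above is a small shell sketch like the following. It is only an illustration, not an official CycleCloud tool: the function names `fs_usage_pct` and `fs_is_nearly_full` and the 90% threshold are assumptions chosen for this example. It checks `/` and also `/opt/cycle_server`, since the `systemctl` output in the question shows that path as a separate mount.

```shell
#!/bin/sh
# Print the used percentage (integer, no % sign) of the filesystem
# that holds the given path. Hypothetical helper for this example.
fs_usage_pct() {
    # df -P prints one line per filesystem; column 5 is the capacity in use.
    df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Exit status 0 when usage of the path is at or above the threshold (percent).
fs_is_nearly_full() {
    [ "$(fs_usage_pct "$1")" -ge "$2" ]
}

# 90 is an arbitrary warning threshold picked for illustration.
for path in / /opt/cycle_server; do
    [ -e "$path" ] || continue
    if fs_is_nearly_full "$path" 90; then
        echo "WARNING: $path is $(fs_usage_pct "$path")% full"
    else
        echo "OK: $path is $(fs_usage_pct "$path")% full"
    fi
done
```

If a filesystem is reported as nearly full, running `du -xsh` on individual directories (for example the log directory mentioned in the question) may help narrow down what is consuming the space.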