Issue with Autoscaling Nodes in Azure CycleCloud SLURM Cluster
I have created a SLURM cluster on Azure CycleCloud and enabled autoscaling with a maximum of 20 nodes for HPC. I have verified that there is enough quota for at least 10 HPC and 2 HTC nodes. However, upon booting the cluster, only 5 nodes are available…
CycleCloud provisioning issue - storage_account_name
Hi everyone, can you please check and advise on the issue below, which occurred while provisioning with CycleCloud version 8.2.1-1733? CycleCloud Version: 8.2.1-1733 Cluster: HTCondor-Cluster01 (version 8.2.x) Status: Error [Virtual Machine] Start Time:…
CycleCloud issue with accessing storage account
I am trying to bring up CycleCloud using the storage account "cycleclouddata". All resources are on the same VNet/subnet, and there should not be any access problems, as the locker setup looks fine as well. I can also see files being…
Parallel CycleCloud Slurm process stuck while reading input file
I am running a parallel job with OpenMPI on CycleCloud, using Slurm (batch) on 5 nodes with 120 cores each. The job starts with each core reading the computational mesh. While reading, a mesh lock file appears for each core and disappears when it…
How to update scheduler and execute nodes with latest OS updates?
Hi, We need to ensure that both our Scheduler and Execute nodes are fully patched and up to date, including kernel packages and others that require a reboot after installation. We are running the Microsoft HPC Ubuntu 22.04 image, and I know it is…
How to make groups from the CycleCloud (file) Server accessible to the Scheduler and Execute nodes?
Hi, I have a CycleCloud Server which is also configured as an NFS fileserver which presents home, data and apps directories via NFS to the CycleCloud scheduler and execute nodes. It seems that CycleCloud adds users and groups to the scheduler &…
Azure CycleCloud
Hi Team, I want to automate the Azure CycleCloud deployment process through Terraform or another tool. Can anyone help me with this? I am able to create the CycleCloud server but unable to do the Azure CycleCloud setup through Terraform or in any other language…
Azure CycleCloud 8.6 not available in Marketplace
Hi, after CVE-2024-29993 I wanted to deploy the new CycleCloud version 8.6.1, but it is no longer in the Marketplace.
Azure CycleCloud config file does not match number of CPUs on a node in each cluster
Hi everyone. I am using Slurm to run a script on Azure CycleCloud, and the script uses all of the cores. When I run it on the cluster, it only uses half of the cores on the node. The cyclecloud.conf and slurm.conf files are only specifying 16 CPUs…
CycleCloud unable to access storage locker after disabling "Allow storage account key access"
We have several CycleCloud/Slurm clusters running in Azure using Managed Identity. A recent security initiative has required us to disable the "Allow storage account key access" storage account configuration. After doing so, CycleCloud is…
Is it possible (sensible?) to run Docker containers on Azure CycleCloud using Slurm?
I have been successfully running Azure CycleCloud & Slurm scheduler for running our HPC (CFD & CAE) Analysis Solving jobs from a /shared/apps loadpoint in a regular manner. I demo'd our HPC Solving capabilities to our Climate modelling team and…
Azure CycleCloud Web UI stops responding every few days
Hi, I am running a few Slurm Clusters on Azure CycleCloud 8.6, using the fully updated CentOS 7.9 (I know it drops off support soon) platform image from the Marketplace. Every few days, the CycleCloud Web UI stops responding and I seem to have to restart…
DeletingCloudOnlyObjectNotAllowed
Hello, I receive the error "DeletingCloudOnlyObjectNotAllowed" multiple times a day. I'm not sure how to resolve it. I've been looking all over the place but still can't find the solution. Our on-premise Active Directory syncs with Azure…
How to set permissions on a cluster node's scratch disk filesystem so that users can write to it?
Hi, I have created a custom cluster that builds a nodearray with an attached local disk and mounts it to the node successfully on node startup and formats it (it is not persistent). How do I now set permissions on the /scratch filesystem - so that my…
Problem getting GPU solving to work with our Azure CycleCloud / Slurm HPC cluster System
I am using the Azure CycleCloud 8.4 Marketplace image and it is fully updated, along with Slurm version 22.05.8-1. I have configured a GPU-enabled Slurm partition consisting of some NC24sv3 VMs (which have 4x NVIDIA Tesla V100 GPUs in each), but the…
How to preconfigure new users using Active Directory in CycleCloud
I configured Azure Active Directory Domain Services as the authentication method in CycleCloud. When a new user logs into the server, a new user is created without permissions. Can this process be configured somehow to add permissions to the new user…
CycleCloud/Cluster Configuration/Cluster Operations
600 cores (120 available): when I tried to add a node, I got this message: "Regional quota exceeded: Cannot add any more nodes."
Regional quota exceeded: Cannot add any more nodes.
Quota limit: 600 cores, of which 480 (4 nodes) are already used and running. However, when I tried to add an additional node (a 5th node), I got this message in the CycleCloud GUI: Regional quota exceeded: Cannot add any more nodes. Also, I increased the quota in my subscription (Portal)…
Azure ML - Notebook - Jupyter Kernel Error - No Kernel connection
In ML Studio, when I create a notebook the top of my screen says "Jupyter kernel error" in red. I have a compute instance running (it's green), but it also says "No Kernel connected". To correct this matter, can you please…