Monitoring

Azure CycleCloud supports monitoring of external services through its pluggable architecture. Administrators can enable automatic monitoring of these services by opening the Settings page from the user menu in the top right-hand corner of the web interface, double-clicking the CycleCloud settings item, and checking the box labelled Enable monitoring for CycleCloud services.

When this option is enabled, supported services in each cluster automatically register with CycleCloud, which then configures monitoring for each registered service.

Supported Services

Ganglia

Every version of CycleCloud ships with Ganglia monitoring support for collecting performance metrics such as CPU, memory, and bandwidth usage. If your cluster is configured to use Ganglia (the default in most cases), automatic monitoring will work as long as port 8652 is open between CycleCloud and the cluster's master node (the one running the gmetad service).
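The port requirement above can be checked from the CycleCloud host before enabling monitoring. The following is a minimal sketch, not an official tool: the function name port_status and the host name in the usage note are hypothetical, and it assumes bash and the coreutils timeout command are available on the machine running the check.

```shell
# Print "open" if a TCP connection to host:port succeeds within 5 seconds,
# otherwise print "closed". Uses bash's built-in /dev/tcp pseudo-device.
port_status() {
  host="$1"
  port="$2"
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo open
  else
    echo closed
  fi
}
```

For example, port_status master.example.internal 8652 (with the host name replaced by the address of the node running gmetad) should print open when the port is reachable. A result of closed can also mean that the host name does not resolve or that a firewall silently drops the traffic.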

Ganglia on CentOS/RHEL

Ganglia on CentOS and RHEL is provided by EPEL. Azure CycleCloud configures and installs EPEL, and the Ganglia dependencies, by default.

You can opt out of EPEL by setting cyclecloud.install_epel = false in a cluster template. Opting out of EPEL skips Ganglia monitoring setup. This does not affect the computational functionality of your cluster, but it forgoes the data that would otherwise have been collected for the cluster's reports view.
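As an illustration, the setting could be placed in the configuration section of a cluster template so that it applies to every node. This is a sketch only: the cluster name example-cluster is hypothetical, and applying the setting under node defaults is an assumption about where you want it to take effect.

```ini
# Hypothetical cluster name; only the install_epel line comes from the text above.
[cluster example-cluster]

  [[node defaults]]

    [[[configuration]]]
    # Skip EPEL installation, which also skips Ganglia monitoring setup.
    cyclecloud.install_epel = false
```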

For informational purposes, here are the "client" dependencies installed on execute cluster nodes, and the "server" dependencies installed on master/head cluster nodes.

# Ganglia client dependencies from CentOS/RHEL base
yum -y install apr bash expat glibc pcre python python-libs systemd zlib

# Ganglia client dependencies provided by EPEL
yum -y install ganglia ganglia-gmond ganglia-gmond-python libconfuse

# Ganglia server dependencies from CentOS/RHEL base
yum -y install apr bash expat glibc libmemcached pcre rrdtool systemd zlib

# Ganglia server dependencies provided by EPEL
yum -y install ganglia ganglia-gmetad libconfuse

Grid Engine

If you are running the Grid Scheduling Edition of CycleCloud, Grid Engine monitoring will automatically be configured when a Grid Engine cluster is started. The only requirement is that CycleCloud can SSH to the node running the qmaster service with the keypair configured for the cluster.
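The SSH requirement can be tested by hand with a non-interactive login attempt. The sketch below is an illustration under stated assumptions: the function name check_ssh, the key path, and the user and host in the usage note are all hypothetical, and it only confirms that a key-based login succeeds, not that Grid Engine monitoring itself is working.

```shell
# Return 0 if a key-based, non-interactive SSH login to the target succeeds.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh() {
  key="$1"
  target="$2"
  ssh -i "$key" -o BatchMode=yes -o ConnectTimeout=10 "$target" true
}
```

A call such as check_ssh ~/.ssh/cluster-key.pem cyclecloud@qmaster.example.internal should exit with status 0 when CycleCloud's keypair is accepted by the qmaster node. Run it as the same operating system user that runs CycleCloud so that the same key files and known hosts are used.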

Azure Monitor

As of CycleCloud 8.0, metrics for a cluster are pulled from Azure Monitor instead of Ganglia. This removes the need to open port 8652 inbound on nodes.

Note

Even clusters that are still at version 7 and have Ganglia pre-installed will get their metrics from Azure Monitor in CycleCloud 8.

The metrics that are collected are:

  • Percentage CPU
  • Disk Read Bytes
  • Disk Write Bytes
  • Network In
  • Network Out
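The same metrics can be inspected directly with the Azure CLI, which is useful for confirming that data exists for a node. This is a hedged sketch rather than a description of how CycleCloud queries Azure Monitor: the function names are hypothetical, the choice of Average aggregation is an assumption, and it presumes a logged-in Azure CLI and the full resource ID of a virtual machine in the cluster.

```shell
# The five metric names listed above, one per line.
metric_names() {
  printf '%s\n' \
    "Percentage CPU" \
    "Disk Read Bytes" \
    "Disk Write Bytes" \
    "Network In" \
    "Network Out"
}

# Query each metric in turn for the VM identified by the given resource ID.
# Assumption: Average aggregation is used here purely for illustration.
show_vm_metrics() {
  resource_id="$1"
  metric_names | while IFS= read -r metric; do
    az monitor metrics list \
      --resource "$resource_id" \
      --metrics "$metric" \
      --aggregation Average \
      --output table </dev/null
  done
}
```

Call show_vm_metrics with a resource ID of the form /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/virtualMachines/<vm-name>, filling in the placeholders for one of your cluster nodes.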

It's also possible to store log data from CycleCloud clusters in Log Analytics and to build custom metrics dashboards from it. For more information on creating custom metrics dashboards from Log Analytics for your clusters, see the How-to section and the tutorials in the Azure Monitor documentation.