Azure CycleCloud version 8.9.0

This release introduces new features, addresses issues, and improves overall performance.

New features

  • Azure CycleCloud supports submitting Azure Guest Health Reports for unhealthy HPC nodes from the UI and CLI. See the linked guide to enable this feature.

  • Azure CycleCloud reports a Spot Placement Score for node array buckets via the spotPlacementScore field in the cluster status REST API, based on the Azure Spot Placement Score API. To enable this feature, set cyclecloud.spot_placement_score_enabled=true.

  • Azure CycleCloud now supports the ip-XXXXXX hostname format for cluster nodes without requiring the legacy Chef framework.

  • Cluster-Init now provides the project version for use in Cluster-Init scripts through the CYCLECLOUD_PROJECT_VERSION environment variable.

  • Azure CycleCloud Slurm cluster changes:

    • Support for Slurm 25.11, including the TopologyParam=BlockAsNodeRank feature in slurm.conf.
    • Support for Azure Linux 3 images. Azure Linux 3 support requires custom VM images with the Slurm binary packages preinstalled.
    • Support for Rocky Linux 8/9 images.
    • Cluster and job metrics collection is available using the bundled azslurm-exporter.
    • A Slurm metrics dashboard is installed in the configured Azure Monitor Workspace.
    • When Slurm accounting is enabled, a default certificate bundle is installed for use with Azure Database for MySQL - Flexible Server.
    • Users can provide a custom certificate for the Slurm accounting database directly in the cluster creation UI rather than supplying a certificate URL.
    • The azslurm CLI includes restart and reimage commands to assist with issue remediation for nodelists.
  • Azure CycleCloud UI changes:

    • Restart and reimage actions are now available in the Node Actions menu.
    • A warning is displayed when editing settings that have Preview status.
    • The Node Details panel has a new Disks section, a responsive multi-column layout, and accessible Sparkline charts that replace the legacy Dojo charts (which remain available on a separate Monitoring tab).
    • The deprecated and retired BeeGFS cluster type has been removed from the available Azure CycleCloud cluster types.
    • Node health issues are published to the Activity Log tab of the Azure CycleCloud Cluster page to assist with tracking issue start and resolution times.
  • Azure CycleCloud CLI changes:

    • The CLI now bundles Astral Python for greater platform portability.
  • Jetpack CLI changes:

    • The Jetpack local HTTP server port can be (re-)configured via the cyclecloud.jetpack.http_port configuration setting.
    • vm.hostname and vm.ipv4 are available through the jetpack props command.
  • cycle_server CLI changes:

    • A new import_data action assists with command-line and scripted import of records into the CycleCloud datastore.
    • A new settings action assists with command-line and scripted changes to CycleCloud Settings.
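
As a rough illustration of consuming the spotPlacementScore field described above, the following Python sketch walks a cluster status payload and collects the score for each node array bucket. Only the spotPlacementScore field name comes from these notes. The surrounding payload shape (nodearrays, buckets, definition.machineType), the helper name spot_scores, and the sample values are assumptions made for illustration, so check the cluster status REST API reference for the actual schema.

```python
import json

# Hypothetical sample of a cluster status response. Only the
# spotPlacementScore field name is taken from the release notes; the
# surrounding structure and the values are assumed for illustration.
SAMPLE_STATUS = json.loads("""
{
  "nodearrays": [
    {
      "name": "hpc",
      "buckets": [
        {"definition": {"machineType": "Standard_HB120rs_v3"},
         "spotPlacementScore": "High"},
        {"definition": {"machineType": "Standard_HC44rs"}}
      ]
    }
  ]
}
""")


def spot_scores(status):
    """Return {(nodearray, machine_type): score}; score is None if absent."""
    scores = {}
    for nodearray in status.get("nodearrays", []):
        for bucket in nodearray.get("buckets", []):
            machine_type = bucket.get("definition", {}).get("machineType")
            key = (nodearray.get("name"), machine_type)
            scores[key] = bucket.get("spotPlacementScore")
    return scores


if __name__ == "__main__":
    for (array, vm_size), score in spot_scores(SAMPLE_STATUS).items():
        shown = score if score is not None else "not reported"
        print(f"{array}/{vm_size}: {shown}")
```

Treating a missing field as "not reported" keeps the sketch usable when cyclecloud.spot_placement_score_enabled is not set to true, in which case the field may not be present.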

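The CYCLECLOUD_PROJECT_VERSION variable mentioned above could be read from a Cluster-Init script along these lines. This is a minimal sketch that assumes the script is run with a Python interpreter; the helper name project_version and the "unknown" fallback are illustrative and not part of CycleCloud.

```python
import os


def project_version(default="unknown"):
    """Return the Cluster-Init project version, or a fallback if unset."""
    # CYCLECLOUD_PROJECT_VERSION is the variable named in these notes; the
    # fallback covers environments where the variable is not set.
    return os.environ.get("CYCLECLOUD_PROJECT_VERSION", default)


if __name__ == "__main__":
    print(f"Running cluster-init for project version {project_version()}")
```
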
Resolved issues

  • In clusters configured to install enroot, the enroot startup script incorrectly modified root directory permissions if ENROOT_TEMP_PATH was missing in enroot.conf.
  • In Azure CycleCloud Slurm clusters, the start-services.sh script didn't always exit with a non-zero exit code on failure, potentially leaving services in a failed state rather than retrying as expected.
  • The cluster page Monitoring tab didn't have a vertical scrollbar when the window was resized, which caused graphs to be partially hidden.
  • The cycle_server CLI sometimes failed with a NullPointerException when errors occurred during CycleCloud start-up.
  • Azure CycleCloud .jar files produced by Microsoft were unsigned.
  • Actions hidden from menus weren't executable.
  • The Support dialog details text box didn't always expand to fill the available space.
  • The cyclecloud-slurm project configured JWT auth as part of configuring slurmrestd even when JWT auth wasn't used, which could cause cluster startup to fail.
  • Custom theme CSS didn't apply properly when custom themes were enabled.
  • The Azure CycleCloud ReturnProxy feature didn't re-establish connections to proxy nodes that were deallocated and later restarted.
  • Restart and reimage operations weren't allowed on nodes with KeepAlive set to true.
  • The Azure CycleCloud CLI defaulted to the public authentication endpoint when certain Entra authentication arguments weren't provided.
  • Nodes weren't terminated when a SKUNotAvailable error occurred because it was treated as an orchestration error rather than a capacity error.
  • Windows nodes failed to converge due to a missing Cluster-Init path.
  • On-boot Node Healthchecks didn't return a useful error message on failure.
  • Failures to delete a single Virtual Machine Scale Set sometimes blocked Azure CycleCloud from processing other Virtual Machine Scale Set deletions in the same cluster.
  • Resolved CVE-2020-10683.
  • Resolved CVE-2023-39017.

Note

The Azure CycleCloud Open PBS and Single VM clusters now default to UsePublicNetwork=false. To enable public networking, set UsePublicNetwork=true when creating the cluster.
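
One way to apply this override is through a parameters file supplied at cluster creation. The Python sketch below writes such a file. The helper name write_params, the file name params.json, and the assumption that the template exposes UsePublicNetwork as a boolean parameter are illustrative and should be checked against the template in use.

```python
import json
from pathlib import Path


def write_params(path, use_public_network):
    """Write a cluster parameter override file and return its contents."""
    params = {"UsePublicNetwork": bool(use_public_network)}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


if __name__ == "__main__":
    # "params.json" is a hypothetical file name chosen for this example.
    print(write_params("params.json", True))
```

If the cluster is created with the CycleCloud CLI, the resulting file could be passed as the parameters file to the create_cluster command; clusters created in the UI can set the same option in the creation form instead.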

Known issues

  • Azure CycleCloud Slurm packages for Azure Linux 3 aren't yet published to the public repositories. Users may build and install Slurm in their custom image following the Slurm documentation.