Plan Your CycleCloud Production Deployment

Azure CycleCloud Deployment

Warning

Ensure that "Enable hierarchical namespace" for Azure Data Lake Storage Gen2 is not selected when creating the storage account. CycleCloud cannot use a Blob storage account with ADLS Gen2 (hierarchical namespace) enabled as its storage locker.
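To check an existing account before using it as a locker, you can inspect the JSON returned by `az storage account show`, where the hierarchical namespace setting appears as `isHnsEnabled`. The sketch below is a minimal example. It assumes that JSON has already been captured as a string and treats a missing or null field as disabled. The function name `is_usable_as_locker` is hypothetical.

```python
import json


def is_usable_as_locker(account_json: str) -> bool:
    """Return True if the storage account has no hierarchical namespace.

    Assumes `account_json` is the JSON output of `az storage account show`.
    A missing or null `isHnsEnabled` field is treated as disabled.
    """
    account = json.loads(account_json)
    # ADLS Gen2 accounts report "isHnsEnabled": true; CycleCloud cannot use these as a locker.
    return not bool(account.get("isHnsEnabled"))


if __name__ == "__main__":
    sample = '{"name": "mylocker", "kind": "StorageV2", "isHnsEnabled": false}'
    print(is_usable_as_locker(sample))
```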

Azure CycleCloud Configuration

Azure CycleCloud Cluster Configuration

  • Define user access to the clusters (see Cluster User Management)
  • Determine which scheduler will be used
  • Determine which VM SKU will be required for the scheduler/head node
  • Determine which VM SKUs will be required for the compute/execute nodes. This depends entirely on the applications being run
  • Will clusters be deployed using a template or manually?
  • Will any scripts need to be run on the scheduler or execute nodes once they are deployed?

Applications

  • What dependencies (libraries, etc.) do the applications have? How will these be made available?
  • How long does an application take to set up and install? This may determine how an application is made available to the execute nodes and could necessitate a custom image.
  • Are there any license dependencies that need to be taken into account? Does the application need to contact an on-premises license server?
  • Determine where applications will be executed from; this depends on install times and performance requirements.
  • Is there a specific VM SKU that the applications need to run on? Will MPI be a requirement? That would necessitate a different family of machines, such as the H-series.
  • What will be the optimum number of cores per job for each application?
  • Can Spot VMs be used? (See Using Spot VMs in CycleCloud.)
  • Ensure subscription quotas are in place to fulfill the core requirements for the applications
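To check the quota question, multiply the number of concurrent jobs by the VMs each job occupies and by the vCPUs per VM, then compare the total with the subscription's vCPU quota for that VM family. This is a minimal sketch. It assumes each job is spread across whole VMs with no node sharing. The class and function names, and the figures in the usage line, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadPlan:
    cores_per_job: int    # optimum cores per job for the application
    vcpus_per_vm: int     # vCPUs of the chosen execute-node VM SKU
    concurrent_jobs: int  # peak number of jobs running at the same time

    def vms_per_job(self) -> int:
        # Round up: a job needing 100 cores on 44-vCPU VMs still occupies 3 whole VMs.
        return math.ceil(self.cores_per_job / self.vcpus_per_vm)

    def required_vcpus(self) -> int:
        # Quota is consumed per VM, so every vCPU of an allocated VM counts, used or not.
        return self.vms_per_job() * self.vcpus_per_vm * self.concurrent_jobs


def fits_quota(plan: WorkloadPlan, family_quota_vcpus: int) -> bool:
    return plan.required_vcpus() <= family_quota_vcpus


if __name__ == "__main__":
    plan = WorkloadPlan(cores_per_job=100, vcpus_per_vm=44, concurrent_jobs=10)
    print(plan.vms_per_job(), plan.required_vcpus(), fits_quota(plan, 1000))
```

With these placeholder numbers the plan needs 3 VMs per job and 1320 vCPUs in total, which exceeds a 1000-vCPU quota. The scheduler/head node's vCPUs also count against quota and are left out here for brevity.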

Data

  • Determine where in Azure the input data will reside. This will be dependent on the performance of the applications and data size.
    • Locally on the execute nodes
    • From an NFS share
    • In blob storage
    • Using Azure NetApp Files
  • Determine if there is any post-processing needed on the output data
  • Decide where the output data will reside once processing is complete
  • Does it need to be copied elsewhere?
  • What archive/backup requirements are there?
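When comparing where input data should reside, a rough staging estimate can help. The time to move the input data is about its size divided by the sustained throughput of the chosen option (local disk, NFS share, Blob storage, or Azure NetApp Files). The sketch below takes throughput as a value you would measure yourself. No throughput figures are implied, and the numbers in the usage line are placeholders.

```python
def staging_hours(data_gib: float, throughput_mib_s: float) -> float:
    """Estimate hours to move `data_gib` GiB at a sustained `throughput_mib_s` MiB/s."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_gib * 1024 / throughput_mib_s  # GiB -> MiB, then divide by MiB/s
    return seconds / 3600


if __name__ == "__main__":
    # Placeholder numbers: 900 GiB at a measured 256 MiB/s.
    print(round(staging_hours(900, 256), 2))
```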

Job Submission

  • How will users submit jobs?
  • Will they have a script to run on the scheduler VM or will there be a frontend to help with data upload and job submission?
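If Slurm is the chosen scheduler (an assumption here, since the scheduler choice is still open above), a small script on the scheduler VM could wrap `sbatch` so that users only supply a few parameters. The sketch below builds the command line using standard `sbatch` options but does not run it. The function name, script name, and partition name are hypothetical.

```python
import shlex


def build_sbatch_command(script: str, job_name: str, nodes: int,
                         tasks: int, partition: str) -> list[str]:
    """Build (but do not run) an sbatch command line for a batch script."""
    if nodes < 1 or tasks < 1:
        raise ValueError("nodes and tasks must be at least 1")
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--partition={partition}",
        f"--nodes={nodes}",
        f"--ntasks={tasks}",
        script,
    ]


if __name__ == "__main__":
    cmd = build_sbatch_command("run_model.sh", "model-a", nodes=3, tasks=132, partition="hpc")
    print(shlex.join(cmd))
    # On the scheduler VM this list could be passed to subprocess.run(cmd, check=True).
```

A web frontend could call the same function after handling data upload. This keeps the scheduler-specific flags in one place.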

Backup and Disaster Recovery

  • Will templates be used for cluster creation? Templates make recreating a CycleCloud server much quicker and keep deployments consistent
  • What requirements for Disaster Recovery are there? What would happen to the business if an Azure region wasn’t available as expected?
  • Are there any application SLAs defined by the internal business?
  • Could another region be used as a standby?
  • Are jobs long running? Would checkpointing be beneficial?
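Checkpointing lets a long-running job resume after a node is lost or a Spot VM is evicted, rather than starting over. This is a minimal application-level sketch. It assumes the job state fits in a small JSON document and that the checkpoint path sits on storage that outlives the node, such as an NFS share. The file name and work loop are hypothetical. Writing to a temporary file and then calling `os.replace` keeps the previous checkpoint intact if the node dies mid-write.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> dict:
    # Start from step 0 if no checkpoint exists yet.
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    # Write to a temp file in the same directory, then atomically replace the old checkpoint.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def run(path: str, n_steps: int, every: int = 10) -> dict:
    # Resume from the last saved step; completed work is not repeated.
    state = load_checkpoint(path)
    for step in range(state["step"], n_steps):
        state["total"] += step  # stand-in for one unit of real work
        state["step"] = step + 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state


if __name__ == "__main__":
    print(run("job_checkpoint.json", n_steps=100))
```

The checkpoint interval (`every`) trades the cost of writing state against the amount of work lost on an interruption. Longer-running jobs on Spot VMs generally justify more frequent checkpoints.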