Docker Task in HPC Pack

An HPC Pack docker task is a task that runs in a docker container.

To use this feature, set the task environment variable CCP_DOCKER_IMAGE to the docker image that will be used to start a docker container to run the task. The format is: CCP_DOCKER_IMAGE=[Docker Registry/]<Repository>[:Tag]
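
As a rough illustration of that format, the sketch below splits an image reference into registry, repository, and tag. The helper split_image_ref is hypothetical and not part of HPC Pack or docker. It assumes the first path component is a registry only when it looks like a host name, and that a missing tag means latest.

```shell
#!/bin/bash
# Hypothetical helper (not part of HPC Pack): split an image reference of the
# form [registry/]repository[:tag] into its parts.
split_image_ref() {
    local ref=$1 registry="" rest tag="latest"
    local first=${ref%%/*}
    # Assumption: the first path component is a registry only if it looks like
    # a host name (contains "." or ":") or is "localhost".
    if [[ $ref == */* && ( $first == *.* || $first == *:* || $first == localhost ) ]]; then
        registry=$first
        rest=${ref#*/}
    else
        rest=$ref
    fi
    # A tag, if present, follows the last ":" of the final path component.
    local last=${rest##*/}
    if [[ $last == *:* ]]; then
        tag=${last##*:}
        rest=${rest%:*}
    fi
    echo "registry=${registry:-<default>} repository=$rest tag=$tag"
}

split_image_ref docker.io/library/ubuntu:16.04
split_image_ref centos
```

The first call prints registry=docker.io repository=library/ubuntu tag=16.04, and the second prints registry=<default> repository=centos tag=latest.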

Several other environment variables can be used to extend this feature:

  • CCP_DOCKER_NVIDIA (Linux only) indicates whether to start the docker container with the nvidia-docker command instead of docker. For example, CCP_DOCKER_NVIDIA=1.

  • CCP_DOCKER_VOLUMES sets the directories to be mounted from the host into the docker container as volumes, as a comma-separated list. For example, CCP_DOCKER_VOLUMES=/host_path1:/container_path1,/common_path,/host_path2:/container_path2:z on Linux or CCP_DOCKER_VOLUMES=c:\foo:c:\dest,c:\foo:d: on Windows.

  • CCP_DOCKER_DEBUG indicates whether to leave the container alive for debugging after the command in it finishes. The container then needs to be removed manually. For example, CCP_DOCKER_DEBUG=1.

  • CCP_DOCKER_START_OPTION adds extra options to use when starting a docker container. For example, CCP_DOCKER_START_OPTION=--network=host --ulimit memlock=-1.

  • CCP_DOCKER_SKIP_SSH_SETUP (Linux only) indicates whether to skip the default setup of SSH communication between containers. The default setup uses the SSH keys and network of the host in the container, stops the SSH server on the host, and starts an SSH server in the container. Set this variable if the docker image handles SSH setup itself. For example, CCP_DOCKER_SKIP_SSH_SETUP=1.
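
To make the roles of these variables more concrete, the sketch below prints the kind of docker run command line they describe. It is an illustration only and not the HPC Pack implementation: the function build_docker_command is hypothetical, the use of --rm when CCP_DOCKER_DEBUG is unset is an assumption, and so is mounting a bare path such as /common_path at the same path in the container. It handles Linux-style volume entries only.

```shell
#!/bin/bash
# Illustrative sketch only, NOT the HPC Pack implementation: print a
# "docker run" command line derived from the CCP_DOCKER_* variables.
build_docker_command() {
    local docker_bin="docker"
    # CCP_DOCKER_NVIDIA=1 selects nvidia-docker instead of docker.
    if [ "${CCP_DOCKER_NVIDIA:-0}" = "1" ]; then
        docker_bin="nvidia-docker"
    fi

    local args=("$docker_bin" run)
    # Assumption: without CCP_DOCKER_DEBUG=1 the container is removed on exit.
    if [ "${CCP_DOCKER_DEBUG:-0}" != "1" ]; then
        args+=(--rm)
    fi

    # Each comma-separated entry of CCP_DOCKER_VOLUMES becomes one -v option.
    local volume volumes=()
    IFS=',' read -r -a volumes <<< "${CCP_DOCKER_VOLUMES:-}"
    for volume in "${volumes[@]}"; do
        # Assumption: a bare path is mounted at the same path in the container.
        [[ $volume == *:* ]] || volume="$volume:$volume"
        args+=(-v "$volume")
    done

    # Extra start options are split on whitespace and passed through as-is.
    local extra=()
    read -r -a extra <<< "${CCP_DOCKER_START_OPTION:-}"
    args+=("${extra[@]}" "$CCP_DOCKER_IMAGE" "$@")
    echo "${args[*]}"
}

CCP_DOCKER_IMAGE=ubuntu \
CCP_DOCKER_VOLUMES=/host_path1:/container_path1,/common_path \
CCP_DOCKER_START_OPTION="--network=host --ulimit memlock=-1" \
build_docker_command hostname
```

With the values above, the sketch prints: docker run --rm -v /host_path1:/container_path1 -v /common_path:/common_path --network=host --ulimit memlock=-1 ubuntu hostname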

To run a docker task, docker must be installed on the Windows/Linux compute nodes as a prerequisite.

When a docker task is allocated multiple Linux nodes to run an MPI application, no other MPI docker task should use these nodes at the same time, because the container on each Linux node shares the network with its host. Running MPI applications in docker tasks on Windows compute nodes is not supported yet.

The Linux OS in the docker image should have /bin/bash.

To run an MPI application in docker containers on Linux nodes, the docker image should have sudo, an SSH service, and MPI installed.
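
One way to check these prerequisites before submitting a job is a small script that reports missing commands. The sketch below is an assumption about how such a check could look; the function name check_tools and the tool names passed to it are illustrative, not something HPC Pack provides.

```shell
#!/bin/bash
# Report which of the given commands are missing from PATH.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Only run the check when tool names are passed as arguments.
if [ "$#" -gt 0 ]; then
    check_tools "$@"
fi
```

Saved under a hypothetical name such as check_tools.sh, the script could be fed to a container of the image with docker run --rm -i --entrypoint /bin/bash <image> -s sudo sshd mpirun < check_tools.sh. That command also fails outright if the image has no /bin/bash.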

Run docker task on Linux compute nodes step by step

Deploy cluster with ARM template

Install docker on Linux compute nodes

  • Install Docker CE with clusrun, following the docker docs

    clusrun /nodegroup:linuxnodes /interleaved yum -y update
    clusrun /nodegroup:linuxnodes /interleaved yum install -y yum-utils device-mapper-persistent-data lvm2
    clusrun /nodegroup:linuxnodes /interleaved yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
    clusrun /nodegroup:linuxnodes /interleaved yum install -y docker-ce docker-ce-cli containerd.io
    clusrun /nodegroup:linuxnodes systemctl start docker
    

    In this doc, the version of Docker CE is 17.11.0-ce-rc3.

Run command in container as docker task

  • Submit a job containing 1 docker task

    job submit /env:ccp_docker_image=docker.io/library/ubuntu:16.04 hostname
    

    Check job result in HPC Pack Cluster Manager:

    [Screenshot: dockertask1]

  • Submit a job containing multiple docker tasks

    job new
    job add !! /env:ccp_docker_image=ubuntu cat /etc/*release ^| grep ^^NAME
    job add !! /env:ccp_docker_image=centos cat /etc/*release ^| grep ^^NAME
    job add !! /env:ccp_docker_image=debian cat /etc/*release ^| grep ^^NAME
    job add !! /env:ccp_docker_image=fedora cat /etc/*release ^| grep ^^NAME
    job submit /id:!!
    

    Check job result in HPC Pack Cluster Manager:

    [Screenshot: dockertask2]

  • Tasks inherit the environment variables of their job unless they define the same variables themselves, so the docker image can also be assigned in the job environment variables

    job new /jobenv:ccp_docker_image=ubuntu
    job add !! hostname^; cat /etc/*release ^| grep ^^NAME
    job add !! hostname^; cat /etc/*release ^| grep ^^NAME
    job add !! hostname^; cat /etc/*release ^| grep ^^NAME
    job add !! /env:ccp_docker_image=centos hostname^; cat /etc/*release ^| grep ^^NAME
    job submit /id:!!
    

    Check job result in HPC Pack Cluster Manager:

    [Screenshot: dockertask3]
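
The inheritance rule in the last example can be modeled in a few lines. This is a hypothetical model of the lookup, not HPC Pack code: a task-level value wins, and the job-level value is used only when the task sets none.

```shell
#!/bin/bash
# Hypothetical model: resolve the image for a task from the task-level and
# job-level values of CCP_DOCKER_IMAGE.
effective_image() {
    local task_image=$1 job_image=$2
    # An empty task-level value falls back to the job-level value.
    echo "${task_image:-$job_image}"
}

effective_image "" ubuntu       # a task without its own setting
effective_image centos ubuntu   # a task that sets its own image
```

Under this model the first three tasks of the job above resolve to ubuntu and the fourth to centos.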

Run MPI docker task

Build customized docker image with MPICH installed

Perform this step on any Linux node with docker installed.

  • Start a container with docker image ubuntu:

    docker run -it ubuntu
    
  • Install sudo, ssh, vim and mpich with apt:

    apt update; apt -y install sudo ssh vim mpich
    
  • Write a simple MPI program:

    mkdir /mpisample
    chmod o+w /mpisample
    cd /mpisample
    vim helloMpi.c
    
  • Edit helloMpi.c with the content below:

    #include <mpi.h>
    #include <stdio.h>
    
    int main(int argc, char** argv)
    {
        int rank, size, processor_name_length;
        /* MPI_MAX_PROCESSOR_NAME is the buffer size MPI guarantees to be sufficient. */
        char processor_name[MPI_MAX_PROCESSOR_NAME];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(processor_name, &processor_name_length);
        printf("Hello from %s, rank %d out of %d processors.\n", processor_name, rank, size);
        MPI_Finalize();
        return 0;
    }
    
  • Compile helloMpi.c with mpicc and create a shell script run.sh:

    mpicc helloMpi.c -o helloMpi
    vim run.sh
    
  • Edit run.sh with the content below:

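    #!/bin/bash
    # CCP_NODES holds a node count followed by "name cores" pairs; keep only the names.
    echo $CCP_NODES | tr " " "\n" | sed "1d;n;d" > host_file
    # The number of MPI processes defaults to 4 when no argument is given.
    num=${1:-4}
    mpirun -n "$num" -f host_file ./helloMpi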
    
  • Set the execution permission of run.sh and exit the docker container:

    chmod +x run.sh
    exit
    
  • Commit and push the docker image to docker hub:

    docker commit $(docker ps -qa -n 1) <docker hub account>/mpich
    docker login -u <docker hub account> -p <password>
    docker push <docker hub account>/mpich
    

    A docker hub account is needed to perform this operation.

  • The steps above build the docker image manually. The alternative is to use a Dockerfile.
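
The host-file pipeline in run.sh is easier to follow on a sample value. The sketch below assumes CCP_NODES holds a node count followed by node name and core count pairs, which is what the sed expression implies. The node names are made up and the function nodes_to_hosts is only a wrapper for demonstration.

```shell
#!/bin/bash
# Same pipeline as in run.sh, wrapped in a function for demonstration.
nodes_to_hosts() {
    # tr puts every field on its own line; sed drops line 1 (the node count),
    # then repeatedly prints one line (a node name) and deletes the next
    # (its core count).
    echo $1 | tr " " "\n" | sed "1d;n;d"
}

# Assumed sample value: 2 nodes with 4 cores each.
nodes_to_hosts "2 LinuxCN000 4 LinuxCN001 4"
```

For the sample value this prints LinuxCN000 and LinuxCN001 on separate lines, which is the host file format passed to mpirun with -f.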

Run MPI task

  • Submit a job including a docker task to run the MPI application in the docker image built in the step above:

    job submit /env:ccp_docker_image=<docker hub account>/mpich /numnodes:4 /workdir:/mpisample ./run.sh 16
    

    [Screenshot: mpitask1]

Run MPI docker task with docker image as SSH server

Create docker image as SSH server

  • Create a Dockerfile for building an Ubuntu-based docker image that contains SSH keys for the root user and starts as an SSH server on port 3022:

    FROM ubuntu
    
    RUN apt-get update
    RUN apt-get install -y sudo ssh mpich
    
    RUN ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa
    RUN cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
    RUN echo "Port 3022" >> /root/.ssh/config
    
    RUN mkdir /run/sshd
    
    ENTRYPOINT ["/usr/sbin/sshd", "-D", "-p", "3022"]
    
  • Build and push the docker image to docker hub:

    docker build -t <docker hub account>/ubuntu_mpich_as_ssh_server .
    docker login -u <docker hub account> -p <password>
    docker push <docker hub account>/ubuntu_mpich_as_ssh_server
    

Run MPI tasks

  • Submit a job including a docker task that runs an MPI application in the docker image built in the step above, with the command mpirun -machinefile $CCP_MPI_HOSTFILE hostname and the environment variables CCP_DOCKER_IMAGE=<docker hub account>/ubuntu_mpich_as_ssh_server, CCP_MPI_HOSTFILE_FORMAT=1, CCP_DOCKER_SKIP_SSH_SETUP=1, CCP_DOCKER_START_OPTION=--network=host:

    [Screenshot: mpitask2]

  • Check task output

    [Screenshot: mpitask3]
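
For readers who prefer the command line, the sketch below assembles a job submit command from the command and environment variables listed above. It is a sketch under assumptions: the function print_job_submit, the account name myaccount, and the node count of 2 are placeholders, and the submission method shown in the screenshot may differ. It only prints the command. The printed line is meant to be run in the Windows command prompt used elsewhere in this doc.

```shell
#!/bin/bash
# Hypothetical helper: print a job submit command line for the SSH server image.
print_job_submit() {
    local account=$1 numnodes=$2
    local envs=(
        "CCP_DOCKER_IMAGE=$account/ubuntu_mpich_as_ssh_server"
        "CCP_MPI_HOSTFILE_FORMAT=1"
        "CCP_DOCKER_SKIP_SSH_SETUP=1"
        "CCP_DOCKER_START_OPTION=--network=host"
    )
    local cmd="job submit /numnodes:$numnodes" e
    # One /env parameter per environment variable.
    for e in "${envs[@]}"; do
        cmd+=" /env:$e"
    done
    # Single quotes keep $CCP_MPI_HOSTFILE literal so the compute node expands it.
    echo "$cmd" 'mpirun -machinefile $CCP_MPI_HOSTFILE hostname'
}

# Placeholder account name and node count.
print_job_submit myaccount 2
```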

Run docker task on Windows compute node step by step

Add Windows compute node into cluster

Run command in container as docker task

  • Submit a job containing a docker task allocated to a Windows compute node

    job submit /requestednodes:IaaSWinCN000 /env:CCP_DOCKER_IMAGE=mcr.microsoft.com/windows/servercore:ltsc2016 ping -t localhost
    
  • Peek task output

    [Screenshot: windowstask2]

  • Cancel the job

    job cancel !!