Cluster Hat

Setting up an 8086.net Cluster HAT purchased from Pimoroni

Hardware Setup

The hardware setup is very simple.

The kit comes with:

  • 1 x ClusterHAT PCB
  • 4 x 12mm standoffs
  • 8 x screws
  • 4 x stick on feet
  • 1 x USB cable
First (making sure no power is connected to anything), attach the four standoffs to the Raspberry Pi 3 using four of the screws.

Second, plug the USB cable into the Pi and thread it over the top of the Pi so that it comes out at the opposite end.

Third, place the Cluster HAT on top of the Pi 3 and make sure it is firmly seated on the GPIO header.

Fourth, use the four remaining screws to fix the Cluster HAT to the standoffs.

Fifth, connect the threaded USB cable to the Cluster HAT.

Sixth, plug your four Pi Zeros firmly into the Cluster HAT.

Notes on the power supply: I initially used a 2.5 A supply, but it soon became apparent that this was not enough for this setup, so I upgraded to a 5 A supply. I am not sure that going beyond 5 A would be safe or advisable.

As a final step, insert the SD cards with the downloaded images. I chose the CNAT versions for ease of setup. In addition, I connected a 1 TB USB disk to the Pi 3 controller to act as a shared file system for the cluster.

    You can get the images from here: ClusterCtrl Setup Software
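
If you are writing the images from a Linux machine, a dd command along these lines does the job (a sketch: the image filename below is a placeholder for whichever CNAT image you downloaded, and /dev/sdX must be replaced with your SD card's device as shown by lsblk, since dd will overwrite it without asking). The Raspberry Pi Imager works just as well.

    # Double-check the card's device name before writing anything
    lsblk
    # Write the downloaded image (filename is a placeholder)
    sudo dd if=ClusterHAT-CNAT.img of=/dev/sdX bs=4M status=progress conv=fsync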

    Controller Setup

    Next we will set up the software on the Pi 3 controller.

Download the ClusterHAT image from the ClusterCtrl Setup Software page. Choose the CNAT version for ease of setup.

Boot up the Pi, then set the password and localisation.

Set the controller Pi's hostname to p0, in line with the naming scheme we are going to use (p0 for the controller, p1 to p4 for the Pi Zero nodes):

    sudo hostname p0
    sudo vi /etc/hostname   # change the hostname in this file
    sudo vi /etc/hosts      # change 'controller' to p0
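
Alternatively, recent Raspberry Pi OS releases let hostnamectl handle the first two steps in one go (you still need to edit /etc/hosts yourself):

    sudo hostnamectl set-hostname p0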
    	

    Generate SSH key:

    ssh-keygen -t rsa -b 4096
    cat ~/.ssh/id_rsa.pub
    	

Set up SSH host aliases in ~/.ssh/config:

    Host p1
        Hostname 172.19.181.1
        User pi

    Host p2
        Hostname 172.19.181.2
        User pi

    Host p3
        Hostname 172.19.181.3
        User pi

    Host p4
        Hostname 172.19.181.4
        User pi
Next we set up a USB disk for shared data.

Use lsblk to identify the disk's device name, then format it and create a mount point:

    sudo mkfs.ext4 /dev/sda1          # or whichever device lsblk reported
    sudo mkdir /media/Storage
    sudo chown -R nobody:nogroup /media/Storage
    sudo chmod -R 777 /media/Storage
    	

    Use blkid to find the UUID for fstab
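
For example (the device name and UUID shown are just the values from my disk; yours will differ):

    sudo blkid /dev/sda1
    # /dev/sda1: UUID="a13c2fad-7d3d-44ca-b704-ebdc0369260e" TYPE="ext4" ...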

    sudo vi /etc/fstab

    Add the following line to the bottom of the fstab file:

    UUID=a13c2fad-7d3d-44ca-b704-ebdc0369260e /media/Storage ext4 defaults 0 2
    	

    Install NFS server on p0:

    sudo apt-get install -y nfs-kernel-server
    	

Update the exports file (/etc/exports) with the following line:

    /media/Storage 172.19.181.0/24(rw,sync,no_root_squash,no_subtree_check)

Then mount the disk and export the share:

    sudo mount /media/Storage
    sudo exportfs -a
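
To sanity-check the export (both commands come with the NFS packages and should list /media/Storage for 172.19.181.0/24):

    sudo exportfs -v
    showmount -e localhost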
    	

    Setting up the Pi Zero Nodes

Power on the Cluster HAT with the command sudo clusterhat on; sudo clusterhat off switches it off again.

Because SSH is not enabled by default on the node images, we must connect to each node over serial to enable it. The default credentials are user pi, password clusterctrl.

    minicom -D /dev/ttypi1            # serial console for node p1 (ttypi2-ttypi4 for the others)

    # On the node, enable SSH by creating the ssh flag file in /boot
    cd /boot/
    sudo touch ssh

    # Install ntpdate so the node can keep its clock in sync
    sudo apt-get install -y ntpdate

    # (on p0, copy the public key from ~/.ssh/id_rsa.pub, then on the node:)
    mkdir -p ~/.ssh
    echo "[paste the ssh key here]" >> ~/.ssh/authorized_keys
    

Reset the node password to whatever you want with passwd.

Repeat the above for all four nodes, after which we will be able to use SSH.
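
A quick way to confirm SSH and key-based login work on every node, using the host aliases defined in ~/.ssh/config earlier:

    # Each node should answer with its own hostname
    for h in p1 p2 p3 p4; do ssh "$h" hostname; done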

Finally, we set up the NFS client on the four nodes.

SSH to each of the nodes in turn (p1, p2, p3, p4) and run:

    sudo apt-get install -y nfs-common
    sudo mkdir /media/Storage
    sudo chown nobody:nogroup /media/Storage
    sudo chmod -R 777 /media/Storage

    # Add the NFS share to /etc/fstab
    sudo vi /etc/fstab
    172.19.181.254:/media/Storage /media/Storage nfs defaults 0 0

    # If the mount hangs, check that /etc/hosts.deny is not blocking RPC
    sudo mount -a
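
Once key-based SSH is working, the same node-side NFS setup can be scripted from the controller rather than typed on each node; a sketch (it appends to /etc/fstab non-interactively, so run it only once per node):

    for h in p1 p2 p3 p4; do
      ssh "$h" 'sudo apt-get install -y nfs-common
        sudo mkdir -p /media/Storage
        sudo chown nobody:nogroup /media/Storage
        sudo chmod -R 777 /media/Storage
        echo "172.19.181.254:/media/Storage /media/Storage nfs defaults 0 0" | sudo tee -a /etc/fstab
        sudo mount -a'
    done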
    

    Setting up Slurm on the Controller

Slurm is an HPC workload manager; it is what farms jobs out to the nodes and controls the job queues.

On the controller Pi 3 (hostname p0):

    sudo apt-get install -y slurm-wlm
    cd /etc/slurm-llnl
    sudo cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
    sudo gzip -d slurm.conf.simple.gz
    sudo mv slurm.conf.simple slurm.conf
    
    sudo vi /etc/slurm-llnl/slurm.conf
    
    SlurmctldHost=p0(172.19.181.254)
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core
    ClusterName=cluster
    NodeName=p0 NodeAddr=172.19.181.254 CPUs=2 Weight=2 State=UNKNOWN
    NodeName=p1 NodeAddr=172.19.181.1 CPUs=1 Weight=1 State=UNKNOWN
    NodeName=p2 NodeAddr=172.19.181.2 CPUs=1 Weight=1 State=UNKNOWN
    NodeName=p3 NodeAddr=172.19.181.3 CPUs=1 Weight=1 State=UNKNOWN
    NodeName=p4 NodeAddr=172.19.181.4 CPUs=1 Weight=1 State=UNKNOWN
    PartitionName=mycluster Nodes=p[1-4] Default=YES MaxTime=INFINITE State=UP
    
    sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /media/Storage
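
Two things worth noting here: the cgroup.conf and cgroup_allowed_devices_file.conf files need to exist in /etc/slurm-llnl before this copy will work (the Missing ClusterHat Tutorial linked at the bottom covers their contents), and Slurm authenticates between machines with munge, so the controller's munge key needs to reach the nodes as well. A sketch of the remaining controller-side steps, assuming the shared disk set up earlier:

    # Put the munge key on the shared disk so the nodes can pick it up
    sudo cp /etc/munge/munge.key /media/Storage

    # Enable and start the controller daemon (and slurmd, since p0 is also a compute node)
    sudo systemctl enable slurmctld slurmd
    sudo systemctl start slurmctld slurmd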
    

    Setting up Slurm Pi Zero clients

SSH to each of the four nodes and run:

    sudo apt-get install -y slurmd slurm-client
    sudo cp /media/Storage/slurm.conf /etc/slurm-llnl/slurm.conf
    sudo cp /media/Storage/cgroup* /etc/slurm-llnl
    sudo systemctl enable slurmd
    sudo systemctl start slurmd
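
Before slurmd on the nodes will talk to the controller, each node also needs the controller's munge key (copied to the shared disk above, if you followed that sketch); on each node:

    # Replace the node's munge key with the controller's and restart the daemons
    sudo cp /media/Storage/munge.key /etc/munge/munge.key
    sudo chown munge:munge /etc/munge/munge.key
    sudo chmod 400 /etc/munge/munge.key
    sudo systemctl restart munge
    sudo systemctl restart slurmd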
    

Back on the controller, run sudo scontrol and type:

    update nodename=p[1,2,3,4] state=power_up
    update nodename=p[1,2,3,4] state=resume
    exit

Then, back at the shell, check the cluster and run a test command:

    sinfo
    srun --nodes=4 hostname
    

You should now have all nodes up and controlled by Slurm, and the srun command should have printed the hostnames of the four nodes.

    Running the cluster

Power on the Pi 3 controller (if not already powered on) and log in.

    Power on the cluster: sudo clusterhat on

    The cluster can be powered off with the command: sudo clusterhat off.

Get info on the cluster: sinfo

This should show the four nodes waiting for us. If a node's state is down, we can change it to idle:

    sudo scontrol update nodename=p[1,2,3,4] state=IDLE

    We can use srun to run our first program across the cluster: srun --nodes=4 hostname

We should now get output from each of the four nodes showing its hostname.
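
Beyond one-off srun commands, jobs are normally submitted with sbatch. A minimal batch script sketch (the partition name mycluster matches the slurm.conf above; the filename hello.sh is just an example):

    #!/bin/bash
    #SBATCH --job-name=hello
    #SBATCH --partition=mycluster
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=1
    #SBATCH --output=/media/Storage/hello_%j.out

    # One task per node; each prints its hostname
    srun hostname

Submit it with sbatch hello.sh and watch the queue with squeue; the output file will appear on the shared disk.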

    If my instructions made no sense, then try these two links:

  • The Official ClusterHat instructions: https://clusterhat.com/setup-overview
  • The Missing ClusterHat Tutorial (where I got most of my instructions from): https://medium.com/@dhuck/the-missing-clusterhat-tutorial-45ad2241d738