Hardware Setup
The hardware setup is very simple.
The kit comes with the Cluster HAT board, a short USB cable, four standoffs and eight screws.
First (making sure no power is connected to anything), attach the four standoffs to the Raspberry Pi 3 with four of the screws.
Second, plug the USB cable into the Pi 3 and thread it over the top of the board so that it comes out at the opposite end.
Third, place the Cluster HAT on top of the Pi 3 and make sure it is firmly seated on the GPIO header.
Fourth, use the four remaining screws to fix the Cluster HAT to the standoffs.
Fifth, connect the threaded USB cable to the Cluster HAT.
Sixth, plug your four Pi Zeros firmly into the Cluster HAT.
Notes on power supply: I initially used a 2.5A supply, but it quickly became apparent that this was not enough for this setup, so I upgraded to a 5A supply. I am not sure that going beyond 5A would be safe or advisable.
As a final step, insert the SD cards with the downloaded images. I chose the CNAT versions for ease of setup. In addition, I connected a 1TB USB disk to the Pi 3 controller to act as a shared file system for the cluster.
You can get the images from here: ClusterCtrl Setup Software
Controller Setup
Next we will set up the software on the Pi 3 controller.
Download the ClusterHAT images from the ClusterCtrl Setup Software page, choosing the CNAT version for ease of setup.
Boot up the Pi and set the password and localisation.
Set the controller Pi's hostname to p0 to be in line with the naming scheme we are going to use (it matches the node names p1-p4 used below):
sudo hostname p0
sudo vi /etc/hostname   # change the hostname in this file to p0
sudo vi /etc/hosts      # change 'controller' to p0
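As a quick sanity check, hostname should now print p0, and the change will persist across reboots because /etc/hostname was updated:
hostname   # should print p0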
Generate SSH key:
ssh-keygen -t rsa -b 4096
cat ~/.ssh/id_rsa.pub
Set up SSH names for the nodes in ~/.ssh/config. The addresses below are the standard ClusterHAT CNAT addresses (the same ones used in the Slurm config later) and the default user is pi:
Host p1
    Hostname 172.19.181.1
    User pi
Host p2
    Hostname 172.19.181.2
    User pi
Host p3
    Hostname 172.19.181.3
    User pi
Host p4
    Hostname 172.19.181.4
    User pi
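With these entries in place, each node can be reached by its short name once SSH and the keys are set up in the node section below, for example:
ssh p1 hostname   # should print p1 once the nodes are configured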
Next we set up a USB disk for shared data.
Use lsblk to show device info
sudo mkfs.ext4 /dev/sda1   # or whichever device lsblk reported
sudo mkdir /media/Storage
sudo chown -R nobody:nogroup /media/Storage
sudo chmod -R 777 /media/Storage
Use blkid to find the UUID for fstab
sudo vi /etc/fstab
Add the following line to the bottom of the fstab file:
UUID=a13c2fad-7d3d-44ca-b704-ebdc0369260e /media/Storage ext4 defaults 0 2
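The UUID above is only an example; use the one blkid reported for your disk. On a reasonably recent Raspbian, findmnt can check the new fstab entry for typos before you rely on it:
sudo findmnt --verify   # reports problems in /etc/fstab, if any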
Install NFS server on p0:
sudo apt-get install -y nfs-kernel-server
Update the exports file by adding the following line to /etc/exports:
/media/Storage 172.19.181.0/24(rw,sync,no_root_squash,no_subtree_check)
Then mount the disk and export the share:
sudo mount /media/Storage
sudo exportfs -a
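As a quick check that the export is active, showmount (installed alongside the NFS packages) should now list the share:
showmount -e localhost   # should show /media/Storage 172.19.181.0/24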
Setting up the Pi Zero Nodes
Power on the Cluster HAT with the command sudo clusterhat on (sudo clusterhat off switches it off again).
Because SSH is not enabled by default on the nodes, we must connect over serial to enable it. The default credentials are user pi, password clusterctrl.
minicom -D /dev/ttypi1
cd /boot/
sudo touch ssh
sudo apt-get install -y ntpdate
mkdir -p ~/.ssh          # create ~/.ssh if it does not exist yet
# on p0, copy the public key from ~/.ssh/id_rsa.pub, then paste it below
echo "[paste the ssh key here]" >> ~/.ssh/authorized_keys
Reset the node password to whatever you want.
Repeat the above for all four nodes; note that the /boot/ssh flag is only picked up at boot, so each node may need a reboot before SSH comes up. After that we will be able to use SSH from the controller.
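As an alternative to pasting the key by hand, once SSH is up on a node the key can be pushed from p0 with ssh-copy-id, which uses the Host entries defined earlier and prompts for the node password once:
ssh-copy-id p1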
Finally, we set up the NFS client on the four nodes.
ssh to each of the nodes in turn (p1, p2, p3, p4).
sudo apt-get install -y nfs-common
sudo mkdir /media/Storage
sudo chown nobody:nogroup /media/Storage
sudo chmod -R 777 /media/Storage
Add the following line to /etc/fstab (sudo vi /etc/fstab):
172.19.181.254:/media/Storage /media/Storage nfs defaults 0 0
Check that /etc/hosts.deny is not blocking RPC, then mount the share:
sudo mount -a
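To confirm the storage really is shared, write a test file from the node and check that it appears on the controller:
touch /media/Storage/test-from-$(hostname)
ls -l /media/Storage   # run this on p0; the test file should be visible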
Setting up Slurm on the Controller
Slurm is an HPC job scheduler; it is what will farm jobs out to the nodes and manage the job queues.
On the controller Pi 3 (hostname p0):
sudo apt-get install -y slurm-wlm
cd /etc/slurm-llnl
sudo cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
sudo gzip -d slurm.conf.simple.gz
sudo mv slurm.conf.simple slurm.conf
sudo vi /etc/slurm-llnl/slurm.conf
Set the following in slurm.conf:
SlurmctldHost=p0(172.19.181.254)
SelectType=select/cons_res
SelectTypeParameters=CR_Core
ClusterName=cluster
NodeName=p0 NodeAddr=172.19.181.254 CPUs=2 Weight=2 State=UNKNOWN
NodeName=p1 NodeAddr=172.19.181.1 CPUs=1 Weight=1 State=UNKNOWN
NodeName=p2 NodeAddr=172.19.181.2 CPUs=1 Weight=1 State=UNKNOWN
NodeName=p3 NodeAddr=172.19.181.3 CPUs=1 Weight=1 State=UNKNOWN
NodeName=p4 NodeAddr=172.19.181.4 CPUs=1 Weight=1 State=UNKNOWN
PartitionName=mycluster Nodes=p[1-4] Default=YES MaxTime=INFINITE State=UP
Then copy the config files to the shared storage so the nodes can pick them up:
sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /media/Storage
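Depending on the package defaults, the controller daemon may not start automatically; if sinfo later cannot contact the controller, enabling slurmctld explicitly is a reasonable first step. slurmd -C is also handy here to print the hardware Slurm detects, for comparison with the CPUs= values above:
sudo systemctl enable slurmctld
sudo systemctl start slurmctld
slurmd -C   # prints this node's CPU and memory as Slurm sees them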
Setting up Slurm Pi Zero clients
SSH to each of the four nodes and run:
sudo apt-get install -y slurmd slurm-client
sudo cp /media/Storage/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp /media/Storage/cgroup* /etc/slurm-llnl
sudo systemctl enable slurmd
sudo systemctl start slurmd
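It is worth confirming the daemon actually came up on each node before moving on:
sudo systemctl status slurmd   # should report active (running)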
Back on the controller, run scontrol and type:
update nodename=p[1,2,3,4] state=power_up
update nodename=p[1,2,3,4] state=resume
Then quit scontrol and run:
sinfo
srun --nodes=4 hostname
You should now have all nodes running and controlled by Slurm, and the srun command should have printed the hostnames of the four nodes.
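If any node stays in the down or drained state, scontrol can show the reason Slurm recorded for it, which usually points to a config mismatch or an unreachable slurmd:
scontrol show node p1   # check the State and Reason fields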
Running the cluster
Power on the Pi 3 controller (if not already powered on) and log in.
Power on the cluster: sudo clusterhat on
The cluster can be powered off with the command: sudo clusterhat off.
Get info on the cluster: sinfo
This should show the four nodes waiting for us. If a node's state is down, we can change it to idle:
sudo scontrol update nodename=p[1,2,3,4] state=IDLE
We can use srun to run our first program across the cluster: srun --nodes=4 hostname
We should now get output from each of the four nodes showing its hostname.
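srun runs a command interactively; for longer jobs, a script submitted with sbatch is usually more convenient. Below is a minimal sketch (the script name and output path are just examples) that writes its output to the shared storage so it is visible from every node:
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=4
#SBATCH --output=/media/Storage/hello_%j.out
srun hostname
Save it as, say, /media/Storage/hello.sh, submit it with sbatch /media/Storage/hello.sh, and watch the queue with squeue.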
If my instructions made no sense, then try these two links: