SLURM Cluster Configuration on Azure (Part II)

This is the second post in the guide to installing and configuring SLURM on Azure (part I is here). In this part we configure the shared NFS storage; the third post covers setting up the SLURM environment itself.

NFS: Shared Directories

Because the nodes need to share some files and directories, we set up one node in the cluster as Network Attached Storage (NAS). This node, which we named nasnode, stores the data that the other nodes need and serves it over the Network File System (NFS) protocol, which Linux supports natively.

Logged in to nasnode, we run the following commands to install the NFS server:

 jjorge@nasnode:~$ sudo apt-get update
 jjorge@nasnode:~$ sudo apt-get install \
    rpcbind nfs-kernel-server

Next, we edit /etc/fstab and /etc/exports and add the following lines. The fstab entry bind-mounts /home onto /nfs, and the exports entry makes /nfs available to the other nodes.

 jjorge@nasnode:~$ sudo vi /etc/fstab
 # ...Rest of the file
 # Adding this line at bottom
 /home /nfs none bind 0 0
 # ...Rest of the file
 jjorge@nasnode:~$ sudo vi /etc/exports
 # ...Rest of the file
 # Adding this line at bottom
 /nfs 10.0.0.8/24(fsid=0,rw,sync,no_subtree_check,no_root_squash)
 # ...Rest of the file
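If these edits are scripted instead of made by hand in vi, it helps to make them idempotent so that running the setup twice does not duplicate the entries. The following is a minimal sketch under that assumption; append_once is a hypothetical helper, not part of the original setup, and it is demonstrated on scratch files rather than on the real /etc/fstab and /etc/exports.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line is
# already present. Hypothetical helper, not part of the original guide.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on scratch files. On nasnode the targets would be
# /etc/fstab and /etc/exports, and the script would have to run as root.
scratch=$(mktemp -d)
append_once "$scratch/fstab" '/home /nfs none bind 0 0'
append_once "$scratch/fstab" '/home /nfs none bind 0 0'   # second call is a no-op
append_once "$scratch/exports" '/nfs 10.0.0.8/24(fsid=0,rw,sync,no_subtree_check,no_root_squash)'
```

The whole-line, fixed-string match is deliberate: the exports entry contains parentheses and dots that would otherwise be interpreted as regular expression syntax.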

Then we create the directory that will hold the shared files and mount it. The directory can have a different name, as long as the same name is used on every node.

 jjorge@nasnode:~$ sudo mkdir /nfs
 jjorge@nasnode:~$ sudo mount /nfs
 jjorge@nasnode:~$ sudo /etc/init.d/nfs-kernel-server \
    restart
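After the restart, it can be useful to confirm what the server is actually configured to export. On the node itself, sudo exportfs -v prints the active export table. As a complement, the sketch below reads the exports file directly and prints one line per path and client pair; list_exports is a hypothetical helper, and it assumes simple entries without backslash line continuations.

```shell
#!/bin/sh
# list_exports [FILE]: print one "path client(options)" line for every
# client of every entry, skipping blank lines and comments.
# FILE defaults to /etc/exports. Hypothetical helper for illustration.
list_exports() {
    awk '
        NF == 0 || $1 ~ /^#/ { next }               # blank line or comment
        { for (i = 2; i <= NF; i++) print $1, $i }  # one line per client
    ' "${1:-/etc/exports}"
}

# Usage on nasnode (reads /etc/exports):
#   list_exports
```

With the entry added above, the expected result is a single line for /nfs with the 10.0.0.8/24 client specification and its options.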

That completes nasnode. Now we configure the rest of the cluster to access this directory. The following steps have to be repeated on each node; as an example, we configure the compute0 node.

 jjorge@compute0:~$ sudo apt-get update
 jjorge@compute0:~$ sudo apt-get install nfs-common

We also modify the local fstab, then create the directory and mount the volume:

 jjorge@compute0:~$ sudo vi /etc/fstab
 # ...Rest of the file
 # Adding this line at bottom
 nasnode:/nfs /nfs nfs auto,rsize=8192,wsize=8192 0 0
 # ...Rest of the file
 jjorge@compute0:~$ sudo mkdir /nfs
 jjorge@compute0:~$ sudo mount /nfs/
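Before relying on the share, it is worth checking that /nfs on the client is really an NFS mount and not just an empty local directory. The sketch below looks up the filesystem type of a mount point in /proc/mounts; fs_type_of is a hypothetical helper, and the mount point must be given without a trailing slash, as it appears in /proc/mounts.

```shell
#!/bin/sh
# fs_type_of MOUNTPOINT [MOUNTS_FILE]: print the filesystem type mounted at
# MOUNTPOINT, or return 1 if nothing is mounted there.
# MOUNTS_FILE defaults to /proc/mounts. Hypothetical helper for illustration.
fs_type_of() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }      # field 2: mount point, field 3: type; last match wins
        END { if (t == "") exit 1; print t }
    ' "${2:-/proc/mounts}"
}

# Usage on compute0:
#   case "$(fs_type_of /nfs)" in
#       nfs|nfs4) echo "/nfs is an NFS mount" ;;
#       *)        echo "/nfs is not an NFS mount" ;;
#   esac
```

Depending on the NFS version negotiated with the server, the reported type may be nfs or nfs4, so the usage example accepts both.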

Now, logged in as the main user, we can use that user's folder:

 jjorge@compute0:/nfs/jjorge$ cd
 jjorge@compute0:~$ cd /nfs/jjorge/
 jjorge@compute0:/nfs/jjorge$ cat > example.txt
 hello world!
 jjorge@compute0:/nfs/jjorge$ cat example.txt
 hello world!
 jjorge@compute0:/nfs/jjorge$ ls
 example.txt

This file is now visible from every node in the cluster:

 jjorge@controller:~$ cd /nfs/jjorge/
 jjorge@controller:/nfs/jjorge$ ls
 example.txt
 jjorge@controller:/nfs/jjorge$ cat example.txt
 hello world!

With the shared directory in place, we will install SLURM in the next post.