This is the third part of the tutorial on installing and configuring SLURM on Azure (part I, part II). In this post we complete the process and show an example of running a task.
The Simple Linux Utility for Resource Management (SLURM) is an open-source workload manager used in many clusters around the world, for example at “Mare Nostrum”. It provides three key components:
- Resource management: Constraints, limitations and information.
- Task monitoring.
- Queue management.
It is highly modular, with many plugins available. We have chosen a setup that includes:
- Authentication among nodes using munge.
- Accounting and job completion monitoring using MySQL and the interface provided by slurmdbd.
- The controller daemon (slurmctld) and the compute node daemon (slurmd) of SLURM.
We are going to start with Munge, which allows nodes to authenticate each other.
Munge installation
We just have to run the following command to install Munge on each node (controller and compute nodes):
jjorge@XXX:~$ sudo apt-get install \
    libmunge-dev libmunge2 munge
After installing it, we should create a new key that will later be shared among the nodes. There are different ways to do this; one of them is using the command dd, as follows on the controller node:
jjorge@controller:~$ sudo su
root@controller:# dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
This generates a key that will be shared among the nodes. The key's permissions need to be adjusted:
root@controller:# chown munge:munge /etc/munge/munge.key
root@controller:# chmod 400 /etc/munge/munge.key
Finally, we enable the daemon and start the service.
root@controller:# systemctl enable munge
root@controller:# systemctl start munge
We can check the correctness of this process with the following:
root@controller:# munge -n | unmunge | grep STATUS
STATUS: Success (0)
After doing this, we should copy the key to every compute node, compute0 and compute1 in this case:
root@controller:# scp /etc/munge/munge.key \
    compute0:/etc/munge/munge.key
root@controller:# scp /etc/munge/munge.key \
    compute1:/etc/munge/munge.key
We have to change permissions accordingly on each node:
root@computeX:# chown munge:munge /etc/munge/munge.key
root@computeX:# chmod 400 /etc/munge/munge.key
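Optionally, once the munge service is also enabled and started on the compute nodes (with systemctl, as on the controller), we can verify that authentication works across nodes by decoding on a compute node a credential generated on the controller. A minimal check, assuming SSH access from the controller to compute0, could look like this:
# assumes SSH access to compute0 and munge running on both nodes
jjorge@controller:~$ munge -n | ssh compute0 unmunge | grep STATUS
If the keys match, this should again report STATUS: Success (0).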
Database installation
The next step is to install and configure the MySQL server that will store the database for accounting in SLURM on the controller node. We need the following packages to do this:
jjorge@controller:~$ sudo apt-get install ruby ruby-dev \
    python-dev libpam0g-dev libmariadb-client-lgpl-dev \
    libmysqlclient-dev
jjorge@controller:~$ sudo apt-get install mysql-server
jjorge@controller:~$ sudo apt-get install libmysqld-dev \
    mariadb-server
During these steps a password may be requested; make sure you remember it, since it is the password for the database administrator. When the installation is complete, we can access the MySQL server with the following command (if a password is requested, just press Enter):
~$ sudo mysql -u root
Inside the MySQL client, we run the following script:
create database slurm_acct_db;
create user 'slurm'@'localhost';
set password for 'slurm'@'localhost' = password('slurmdbpass');
grant usage on *.* to 'slurm'@'localhost';
grant all privileges on slurm_acct_db.* to 'slurm'@'localhost';
flush privileges;
exit
This creates the database “slurm_acct_db” and the user “slurm” with the password “slurmdbpass”. The “grant” commands give this user privileges over the database.
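As a quick, optional sanity check, we could list the privileges that were granted to the new user:
-- run inside the MySQL client opened with "sudo mysql -u root"
show grants for 'slurm'@'localhost';
This should show the usage grant and the privileges on slurm_acct_db.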
After configuring Munge and MySQL, from this point onwards we work on the controller node to configure the SLURM controller, and later the SLURM daemon on the rest of the nodes in the cluster.
Controller node
Now we are prepared to install SLURM. Since we are using the “Ubuntu 16.04 Server” distribution and the packages required for SLURM are in its repositories, we just have to install them with apt-get:
jjorge@controller:~$ sudo apt-get install slurm \
    slurm-client slurm-wlm slurm-wlm-basic-plugins \
    slurmctld slurmd slurmdbd
We can find more information about SLURM in these paths:
- /usr/share/doc/slurmctld
- /usr/share/doc/slurmd
- /usr/share/doc/slurmdbd
SLURM requires some configuration files that are not included with the installation process. The first file is slurm.conf, which configures the node names, the authentication parameters and so on. An example that can be adapted is provided in the Appendix. The part that deserves most attention in this case is the section defining nodes and partitions:
# COMPUTE NODES
NodeName=compute[0-1] CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE State=UP
These lines define the compute nodes we have, as well as the partitions. For this example, we have created one partition called debug containing the two compute nodes.
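As a side note, if the right values for CPUs, Sockets, CoresPerSocket or ThreadsPerCore of a node are not known, slurmd itself can report them in slurm.conf format. Once slurmd is installed on a compute node (see the Compute Node section below), the following could be run there and its output pasted into the COMPUTE NODES section:
# prints the node's hardware configuration as a NodeName=... line
jjorge@computeX:~$ sudo slurmd -C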
Another required file is the configuration file for the database settings, slurmdbd.conf, which is also provided in the Appendix.
Both files should be stored in /etc/slurm-llnl/slurm.conf and /etc/slurm-llnl/slurmdbd.conf.
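Since slurmdbd.conf contains the database password, it is a good idea to make it readable only by the slurm user, for example:
root@controller:# chown slurm:slurm /etc/slurm-llnl/slurmdbd.conf
root@controller:# chmod 600 /etc/slurm-llnl/slurmdbd.conf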
With these last steps we have configured all the files required on the controller node to use SLURM. Now, we have to register the new cluster in the database. First, we make sure that MySQL and slurmdbd are running:
jjorge@controller:~$ sudo /etc/init.d/mysql start
jjorge@controller:~$ sudo /etc/init.d/slurmdbd start
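Optionally, to make sure both services also come up after a reboot of the controller, they can be enabled at boot (assuming the mysql and slurmdbd systemd units installed by the packages):
# enable the units so they start automatically at boot
jjorge@controller:~$ sudo systemctl enable mysql
jjorge@controller:~$ sudo systemctl enable slurmdbd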
Then, we can access the database to create the required entities:
jjorge@controller:~$ mysql -u slurm -p slurm_acct_db
Regarding these entities, we create the cluster, an account (or role), and a new user within that account who is also an administrator.
jjorge@controller:~$ sudo sacctmgr add cluster slurmcluster
jjorge@controller:~$ sudo sacctmgr -i add account researcher \
    Description="Researcher" Organization="ResearchGroup"
jjorge@controller:~$ sudo sacctmgr -i create user jjorge \
    account=researcher adminlevel=Administrator partition=debug
We can show the user’s information by means of the following command:
jjorge@controller:~$ sudo sacctmgr show user name=jjorge
      User   Def Acct     Admin
---------- ---------- ---------
    jjorge researcher Administ+
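We can also verify that the cluster, the account and the user were linked correctly by listing the associations (the exact columns shown depend on the SLURM version):
jjorge@controller:~$ sudo sacctmgr show association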
We can check the status of the slurmctld daemon with:
jjorge@controller:~$ sudo /etc/init.d/slurmctld status
slurmctld.service - Slurm controller daemon
   Loaded: loaded (/lib/systemd/system/slurmctld.service; enabled)
   Active: active (running) since Tue 2018-01-02 15:36:09 UTC; 6 days ago
 Main PID: 1386 (slurmctld)
    Tasks: 9
   Memory: 9.2M
      CPU: 2min 9.986s
   CGroup: /system.slice/slurmctld.service
           1386 /usr/sbin/slurmctld
If it is not running, we start it:
jjorge@controller:~$ sudo /etc/init.d/slurmctld start
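Another quick check is asking the controller directly whether it is up; scontrol reports the state of the primary controller:
jjorge@controller:~$ scontrol ping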
Compute Node
As we have already installed Munge on every node and shared the key, we just have to install the slurmd daemon on the compute nodes.
jjorge@compute:~$ sudo apt-get install slurmd
We copy the configuration file from the controller:
root@controller:/home/jjorge# scp /etc/slurm-llnl/slurm.conf \
    compute0:/etc/slurm-llnl/slurm.conf
root@controller:/home/jjorge# scp /etc/slurm-llnl/slurm.conf \
    compute1:/etc/slurm-llnl/slurm.conf
Then we enable and start the service, and reboot the nodes:
jjorge@computeX:~$ sudo systemctl enable slurmd
jjorge@computeX:~$ sudo systemctl start slurmd
jjorge@XXXX:~$ sudo reboot
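If something goes wrong on a compute node, its daemon status and log can be inspected; the log path below comes from the SlurmdLogFile setting in the slurm.conf of the Appendix:
jjorge@computeX:~$ sudo systemctl status slurmd
jjorge@computeX:~$ sudo tail /var/log/slurm-llnl/slurmd.log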
After rebooting, detecting the nodes can take some time. To force their discovery, we can run the following on the controller:
jjorge@controller:~$ scontrol update NodeName=compute0 State=resume
jjorge@controller:~$ scontrol update NodeName=compute1 State=resume
After this, we should see both compute nodes as idle.
jjorge@controller:~$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      2   idle compute[0-1]
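For more detail about a specific node (state, reason, allocated resources), we could query it with scontrol:
jjorge@controller:~$ scontrol show node compute0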
As a simple task, we could run the hostname command on both nodes:
jjorge@controller:~$ srun -N 2 hostname
compute0
compute1
Example task
In this section, we will run a task using the command sbatch. The script that will be launched is the following:
#!/bin/bash
#SBATCH --partition=debug
#SBATCH --ntasks=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --job-name="Medium"

cd /nfs/jjorge/stream
./memtransf > out_"$HOSTNAME".txt
Here, memtransf is a program that measures memory transfer rates in MB/s for simple computational kernels coded in C, adapted from here. It is just an example of a task that takes longer than simple commands.
The launching command is the following:
jjorge@controller:/nfs/jjorge/stream$ sbatch medium.job
Submitted batch job 58
jjorge@controller:/nfs/jjorge/stream$ sbatch medium.job
Submitted batch job 59
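While a job is queued or running, its full details (allocated nodes, working directory, time limits) can be inspected with scontrol, using the job ID returned by sbatch:
jjorge@controller:/nfs/jjorge/stream$ scontrol show job 58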
The result of launching this script can be monitored, for example, with sacct:
jjorge@controller:/nfs/jjorge/stream$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
          57   hostname      debug researcher          2  COMPLETED      0:0
          58     Medium      debug researcher          1    RUNNING      0:0
          59     Medium      debug researcher          1    RUNNING      0:0
The command squeue also provides related information:
jjorge@controller:/nfs/jjorge/stream$ squeue
 JOBID PARTITION   NAME   USER ST  TIME NODES NODELIST(REASON)
    58     debug Medium jjorge  R  0:02     1 compute0
    59     debug Medium jjorge  R  0:01     1 compute1
Again, sinfo provides information about the status of the nodes when given some flags, as shown below:
jjorge@controller:/nfs/jjorge/stream$ sinfo -Nel
Tue Jan  9 10:22:50 2018
NODELIST  NODES PARTITION     STATE CPUS S:C:T MEMORY TMP_DISK ...
compute0      1    debug* allocated    1 1:1:1      1 ...
compute1      1    debug* allocated    1 1:1:1      1 ...
When the tasks are finished, we can access the results from any node thanks to the NFS; for example, from the controller:
jjorge@controller:/nfs/jjorge/stream$ ls -l
total 92
-rw-rw-r-- 1 jjorge jjorge   198 Jan  9 10:18 medium.job
-rwxrwxr-x 1 jjorge jjorge 17912 Jan  9 10:18 memtransf
-rw-rw-r-- 1 jjorge jjorge  1764 Jan  9 10:23 out_comp0.txt
-rw-rw-r-- 1 jjorge jjorge  1764 Jan  9 10:23 out_comp1.txt
-rw-rw-r-- 1 jjorge jjorge     0 Jan  9 10:14 slurm-58.out
-rw-rw-r-- 1 jjorge jjorge     0 Jan  9 10:14 slurm-59.out
Conclusions
In this work we have deployed a cluster on Azure, Microsoft's cloud platform. The cluster offers a network file system to share directories among the nodes, and job and resource management through SLURM, a state-of-the-art tool that is fully functional and extensible.
Regarding Azure’s functionality, we have explored the deployment of instances, the underlying networking system, template deployment, the powerful feature of creating images to deploy and contextualize VMs, and a brief interaction with the different scripting options: Azure CLI and PowerShell. Sadly, due to the limitations of a free account, we used these scripting tools only in a limited manner.
Appendix
The slurm.conf file.
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=controller
#ControlAddr=
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
#JobCredentialPrivateKey=
#JobCredentialPublicCertificate=
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/usr/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
AccountingStorageHost=localhost
AccountingStorageLoc=slurm_acct_db
AccountingStoragePass=/var/run/munge/munge.socket.2
#AccountingStoragePort=
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageUser=slurm
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
JobCompHost=localhost
JobCompLoc=slurm_acct_db
JobCompPass=slurmdbpass
#JobCompPort=
JobCompType=jobcomp/slurmdbd
JobCompUser=slurm
#JobContainerPlugin=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
# COMPUTE NODES
NodeName=compute[0-1] CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE State=UP
And the slurmdbd.conf file:
# Example slurmdbd.conf file.
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
#AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=localhost
DbdHost=localhost
#DbdPort=7031
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=4
#DefaultQOS=normal,standby
LogFile=/var/log/slurm-llnl/slurmdbd.log
PidFile=/var/run/slurm-llnl/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
StorageHost=localhost
#StoragePort=1234
StoragePass=slurmdbpass
StorageUser=slurm
StorageLoc=slurm_acct_db