This post shows how to deploy a basic TensorFlow architecture to train a model, using AWS and the Infrastructure Manager tool. All the code and scripts are on GitHub.
Continue reading “Deploying Distributed TensorFlow with Infrastructure Manager and Ansible”
This is the third part of the tutorial on installing and configuring SLURM on Azure (part I, part II). In this post, we complete the process and show an example of running a task.
Continue reading “SLURM Cluster Configuration on Azure (Part III)”
This is the second post of the SLURM installation and configuration guide on Azure (part I is here). In this part, we configure the NFS system; in the third post, we set up the SLURM environment.
Continue reading “SLURM Cluster Configuration on Azure (Part II)”
I finally found some free time to share this project: deploying a workload manager to ease the management of my research group’s GPU cluster.
Continue reading “SLURM Cluster Configuration on Azure (Part I)”