Introduction
Since 2018, IGBMC has offered GPU resources on its cluster.
If you need a GPU for your analysis, contact the IT helpdesk to request access to the GPU resources.
Hardware
The IGBMC cluster includes the following GPU resources:
Processor model | Nodes | Processor generation | GPU model | RAM per GPU (in GB) | GPUs per node | Total GPUs available | CPUs per node | RAM per node (in GB) |
---|---|---|---|---|---|---|---|---|
AMD EPYC 7413 | 2 | Epyc | NVIDIA A100 (20 GB instance) | 20 | 4 | 8 | 96 | 528 |
Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz | 1 | Broadwell | NVIDIA Tesla K80 | 12 | 4 | 4 | 32 | 128 |
Storage
Each A100 node includes 4.1 TB of local NVMe scratch storage, accessible at /tmp.
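One possible way to use this scratch space from inside a job is sketched below. It is only an illustration, not an IGBMC-recommended procedure: the directory naming scheme and the `scratch_dir` variable are hypothetical, and the sketch assumes that jobs are allowed to create and delete their own directories under /tmp.

```shell
#!/bin/bash
# Create a private working directory on the node-local scratch (/tmp).
# SLURM_JOB_ID is set by Slurm inside a job; "manual" is a fallback for
# interactive testing outside of Slurm (assumption for this sketch).
scratch_dir=$(mktemp -d "/tmp/${USER:-user}_${SLURM_JOB_ID:-manual}_XXXXXX")

# Remove the directory when the script exits, so the local scratch
# does not fill up with leftovers from finished jobs.
trap 'rm -rf "$scratch_dir"' EXIT

echo "Working in $scratch_dir"
# Copy input data here, run the analysis, then copy the results back
# to permanent storage before the script ends.
```

Because /tmp is local to each node, anything left there is not visible from other nodes, so results need to be copied elsewhere before the job finishes.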
Slurm access
GPU partition
To submit a job to the GPU nodes, you need to specify the GPU partition:
#SBATCH -p gpu
Number of GPUs
You will also need to use the --gres=gpu:N parameter, where N is the number of GPUs you need.
#SBATCH --gres=gpu:N
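Put together, the two directives above could appear in a batch script such as the following minimal sketch. The request for a single GPU, the `gpus` variable and the use of `nvidia-smi` are assumptions made for illustration; whether `nvidia-smi` is installed on the GPU nodes is not stated here, so the call is guarded.

```shell
#!/bin/bash
# Submit to the GPU partition and request one GPU (replace 1 with the number you need).
#SBATCH -p gpu
#SBATCH --gres=gpu:1

# Slurm exposes the allocated GPU(s) through CUDA_VISIBLE_DEVICES.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Allocated GPU(s): $gpus"

# List the visible GPUs if nvidia-smi is available on the node (assumption).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi could not list any GPU" >&2
fi
```

Saved under a hypothetical name such as `gpu_job.sh`, the script would be submitted with `sbatch gpu_job.sh`.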
The allocated GPU is communicated to the program through the CUDA_VISIBLE_DEVICES environment variable, a mechanism that CUDA programs usually support. If a program tries to run on a GPU other than the one allocated, or if no GPU was requested, the CUDA program will stop prematurely, potentially without reporting any error.
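Since such failures can be silent, a job script may check the allocation itself before starting the CUDA program. The sketch below assumes that an unset or empty CUDA_VISIBLE_DEVICES means that no GPU was allocated; the `gpu_allocated` helper is a hypothetical name introduced for this example.

```shell
#!/bin/bash
# Returns 0 when at least one GPU has been exposed to this job, 1 otherwise.
# Assumption: an unset or empty CUDA_VISIBLE_DEVICES means no GPU was allocated.
gpu_allocated() {
    [ -n "${CUDA_VISIBLE_DEVICES:-}" ]
}

if gpu_allocated; then
    echo "Using GPU(s): $CUDA_VISIBLE_DEVICES"
else
    echo "No GPU allocated: was --gres=gpu:N requested?" >&2
fi
```

In a real batch script, the `else` branch would typically end with `exit 1` so that the job fails with an explicit message instead of stopping silently later.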
Jupyter
When using the cluster through JupyterHub, make sure to choose the GPU profile in the list to get access to a GPU.
To run a JupyterLab instance on GPU resources, make sure that your SLURM default account has access to the GPU partition.
To list your SLURM accounts that have access to the GPU partition, use the following command:
sacctmgr show assoc partitions=gpu users=$USER format=account
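To compare that list with your default account, something like the following sketch could be used. The `account_in_list` helper and the variable names are hypothetical; the sketch also assumes that `sacctmgr show user` reports your default account through the `defaultaccount` format field, and it only queries Slurm when `sacctmgr` is found.

```shell
#!/bin/bash
# Returns 0 if account $1 appears as a whole line in the newline-separated list $2.
account_in_list() {
    [ -n "$1" ] && printf '%s\n' "$2" | grep -Fxq -- "$1"
}

# Only query Slurm when sacctmgr is available (i.e. on the cluster).
if command -v sacctmgr >/dev/null 2>&1; then
    # -n drops the header line, -P prints unpadded, parsable output.
    gpu_accounts=$(sacctmgr -n -P show assoc partitions=gpu users="$USER" format=account)
    default_account=$(sacctmgr -n -P show user "$USER" format=defaultaccount)

    if account_in_list "$default_account" "$gpu_accounts"; then
        echo "Default account '$default_account' can use the gpu partition."
    else
        echo "Default account '$default_account' is not associated with the gpu partition." >&2
    fi
fi
```

If the default account is not in the list, the IT helpdesk is the contact point mentioned in the introduction for requesting access.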