SLURM is the queue manager used on the NNCR HPC cluster. You must use SLURM to submit jobs to the cluster.
All the commands presented in this guide must be run from the host hpc.igbmc.fr.
SLURM partitions and nodes
The IGBMC HPC cluster is organized into several SLURM partitions. Each partition gathers a set of compute nodes that have similar usage.
The default partition used by SLURM (igbmc) is suitable for most jobs.
To view all partitions available on the cluster, run:
sinfo
Please note that you may not have the rights to use all partitions.
To view all available nodes, run:
sinfo -Nl
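sinfo also accepts a custom output format, which can be handy to check partition limits at a glance. A minimal sketch using standard format codes (the partition name fast is only an example used elsewhere in this guide):
# Show only the nodes of one partition
sinfo -p fast
# Show each partition with its time limit, memory per node (MB) and node count
sinfo -o "%P %l %m %D"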
Submitting a job to the cluster
Submitting a batch job
A batch job is a job that runs on the cluster in an automated way and does not require any user interaction.
Usage
The job starts when the requested resources are available. The sbatch command returns immediately with the job ID, and the outputs are written to file(s). sbatch only accepts scripts (not binary executables). The batch script may be given to sbatch through a file name on the command line, or, if no file name is specified, sbatch will read the script from standard input.
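For instance, the following sketch submits a small script read from standard input through a heredoc instead of a file (the resource value is arbitrary):
sbatch <<'EOF'
#!/bin/bash
#SBATCH --mem=1GB
srun hostname
EOF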
Batch scripts rules
The script can contain srun commands; each srun invocation is a job step.
The script must start with a shebang (#!) followed by the path of the interpreter:
#!/bin/bash
#!/usr/bin/env python
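As an illustration of the second shebang, a batch script does not have to be written in bash: since #SBATCH lines are also comments in Python, sbatch still reads the directives placed before the first code line. A minimal sketch (the Python payload is only a placeholder):
#!/usr/bin/env python
#SBATCH --mem=1GB
#SBATCH -o slurm.%N.%j.out
# Placeholder payload: print the name of the compute node running the job
import platform
print(platform.node())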
The execution parameters can be set:
At runtime, on the sbatch command line:
sbatch --mem=40GB bowtie2.sbatch
Or within the script itself:
#!/bin/bash
#
#SBATCH --mem 40GB
srun bowtie2 -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
sbatch bowtie2.sbatch
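Note that options given on the sbatch command line take precedence over the #SBATCH directives in the script, so the same script can be reused with different resources:
# The job gets 60GB even though bowtie2.sbatch requests 40GB
sbatch --mem=60GB bowtie2.sbatch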
The script can contain SLURM options placed just after the shebang but before the script commands, on lines starting with #SBATCH. Note that #SBATCH does not contain any ! (unlike the shebang).
Execution parameters
These parameters are common to the srun and sbatch commands.
Parameters for log
#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out # STDOUT file with the Node name and the Job ID
#SBATCH -e slurm.%N.%j.err # STDERR file with the Node name and the Job ID
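Other replacement patterns exist for log file names, for example %x for the job name (on recent SLURM versions) and %u for the user name. A possible variant, with the job name set via -J:
#SBATCH -J bowtie2_hg19 # job name
#SBATCH -o slurm.%x.%j.out # STDOUT file with the job name and the job ID
#SBATCH -e slurm.%x.%j.err # STDERR file with the job name and the job ID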
Parameters to control the job
--partition=<partition_names>, -p
Request a specific partition for the resource allocation. Each partition (the equivalent of a queue in SGE) has its own limits: time, memory, nodes, etc.
Run sinfo to see which partitions are available.
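For example, to send a job to the fast partition used later in this guide (assuming you have access to it; the script name is a placeholder):
sbatch -p fast my_script.sbatch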
--mem=<size[units]>
Specify the real memory required per node. The default unit is MB (default value: 2GB). The job is killed if it exceeds this limit.
Note that you can use the variable $SLURM_MEM_PER_NODE in the command line to keep the software settings in sync with the allocated resources.
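A minimal sketch of that pattern, assuming a Java tool (the jar name is a placeholder); $SLURM_MEM_PER_NODE holds the allocation in MB, hence the m suffix:
#!/bin/bash
#
#SBATCH --mem 8GB
srun java -Xmx${SLURM_MEM_PER_NODE}m -jar my_tool.jar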
--time=<time>, -t
Set a limit on the total run time of the job allocation.
Acceptable time formats include "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
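For illustration, the following directives all request the same two-hour limit in different formats (use only one of them in a real script):
#SBATCH --time=120 # 120 minutes
#SBATCH --time=2:00:00 # 2 hours
#SBATCH --time=0-2:00 # days-hours:minutes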
Parameters for multithreading
--cpus-per-task=<ncpus>, -c
Request a number of CPUs per task (default: 1).
Note that you can use the variable $SLURM_CPUS_PER_TASK in the command line to keep the number of threads used by the software consistent with the allocated resources.
#!/bin/bash
#
#SBATCH --cpus-per-task=8
srun bowtie2 --threads $SLURM_CPUS_PER_TASK -x hg19 -1 sample_R1.fq.gz -2 sample_R2.fq.gz -S sample_hg19.sam
To learn more about the sbatch command, see the official documentation.
Full example of the sbatch command
Random
- Open a script file with any text editor (but not Word)
- Copy/paste the following script, which writes 10 000 random numbers to a file and then sorts them:
- Check the content of the script
- Submit the job
- Check the result
For beginners, we suggest using nano, which has limited functionality but is quite intuitive.
nano slurm_random.sh
#!/bin/bash
#
#SBATCH -p fast # partition
#SBATCH -N 1 # number of nodes
#SBATCH -n 1 # number of cores
#SBATCH --mem 100 # memory (in MB) for all cores
#SBATCH -t 0-2:00 # maximum run time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR
for i in {1..10000}; do
echo $RANDOM >> SomeRandomNumbers.txt
done
sort -n SomeRandomNumbers.txt > SomeRandomNumbers_sorted.txt
Press Ctrl-x to exit nano, then Y
when nano asks you whether the modified buffer should be saved, then press the Enter
key to confirm the file name.
cat slurm_random.sh
sbatch slurm_random.sh
Since this script performs a very basic task, the results should be available promptly.
Check the output files with ls, head and tail.
# List the result files
ls -l SomeRandomNumbers*.txt
# Print the first 20 lines of the original random numbers
head -n 20 SomeRandomNumbers.txt
# Print the first 20 lines of the sorted random numbers
head -n 20 SomeRandomNumbers_sorted.txt
# Print the last 20 lines of the sorted random numbers
tail -n 20 SomeRandomNumbers_sorted.txt
Salmon
- Open a script file with any text editor (but not Word)
- Set the SLURM parameters, the software environment (conda or module) and the command itself
- Submit the job
nano slurm_salmon.sh
#!/bin/bash
#
#SBATCH -o slurm.%N.%j.out
#SBATCH -e slurm.%N.%j.err
#SBATCH --mail-type END
#SBATCH --mail-user foo.bar@france-bioinformatique.fr
#
#SBATCH --partition fast
#SBATCH --cpus-per-task 6
#SBATCH --mem 5GB
module load salmon
salmon quant --threads $SLURM_CPUS_PER_TASK -i transcripts_index -l A -1 reads1.fq -2 reads2.fq -o transcripts_quant
sbatch slurm_salmon.sh
Interactive job
To submit an interactive job, you will need two SLURM commands:
- salloc to request a resource allocation
- srun to run interactive job steps on that allocation
The salloc command lets you request a resource allocation and start an interactive session. It takes the same parameters as the sbatch command:
salloc --mem-per-cpu=2G --cpus-per-task=10
The salloc command returns as soon as the requested resources have been allocated.
You can then run the srun command to start interactive job steps on this allocation:
Example:
srun hostname
The outputs are returned to the terminal.
If you need to interact with the command using your keyboard, add the --pty option to srun:
Example:
module load alphafold/2.0.1
srun --pty python
To relinquish the allocation, use the exit command:
exit
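Putting it together, a typical interactive session might look like this (resource values are arbitrary):
# Request an allocation
salloc --cpus-per-task=4 --mem=8GB
# Run job steps on the allocation
srun hostname
srun --pty bash
# Leave the interactive shell started by the previous step, then release the allocation
exit
exit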
To learn more about the salloc command, see the official documentation.
Job information
List a user's current jobs:
squeue -u <username>
List a user's running jobs:
squeue -u <username> -t RUNNING
List a user's pending jobs:
squeue -u <username> -t PENDING
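squeue also accepts a custom output format if the default columns are too narrow; a possible sketch (job ID, partition, name, state and elapsed time):
squeue -u <username> -o "%.10i %.12P %.30j %.8T %.10M"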
View accounting information for all of a user's jobs for the current day:
sacct --format=JobID,JobName,User,Submit,ReqCPUS,ReqMem,Start,NodeList,State,CPUTime,MaxVMSize%15 -u <username>
View accounting information for all of a user's jobs for the last 2 days (worth defining an alias):
sacct -a -S $(date --date='2 days ago' +%Y-%m-%dT%H:%M) --format=JobID,JobName,User%15,Partition,ReqCPUS,ReqMem,State,Start,End,CPUTime,MaxVMSize -u <username>
List detailed job information:
scontrol show -dd jobid=<jobid>
Manage jobs
To cancel/stop a job:
scancel <jobid>
To cancel all jobs for a user:
scancel -u <username>
To cancel all pending jobs for a user:
scancel -t PENDING -u <username>
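scancel filters can also be combined; for example (the partition and job name are placeholders):
# Cancel all of a user's jobs on a given partition
scancel -u <username> -p <partition>
# Cancel all of a user's jobs with a given name
scancel -u <username> --name=<jobname>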