AlphaFold2 (https://alphafold.ebi.ac.uk/) is available on the IGBMC cluster from the command line by loading the alphafold/2.0.1 module, or in Jupyter using the AlphaFold 2.0.1 kernel.
AlphaFold requires a GPU to run its prediction pipeline. Please make sure to follow our GPU allocation guide to create a suitable allocation for AlphaFold.
AlphaFold databases are made available on every GPU node on local NVMe storage in /mnt/alphafold.
Usage
To run AlphaFold2 from the command line, we assume you have an amino acid sequence called my.fasta.
- Connect to the cluster login node through SSH:
ssh <login>@hpc.igbmc.fr
- Load the AlphaFold2 module:
module load alphafold/2.0.1
- Start an interactive session or run a batch job on a GPU node, as described below.
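Before requesting GPU resources, a quick sanity check that the input really is a FASTA file can save wasted allocation time. A minimal sketch (`is_fasta` is a hypothetical helper, not part of the AlphaFold tooling):

```shell
# Hypothetical helper: a file is (loosely) FASTA if its first character
# is the ">" of a header line.
is_fasta() {
  [ "$(head -c 1 "$1")" = ">" ]
}

# Example:
#   is_fasta my.fasta && echo "ok" || echo "not a FASTA file"
```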
Interactive approach
Create an allocation for GPU resources
salloc -p gpu --gres=gpu:a3g.20gb:1 --cpus-per-task=10 --mem=50G
Create a folder to store AlphaFold output on the allocated node
srun mkdir -p /tmp/${USER}_alphafold
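A note on shell expansion: written as /tmp/$USER_alphafold without braces, the shell parses USER_alphafold as a single variable name (normally unset), so the path silently collapses to /tmp/. Braces, as in /tmp/${USER}_alphafold, delimit the variable name. A quick demonstration:

```shell
# Without braces the shell looks up a variable literally named
# "demo_user_alphafold"; with braces it expands demo_user, then appends
# the "_alphafold" suffix.
demo_user=jdoe
unset demo_user_alphafold
path_wrong="/tmp/$demo_user_alphafold"    # expands the unset variable -> "/tmp/"
path_right="/tmp/${demo_user}_alphafold"  # -> "/tmp/jdoe_alphafold"
echo "$path_wrong"
echo "$path_right"
```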
Databases
AlphaFold requires several databases to feed the model and make predictions. These databases are updated over time, much like the program itself. They are stored under /shared/genomes; pick one of the database sets in that directory. Defining an environment variable will come in handy later, as the databases path is used several times in the AlphaFold invocation. Example:
ALPHAFOLD_DB=/shared/genomes/2023-04-28
Databases may also be copied to a temporary directory for faster access, especially if that directory is located on an SSD. At the time of writing, a copy exists on phantom-node33 in /mnt/alphafold. Update the database location according to your needs.
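Before launching a long prediction, it can be worth verifying that the chosen database set actually contains the directories the invocation below expects. A small sketch (`check_alphafold_db` is a hypothetical helper; the subdirectory names match the `*_database_path` flags used in the run command):

```shell
# Hypothetical helper: report any expected database subdirectory missing
# from the chosen set; returns non-zero if something is absent.
check_alphafold_db() {
  local root=$1 status=0
  for d in uniref90 mgnify pdb70 pdb_mmcif bfd uniclust30; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d"
      status=1
    fi
  done
  return $status
}

# Example:
#   check_alphafold_db "$ALPHAFOLD_DB" || echo "database set incomplete"
```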
Run AlphaFold
srun run_alphafold.sh \
--fasta_paths=my.fasta \
--output_dir=/tmp/${USER}_alphafold \
--model_names='model_1','model_2','model_3','model_4','model_5' \
--data_dir=/mnt/alphafold \
--preset=casp14 \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=$ALPHAFOLD_DB/pdb70/pdb70 \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--bfd_database_path=$ALPHAFOLD_DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$ALPHAFOLD_DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08
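When the run finishes, the output directory should contain one subfolder per input sequence with the ranked models (ranked_0.pdb being the most confident) and a ranking_debug.json holding per-model pLDDT scores. A hedged sketch for pulling out the top-ranked model name (`best_model` is a hypothetical helper; it assumes the best-first "order" list AlphaFold writes into that file):

```shell
# Hypothetical helper: ranking_debug.json stores a best-first "order"
# list; print the name of the top-ranked model.
best_model() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["order"][0])' "$1"
}

# Example:
#   best_model /tmp/${USER}_alphafold/my/ranking_debug.json
```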
Get your outputs back to your team space
srun mv /tmp/${USER}_alphafold /shared/space2/my_project
Do not forget to relinquish the allocation:
exit
Batch job approach
Create a my_fold.sh script based on the following example:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:a3g.20gb:1
#SBATCH --cpus-per-task=10
#SBATCH --mem=50G
module load alphafold/2.0.1
ALPHAFOLD_DB=/shared/genomes/2023-04-28
mkdir -p /tmp/${USER}_alphafold
srun run_alphafold.sh --fasta_paths=my.fasta \
--output_dir=/tmp/${USER}_alphafold \
--model_names='model_1','model_2','model_3','model_4','model_5' \
--data_dir=/mnt/alphafold --preset=casp14 \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=$ALPHAFOLD_DB/pdb70/pdb70 \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--bfd_database_path=$ALPHAFOLD_DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$ALPHAFOLD_DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08
mv /tmp/${USER}_alphafold /shared/space2/my_project
Start your batch job
sbatch my_fold.sh
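sbatch prints a line like "Submitted batch job <id>"; capturing the id makes it easy to monitor the job and find its log. A small sketch (`submit` is a hypothetical wrapper):

```shell
# Hypothetical wrapper: submit a script and print only the job id, taken
# from sbatch's "Submitted batch job <id>" line.
submit() {
  sbatch "$1" | awk '{print $4}'
}

# Example:
#   jobid=$(submit my_fold.sh)
#   squeue -j "$jobid"            # check the job state
#   tail -f "slurm-${jobid}.out"  # follow the job log
```

On recent Slurm versions, `sbatch --parsable` prints the job id directly, which avoids the awk step.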
To use AlphaFold2 from Jupyter:
- Connect to https://jupyterhub.igbmc.fr
- Choose the GPU profile and click on "Start server"
- Use the sample notebook AlphaFold.ipynb with the AlphaFold 2.0.1 kernel