AlphaFold2 (https://alphafold.ebi.ac.uk/) is available on the IGBMC cluster from the command line by loading the alphafold/2.0.1 module, or in Jupyter using the AlphaFold 2.0.1 kernel.
AlphaFold requires a GPU to run its prediction pipeline. Please make sure to follow our GPU allocation guide to create a suitable allocation for AlphaFold.
AlphaFold databases are made available on every GPU node on local NVMe storage in /mnt/alphafold.
Usage
To run AlphaFold2 from the command line, we assume you have an amino acid sequence called my.fasta.
- Connect to the cluster login node through SSH:
ssh <login>@hpc.igbmc.fr
- Load the AlphaFold2 module:
module load alphafold/2.0.1
- Start an interactive session or run a batch job on a GPU node, as described below.
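Before requesting GPU resources, a quick sanity check that the input really is a FASTA file can save wasted allocation time. A minimal sketch (`is_fasta` is a hypothetical helper, not part of the AlphaFold tooling):

```shell
# Hypothetical helper: a file is (loosely) FASTA if its first character
# is the ">" of a header line.
is_fasta() {
  [ "$(head -c 1 "$1")" = ">" ]
}

# Example:
#   is_fasta my.fasta && echo "ok" || echo "not a FASTA file"
```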
Interactive approach
Create an allocation for GPU resources
salloc -p gpu --gres=gpu:a3g.20gb:1 --cpus-per-task=10 --mem=50G
Create a folder to store AlphaFold output on the allocated node
srun mkdir -p /tmp/${USER}_alphafold
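A note on shell expansion: written as /tmp/$USER_alphafold without braces, the shell parses USER_alphafold as a single variable name (normally unset), so the path silently collapses to /tmp/. Braces, as in /tmp/${USER}_alphafold, delimit the variable name. A quick demonstration:

```shell
# Without braces the shell looks up a variable literally named
# "demo_user_alphafold"; with braces it expands demo_user, then appends
# the "_alphafold" suffix.
demo_user=jdoe
unset demo_user_alphafold
path_wrong="/tmp/$demo_user_alphafold"    # expands the unset variable -> "/tmp/"
path_right="/tmp/${demo_user}_alphafold"  # -> "/tmp/jdoe_alphafold"
echo "$path_wrong"
echo "$path_right"
```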
Databases
AlphaFold requires several databases to feed the model and make predictions. These databases are updated over time, much like the program itself. They are stored under /shared/genomes; pick one of the database sets in that directory. Defining an environment variable will come in handy later, as the databases path is used several times in the AlphaFold invocation. Example:
ALPHAFOLD_DB=/shared/genomes/2023-04-28
Databases may also be copied to a temporary directory for faster access, especially if that directory is located on an SSD. At the time of writing, a copy exists on phantom-node33 in /mnt/alphafold. Update the database location according to your needs.
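Before launching a long prediction, it can be worth verifying that the chosen database set actually contains the directories the invocation below expects. A small sketch (`check_alphafold_db` is a hypothetical helper; the subdirectory names match the `*_database_path` flags used in the run command):

```shell
# Hypothetical helper: report any expected database subdirectory missing
# from the chosen set; returns non-zero if something is absent.
check_alphafold_db() {
  local root=$1 status=0
  for d in uniref90 mgnify pdb70 pdb_mmcif bfd uniclust30; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d"
      status=1
    fi
  done
  return $status
}

# Example:
#   check_alphafold_db "$ALPHAFOLD_DB" || echo "database set incomplete"
```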
Run AlphaFold
srun run_alphafold.sh \
--fasta_paths=my.fasta \
--output_dir=/tmp/${USER}_alphafold \
--model_names='model_1','model_2','model_3','model_4','model_5' \
--data_dir=/mnt/alphafold \
--preset=casp14 \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=$ALPHAFOLD_DB/pdb70/pdb70 \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--bfd_database_path=$ALPHAFOLD_DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$ALPHAFOLD_DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08
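When the run finishes, the output directory should contain one subfolder per input sequence with the ranked models (ranked_0.pdb being the most confident) and a ranking_debug.json holding per-model pLDDT scores. A hedged sketch for pulling out the top-ranked model name (`best_model` is a hypothetical helper; it assumes the best-first "order" list AlphaFold writes into that file):

```shell
# Hypothetical helper: ranking_debug.json stores a best-first "order"
# list; print the name of the top-ranked model.
best_model() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["order"][0])' "$1"
}

# Example:
#   best_model /tmp/${USER}_alphafold/my/ranking_debug.json
```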
Get your outputs back to your team space
srun mv /tmp/${USER}_alphafold /shared/space2/my_project
Do not forget to relinquish the allocation:
exit
Batch job approach
Create a my_fold.sh script based on the following example:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres=gpu:a3g.20gb:1
#SBATCH --cpus-per-task=10
#SBATCH --mem=50G
module load alphafold/2.0.1
ALPHAFOLD_DB=/shared/genomes/2023-04-28
mkdir -p /tmp/${USER}_alphafold
srun run_alphafold.sh --fasta_paths=my.fasta \
--output_dir=/tmp/${USER}_alphafold \
--model_names='model_1','model_2','model_3','model_4','model_5' \
--data_dir=/mnt/alphafold --preset=casp14 \
--uniref90_database_path=$ALPHAFOLD_DB/uniref90/uniref90.fasta \
--mgnify_database_path=$ALPHAFOLD_DB/mgnify/mgy_clusters_2018_12.fa \
--pdb70_database_path=$ALPHAFOLD_DB/pdb70/pdb70 \
--template_mmcif_dir=$ALPHAFOLD_DB/pdb_mmcif/mmcif_files \
--max_template_date=2020-05-14 \
--obsolete_pdbs_path=$ALPHAFOLD_DB/pdb_mmcif/obsolete.dat \
--bfd_database_path=$ALPHAFOLD_DB/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--uniclust30_database_path=$ALPHAFOLD_DB/uniclust30/uniclust30_2018_08/uniclust30_2018_08
mv /tmp/${USER}_alphafold /shared/space2/my_project
Start your batch job
sbatch my_fold.sh
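sbatch prints a line like "Submitted batch job <id>"; capturing the id makes it easy to monitor the job and find its log. A small sketch (`submit` is a hypothetical wrapper):

```shell
# Hypothetical wrapper: submit a script and print only the job id, taken
# from sbatch's "Submitted batch job <id>" line.
submit() {
  sbatch "$1" | awk '{print $4}'
}

# Example:
#   jobid=$(submit my_fold.sh)
#   squeue -j "$jobid"            # check the job state
#   tail -f "slurm-${jobid}.out"  # follow the job log
```

On recent Slurm versions, `sbatch --parsable` prints the job id directly, which avoids the awk step.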
To use AlphaFold2 from Jupyter:
- Connect to https://jupyterhub.igbmc.fr
- Choose the GPU profile and click on "Start server"
- Use the sample notebook AlphaFold.ipynb with the AlphaFold 2.0.1 kernel