Recent News:
- Welcome to ProFactory!
- Features
- Supported Models
- Supported Training Approaches
- Supported Datasets
- Supported Metrics
- Requirement
- Get Started
- Citation
- Acknowledgement
- Various protein language models: ESM2, ESM-1b, ESM-1v, ProtBert, ProtT5, Ankh, etc
- Comprehensive supervised datasets: Localization, Fitness, Solubility, Stability, etc
- Easy and quick data collector: AlphaFold2 Database, RCSB, InterPro, UniProt, etc
- Experiment monitors: Wandb, Local
- Friendly interface: Gradio UI
Model | Model size | Template |
---|---|---|
ESM2 | 8M/35M/150M/650M/3B/15B | facebook/esm2_t33_650M_UR50D |
ESM-1b | 650M | facebook/esm1b_t33_650M_UR50S |
ESM-1v | 650M | facebook/esm1v_t33_650M_UR90S_1 |
ProtBert-Uniref100 | 420M | Rostlab/prot_bert |
ProtBert-BFD100 | 420M | Rostlab/prot_bert_bfd |
ProtT5-Uniref50 | 3B/11B | Rostlab/prot_t5_xl_uniref50 |
ProtT5-BFD100 | 3B/11B | Rostlab/prot_t5_xl_bfd |
Ankh | 450M/1.2B | ElnaggarLab/ankh-base |
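The Template column holds Hugging Face Hub repository IDs. As a minimal sketch, assuming the `transformers` library is installed and that the ESM checkpoints load through its generic `AutoTokenizer` and `AutoModel` classes, a template could be resolved and loaded as shown here. `TEMPLATES`, `resolve_template`, and `load_plm` are hypothetical names used for illustration and are not part of ProFactory. Only the ESM rows are included; ProtBert, ProtT5, and Ankh may need model-specific tokenizer or model classes.

```python
# Subset of the table above: model name -> Hugging Face Hub repository ID.
TEMPLATES = {
    "ESM2": "facebook/esm2_t33_650M_UR50D",
    "ESM-1b": "facebook/esm1b_t33_650M_UR50S",
    "ESM-1v": "facebook/esm1v_t33_650M_UR90S_1",
}


def resolve_template(model_name: str) -> str:
    """Return the Hub repository ID listed for a model name."""
    try:
        return TEMPLATES[model_name]
    except KeyError:
        known = ", ".join(sorted(TEMPLATES))
        raise ValueError(
            f"Unknown model {model_name!r}; known models: {known}"
        ) from None


def load_plm(model_name: str):
    """Download (on first use) and return (tokenizer, model) for a template."""
    # Imported lazily so resolve_template works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    repo_id = resolve_template(model_name)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModel.from_pretrained(repo_id)
    return tokenizer, model
```

Calling `load_plm("ESM2")` fetches the 650M checkpoint named in the table on first use, so expect a sizeable download.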
Approach | Full-tuning | Freeze-tuning | LoRA | SES-Adapter |
---|---|---|---|---|
Pre-Training | ❎ | ❎ | ❎ | ❎ |
Supervised Fine-Tuning | ❎ | ✅ | ✅ | ✅ |
Pre-training datasets
- CATH_V43_S40 | structures
Supervised fine-tuning datasets (amino acid sequences/ foldseek sequences/ ss8 sequences)
- DeepLocBinary_ESMFold | protein-wise | single_label_classification
- DeepLocBinary_AlphaFold2 | protein-wise | single_label_classification
- DeepLocMulti_ESMFold | protein-wise | single_label_classification
- DeepLocMulti_AlphaFold2 | protein-wise | single_label_classification
- DeepSol_ESMFold | protein-wise | single_label_classification
- DeepSoluE_ESMFold | protein-wise | single_label_classification
- ProtSolM_ESMFold | protein-wise | single_label_classification
- EC_ESMFold | protein-wise | multi_label_classification
- EC_AlphaFold2 | protein-wise | multi_label_classification
- GO_BP_ESMFold | protein-wise | multi_label_classification
- GO_BP_AlphaFold2 | protein-wise | multi_label_classification
- GO_CC_ESMFold | protein-wise | multi_label_classification
- GO_CC_AlphaFold2 | protein-wise | multi_label_classification
- GO_MF_ESMFold | protein-wise | multi_label_classification
- GO_MF_AlphaFold2 | protein-wise | multi_label_classification
- MetalIonBinding_ESMFold | protein-wise | single_label_classification
- MetalIonBinding_AlphaFold2 | protein-wise | single_label_classification
- Thermostability_ESMFold | protein-wise | regression
[!TIP] Only the structural sequences differ between variants of the same dataset. For example, `DeepLocBinary_ESMFold` and `DeepLocBinary_AlphaFold2` share the same amino acid sequences, so if you only need `aa_seqs`, either dataset works.
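The naming convention in the list above can be sketched in code. This is an illustrative helper only: it assumes every structure-specific dataset name ends in `_ESMFold` or `_AlphaFold2`, as the names listed here do, and `split_dataset_name` and `share_aa_seqs` are hypothetical functions, not part of ProFactory.

```python
from typing import Optional, Tuple

# Structure predictors that appear as dataset-name suffixes in the list above.
STRUCTURE_SOURCES = ("ESMFold", "AlphaFold2")


def split_dataset_name(name: str) -> Tuple[str, Optional[str]]:
    """Split a dataset name into (base task, structure source or None)."""
    base, sep, suffix = name.rpartition("_")
    if sep and suffix in STRUCTURE_SOURCES:
        return base, suffix
    # No recognised structure suffix: treat the whole name as the base task.
    return name, None


def share_aa_seqs(name_a: str, name_b: str) -> bool:
    """True if two dataset names differ at most in their structure source."""
    return split_dataset_name(name_a)[0] == split_dataset_name(name_b)[0]


if __name__ == "__main__":
    print(split_dataset_name("GO_BP_AlphaFold2"))
    print(share_aa_seqs("DeepLocBinary_ESMFold", "DeepLocBinary_AlphaFold2"))
```

The check compares names only; it does not inspect the data itself.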
Supervised fine-tuning datasets (amino acid sequences)
- FLIP_AAV | protein-site | regression
- FLIP_GB1 | protein-site | regression
Metric Name | Full Name | Problem Type |
---|---|---|
accuracy | Accuracy | single_label_classification/ multi_label_classification |
recall | Recall | single_label_classification/ multi_label_classification |
precision | Precision | single_label_classification/ multi_label_classification |
f1 | F1Score | single_label_classification/ multi_label_classification |
mcc | MatthewsCorrCoef | single_label_classification/ multi_label_classification |
auc | AUROC | single_label_classification/ multi_label_classification |
f1_max | F1ScoreMax | multi_label_classification |
spearman_corr | SpearmanCorrCoef | regression |
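The table above can be read as a compatibility map between metric names and problem types. The following sketch encodes it as a plain dictionary and checks a requested metric list against a problem type. `METRIC_PROBLEM_TYPES`, `metrics_for`, and `validate_metrics` are hypothetical names for illustration; how ProFactory performs this check internally is not shown in this document.

```python
from typing import Dict, Iterable, List, Set

CLASSIFICATION = {"single_label_classification", "multi_label_classification"}

# Metric name -> problem types it applies to, transcribed from the table above.
METRIC_PROBLEM_TYPES: Dict[str, Set[str]] = {
    "accuracy": CLASSIFICATION,
    "recall": CLASSIFICATION,
    "precision": CLASSIFICATION,
    "f1": CLASSIFICATION,
    "mcc": CLASSIFICATION,
    "auc": CLASSIFICATION,
    "f1_max": {"multi_label_classification"},
    "spearman_corr": {"regression"},
}


def metrics_for(problem_type: str) -> List[str]:
    """List the metric names applicable to a problem type, in table order."""
    return [m for m, types in METRIC_PROBLEM_TYPES.items() if problem_type in types]


def validate_metrics(problem_type: str, requested: Iterable[str]) -> None:
    """Raise ValueError if any requested metric does not fit the problem type."""
    allowed = set(metrics_for(problem_type))
    bad = [m for m in requested if m not in allowed]
    if bad:
        raise ValueError(f"Metrics {bad} are not listed for {problem_type}")


if __name__ == "__main__":
    print(metrics_for("regression"))
```

For example, `validate_metrics("regression", ["accuracy"])` raises a `ValueError`, since the table lists only `spearman_corr` for regression.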
Please make sure you have installed Anaconda3 or Miniconda3.
We recommend a GPU with at least 24 GB of memory (RTX 3090 or better), but the actual requirement depends mainly on which PLM you choose.
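Both requirements can be checked before installation. The sketch below uses only the Python standard library and assumes `conda` and `nvidia-smi` are on `PATH` when present. The helper names are hypothetical; the `nvidia-smi` query flags are the tool's standard CSV query options.

```python
import shutil
import subprocess
from typing import List


def has_conda() -> bool:
    """True if a `conda` executable is on PATH."""
    return shutil.which("conda") is not None


def parse_gpu_memory_mib(smi_output: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def gpu_memory_mib() -> List[int]:
    """Total memory of each visible NVIDIA GPU in MiB, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10, check=True,
        )
        return parse_gpu_memory_mib(result.stdout)
    except (OSError, subprocess.SubprocessError, ValueError):
        # Missing driver, non-zero exit, timeout, or unparsable output.
        return []


if __name__ == "__main__":
    print("conda found:", has_conda())
    print("GPU memory (MiB):", gpu_memory_mib())
```

An empty list from `gpu_memory_mib()` means no NVIDIA GPU could be queried on this machine.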
Please cite our work if you have used our code or data.
Thanks to Liang's Lab for their support.