GitHub - code4luck/ProFactory_ft: Easy data acquisition and PLM fine-tuning.

✏️ Table of Contents

Features
Supported Models
Supported Training Approaches
Supported Datasets
Supported Metrics
Requirement
Get Started
Citation
Acknowledgement

📑 Features

Various protein language models: ESM2, ESM-1b, ESM-1v, ProtBert, ProtT5, Ankh, etc.
Comprehensive supervised datasets: Localization, Fitness, Solubility, Stability, etc.
Easy and quick data collector: AlphaFold2 Database, RCSB, InterPro, UniProt, etc.
Experiment monitors: Wandb, Local
Friendly interface: Gradio UI

🤖 Supported Models

Model	Model size	Template
ESM2	8M/35M/150M/650M/3B/15B	facebook/esm2_t33_650M_UR50D
ESM-1b	650M	facebook/esm1b_t33_650M_UR50S
ESM-1v	650M	facebook/esm1v_t33_650M_UR90S_1
ProtBert-Uniref100	420M	Rostlab/prot_bert
ProtBert-BFD100	420M	Rostlab/prot_bert_bfd
ProtT5-Uniref50	3B/11B	Rostlab/prot_t5_xl_uniref50
ProtT5-BFD100	3B/11B	Rostlab/prot_t5_xl_bfd
Ankh	450M/1.2B	ElnaggarLab/ankh-base
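
The table lists six ESM2 sizes but only one template. As a minimal sketch, the mapping below pairs each size with its public checkpoint name on the Hugging Face Hub. These names come from the Hub, not from this README, and `load_esm2` is a hypothetical helper: it assumes the checkpoints are loaded through the `transformers` `AutoModel` API, which may differ from what ProFactory does internally.

```python
# Hugging Face Hub checkpoint names for each ESM2 size listed in the table.
ESM2_CHECKPOINTS = {
    "8M": "facebook/esm2_t6_8M_UR50D",
    "35M": "facebook/esm2_t12_35M_UR50D",
    "150M": "facebook/esm2_t30_150M_UR50D",
    "650M": "facebook/esm2_t33_650M_UR50D",
    "3B": "facebook/esm2_t36_3B_UR50D",
    "15B": "facebook/esm2_t48_15B_UR50D",
}


def load_esm2(size="650M"):
    """Hypothetical helper: download and return (tokenizer, model) for one size."""
    # Imported lazily so the mapping is usable without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    name = ESM2_CHECKPOINTS[size]
    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)
```

Calling `load_esm2("8M")` is a cheap way to check a pipeline end to end before moving to a larger checkpoint.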

🔬 Supported Training Approaches

Approach	Full-tuning	Freeze-tuning	LoRA	SES-Adapter
Pre-Training	❎	❎	❎	❎
Supervised Fine-Tuning	❎	✅	✅	✅
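
To give a feel for why LoRA is much lighter than updating full weight matrices, the sketch below counts trainable parameters. LoRA keeps a weight matrix of shape (d_out, d_in) frozen and trains two low-rank factors of shapes (d_out, r) and (r, d_in). The hidden size (1280) and layer count (33) are those of `esm2_t33_650M_UR50D`; the rank of 8 and the choice to adapt only the query and value projections are assumptions for illustration, not settings taken from this repository.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by LoRA for one (d_out x d_in) weight matrix."""
    return rank * (d_in + d_out)


HIDDEN = 1280   # embedding size of esm2_t33_650M_UR50D
LAYERS = 33     # transformer layers in esm2_t33_650M_UR50D
RANK = 8        # assumed LoRA rank
TARGETS = 2     # assumed: query and value projections in each layer

per_matrix = lora_trainable_params(HIDDEN, HIDDEN, RANK)
full_matrix = HIDDEN * HIDDEN
total = per_matrix * TARGETS * LAYERS

print(per_matrix)                # 20480
print(per_matrix / full_matrix)  # 0.0125
print(total)                     # 1351680
```

Under these assumptions LoRA trains about 1.35M parameters, roughly 1.25% of each adapted matrix.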

📚 Supported Datasets

Pre-training datasets

CATH_V43_S40 | structures

Supervised fine-tuning datasets (amino acid sequences / Foldseek sequences / SS8 sequences)

DeepLocBinary_ESMFold | protein-wise | single_label_classification
DeepLocBinary_AlphaFold2 | protein-wise | single_label_classification
DeepLocMulti_ESMFold | protein-wise | single_label_classification
DeepLocMulti_AlphaFold2 | protein-wise | single_label_classification
DeepSol_ESMFold | protein-wise | single_label_classification
DeepSoluE_ESMFold | protein-wise | single_label_classification
ProtSolM_ESMFold | protein-wise | single_label_classification
EC_ESMFold | protein-wise | multi_label_classification
EC_AlphaFold2 | protein-wise | multi_label_classification
GO_BP_ESMFold | protein-wise | multi_label_classification
GO_BP_AlphaFold2 | protein-wise | multi_label_classification
GO_CC_ESMFold | protein-wise | multi_label_classification
GO_CC_AlphaFold2 | protein-wise | multi_label_classification
GO_MF_ESMFold | protein-wise | multi_label_classification
GO_MF_AlphaFold2 | protein-wise | multi_label_classification
MetalIonBinding_ESMFold | protein-wise | single_label_classification
MetalIonBinding_AlphaFold2 | protein-wise | single_label_classification
Thermostability_ESMFold | protein-wise | regression

[!TIP] For a given dataset, only the structural sequences differ between variants. For example, DeepLocBinary_ESMFold and DeepLocBinary_AlphaFold2 share the same amino acid sequences, so if you only need the aa_seqs, either one works.
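
The naming convention in the list above can be checked programmatically. This is a minimal sketch that assumes every name in this list ends in `_<structure source>`; the helper names are hypothetical and are not part of ProFactory. It does not apply to the FLIP datasets, which follow a different naming scheme.

```python
def split_dataset_name(name):
    """Split 'GO_BP_ESMFold' into ('GO_BP', 'ESMFold') at the last underscore."""
    task, _, source = name.rpartition("_")
    return task, source


def share_aa_seqs(a, b):
    """True if two dataset names differ only in their structure source."""
    return split_dataset_name(a)[0] == split_dataset_name(b)[0]


print(split_dataset_name("GO_BP_AlphaFold2"))
print(share_aa_seqs("DeepLocBinary_ESMFold", "DeepLocBinary_AlphaFold2"))
print(share_aa_seqs("DeepSol_ESMFold", "DeepSoluE_ESMFold"))
```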

Supervised fine-tuning datasets (amino acid sequences)

FLIP_AAV | protein-site | regression
- FLIP_AAV_one-vs-rest, FLIP_AAV_two-vs-rest, FLIP_AAV_mut-des, FLIP_AAV_des-mut, FLIP_AAV_seven-vs-rest, FLIP_AAV_low-vs-high, FLIP_AAV_sampled
FLIP_GB1 | protein-site | regression
- FLIP_GB1_one-vs-rest, FLIP_GB1_two-vs-rest, FLIP_GB1_three-vs-rest, FLIP_GB1_low-vs-high, FLIP_GB1_sampled

📈 Supported Metrics

Metric Name	Full Name	Problem Type
accuracy	Accuracy	single_label_classification/ multi_label_classification
recall	Recall	single_label_classification/ multi_label_classification
precision	Precision	single_label_classification/ multi_label_classification
f1	F1Score	single_label_classification/ multi_label_classification
mcc	MatthewsCorrCoef	single_label_classification/ multi_label_classification
auc	AUROC	single_label_classification/ multi_label_classification
f1_max	F1ScoreMax	multi_label_classification
spearman_corr	SpearmanCorrCoef	regression

✈️ Requirement

Conda Environment

Please make sure you have installed Anaconda3 or Miniconda3.

Hardware

We recommend a GPU with at least 24 GB of memory (RTX 3090 or better); the actual requirement depends mainly on which PLM you choose.
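
As a rough guide to how model choice drives memory needs, the sketch below estimates the memory taken by the model weights alone. It uses the nominal parameter counts from the Supported Models table and assumes 4 bytes per parameter in fp32 and 2 bytes in fp16. Activations, gradients, and optimizer state add to this, so treat the numbers as a lower bound rather than a measured requirement.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(num_params, dtype="fp32"):
    """Memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal sizes from the Supported Models table.
for name, params in [("ESM2 650M", 650e6), ("ESM2 3B", 3e9), ("ESM2 15B", 15e9)]:
    fp32 = weight_memory_gb(params, "fp32")
    fp16 = weight_memory_gb(params, "fp16")
    print(f"{name}: {fp32:.1f} GB fp32, {fp16:.1f} GB fp16")
```

By this estimate the 650M model's weights take about 2.6 GB in fp32, while the 15B model's weights alone need about 30 GB even in fp16, which exceeds a 24 GB card.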
