ProSST

Code for ProSST: A Pre-trained Protein Sequence and Structure Transformer with Disentangled Attention.

News

Our MSA-Enhanced model ProtREM has achieved 0.518 Spearman's rho in the ProteinGym benchmark.

1 Install

git clone https://github.com/ginnm/ProSST.git
cd ProSST
pip install -r requirements.txt
export PYTHONPATH=$PYTHONPATH:$(pwd)

2 Structure quantizer

from prosst.structure.quantizer import PdbQuantizer
processor = PdbQuantizer(structure_vocab_size=2048) # can be 20, 128, 512, 1024, 2048, 4096
result = processor("example_data/p1.pdb", return_residue_seq=False)

Output:

[407, 998, 1841, 1421, 653, 450, 117, 822, ...]

3 ProSST models have been uploaded to huggingface 🤗 Transformers

from transformers import AutoModelForMaskedLM, AutoTokenizer
model = AutoModelForMaskedLM.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("AI4Protein/ProSST-2048", trust_remote_code=True)

See AI4Protein/ProSST-* for more models.

4 Zero-shot mutant effect prediction

4.1 Example notebook

Zero-shot mutant effect prediction

4.2 Run ProteinGYM Benchmark

Download dataset from Google Driver. (This file contains quantized structures within ProteinGYM).

cd example_data
unzip proteingym_benchmark.zip

python zero_shot/proteingym_benchmark.py --model_path AI4Protein/ProSST-2048 \
--structure_dir example_data/structure_sequence/2048

Citation

If you use ProSST in your research, please cite the following paper:

@article {Li2024.04.15.589672,
	author = {Li, Mingchen and Tan, Yang and Ma, Xinzhu and Zhong, Bozitao and Zhou, Ziyi and Yu, Huiqun and Ouyang, Wanli and Hong, Liang and Zhou, Bingxin and Tan, Pan},
	title = {ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention},
	elocation-id = {2024.04.15.589672},
	year = {2024},
	doi = {10.1101/2024.04.15.589672},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/05/17/2024.04.15.589672.1},
	eprint = {https://www.biorxiv.org/content/early/2024/05/17/2024.04.15.589672.1.full.pdf},
	journal = {bioRxiv}
}

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
example_data		example_data
images		images
prosst/structure		prosst/structure
test		test
zero_shot		zero_shot
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ProSST

News

1 Install

2 Structure quantizer

3 ProSST models have been uploaded to huggingface 🤗 Transformers

4 Zero-shot mutant effect prediction

4.1 Example notebook

4.2 Run ProteinGYM Benchmark

Citation

About

Releases

Packages

Languages

License

ai4protein/ProSST

Folders and files

Latest commit

History

Repository files navigation

ProSST

News

1 Install

2 Structure quantizer

3 ProSST models have been uploaded to huggingface 🤗 Transformers

4 Zero-shot mutant effect prediction

4.1 Example notebook

4.2 Run ProteinGYM Benchmark

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages