Update README.md
README.md

A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks

[[Huggingface](https://huggingface.co/ChatterjeeLab/PTM-Mamba)] [[Github](https://github.com/programmablebio/ptm-mamba)] [[Paper](https://www.biorxiv.org/content/10.1101/2024.02.28.581983v1)]

<img src="https://cdn-uploads.huggingface.co/production/uploads/6430c79620265810703d3986/7QdA6MZ6OTmNHuwyDqFnN.png" width="300" height="300">

> Figure generated by DALL-E 3 with the prompt "A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks".
## Install Environment

### Docker

Setting up an environment for Mamba can be a pain; alternatively, we suggest using a Docker container.

#### Run the container in interactive and detached mode, and mount the project directory to the container workspace.
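A minimal sketch of what this step can look like; the base image and container name below are placeholder choices, not the project's documented setup:

```
# Sketch only: run a CUDA-enabled container in interactive + detached mode
# and mount the current project directory into its workspace.
# The image tag and container name are placeholders, not the repo's documented choices.
docker run -it -d \
  --gpus all \
  --name ptm-mamba-dev \
  -v "$(pwd)":/workspace \
  nvcr.io/nvidia/pytorch:23.10-py3

# Attach a shell to the running container.
docker exec -it ptm-mamba-dev bash

# The install steps in the README also build the tokenizer's Rust trie in editable mode:
pip install -e protein_lm/tokenizer/rust_trie
```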
## Data

We collect protein sequences and their PTM annotations from UniProt/Swiss-Prot. The PTM annotations are represented as tokens and are used to replace the corresponding amino acids. The data can be downloaded from [here](https://drive.google.com/file/d/151KUp79tgBxphoIky1-ohyuvzIS1gtNS/view?usp=drive_link). Please place the data in `protein_lm/dataset/`.
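A quick way to fetch the file from the command line, assuming the `gdown` package is installed (the file ID is taken from the Google Drive link above):

```
# Sketch only: download the dataset into protein_lm/dataset/ using gdown (pip install gdown).
mkdir -p protein_lm/dataset
cd protein_lm/dataset
gdown 151KUp79tgBxphoIky1-ohyuvzIS1gtNS   # file ID from the Drive link above; keeps the original filename
cd -
```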
## Configs

The training and testing configs are in `protein_lm/configs`. We provide a basic training config at `protein_lm/configs/train/base.yaml`.
## Training
```
python ./protein_lm/modeling/scripts/train.py +train=base
```

The command will use the config in `protein_lm/configs/train/base.yaml`.
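The `+train=base` syntax suggests Hydra-style config composition; assuming that, individual values can usually be overridden at launch time. The keys below are placeholders, not confirmed fields of `base.yaml`:

```
# Sketch only: override individual config values at launch time.
# `train.lr` and `train.max_steps` are hypothetical keys; check base.yaml for the real field names.
python ./protein_lm/modeling/scripts/train.py +train=base train.lr=1e-4 train.max_steps=10000
```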
##### Multi-GPU Training
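As a generic sketch only, assuming the training script supports a standard distributed launcher (this is not the project's documented multi-GPU command):

```
# Assumption: the entry point can be launched per-process with torchrun.
# Generic sketch, not the repo's documented multi-GPU workflow.
torchrun --nproc_per_node=4 ./protein_lm/modeling/scripts/train.py +train=base
```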
This project is based on the following codebases. Please give them a star if you find them useful.

- [OpenBioML/protein-lm-scaling (github.com)](https://github.com/OpenBioML/protein-lm-scaling)
- [state-spaces/mamba (github.com)](https://github.com/state-spaces/mamba)
## Citation
Please cite our paper if you enjoy our code :)

```
@article{Peng2024.02.28.581983,
  author = {Zhangzhi Peng and Benjamin Schussheim and Pranam Chatterjee},
  title = {PTM-Mamba: A PTM-Aware Protein Language Model with Bidirectional Gated Mamba Blocks},
  elocation-id = {2024.02.28.581983},
  year = {2024},
  doi = {10.1101/2024.02.28.581983},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/02/29/2024.02.28.581983},
  eprint = {https://www.biorxiv.org/content/early/2024/02/29/2024.02.28.581983.full.pdf},
  journal = {bioRxiv}
}
```