---
license: mit
---

# Model Card for FAPM (Functional Annotation of Proteins using Multi-Modal Models)

<!-- Provide a quick summary of what the model is/does. -->

FAPM adapts the BLIP-2 architecture to protein captioning by placing a Q-Former between the protein sequence modality and the natural language modality. Protein sequences are encoded by a pretrained ESM2 model, and Mistral-7B-v0.2 decodes the natural-language protein descriptions.
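
The core Q-Former idea can be illustrated with a toy, pure-Python sketch: a fixed set of learned query vectors cross-attends over the per-residue embeddings of a protein, so every protein yields the same number of output vectors regardless of its length. This is a minimal sketch under stated assumptions, not FAPM's implementation. `toy_protein_encoder` is a hypothetical stand-in for ESM2 that returns random vectors, `DIM` and `NUM_QUERIES` are assumed toy values, and the sketch omits the Q-Former's learned key/value projections, its self-attention and feed-forward layers, and the projection into Mistral-7B-v0.2's embedding space.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: Matrix, residues: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention.

    Every learned query attends over all residue embeddings (used directly as
    keys and values in this sketch), so the output has one row per query no
    matter how long the protein is.
    """
    dim = len(queries[0])
    outputs: Matrix = []
    for q in queries:
        scores = [sum(qi * ri for qi, ri in zip(q, r)) / math.sqrt(dim) for r in residues]
        weights = softmax(scores)
        outputs.append([sum(w * r[j] for w, r in zip(weights, residues)) for j in range(dim)])
    return outputs


def toy_protein_encoder(sequence: str, dim: int, rng: random.Random) -> Matrix:
    """Hypothetical stand-in for ESM2: one random vector per residue."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in sequence]


rng = random.Random(0)
DIM = 8          # toy width, far smaller than real ESM2 or Q-Former widths
NUM_QUERIES = 4  # hypothetical; BLIP-2 itself uses 32 learned queries

# In a real model these queries are trained parameters; here they are random.
learned_queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

for protein in ["MKTAYIAK", "MSEQNNTEMTFQIQRIYTKDISFEAPNAPHVFQK"]:
    residues = toy_protein_encoder(protein, DIM, rng)
    soft_prompt = cross_attend(learned_queries, residues)
    print(f"{len(protein)} residues -> {len(soft_prompt)} query vectors of width {len(soft_prompt[0])}")
```

In BLIP-2, the query outputs are projected and prepended to the language model's input embeddings as soft prompts; the fixed output length is what lets proteins of any length reach the decoder through the same interface.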

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

Assigning accurate property labels to proteins, such as functional terms and catalytic activity, is challenging, especially for proteins without homologs and for long-tail labels with few known examples. Unlike previous methods that focus mainly on protein sequence features, we use a pretrained large language model to understand the semantic meaning of protein labels. Specifically, we introduce FAPM, a contrastive multi-modal model that links natural language with protein sequence language. FAPM combines a pretrained protein sequence model with a pretrained large language model to generate labels, such as Gene Ontology (GO) functional terms and catalytic activity predictions, in natural language.

Our results show that FAPM excels at understanding protein properties, outperforming models based solely on protein sequences or structures. It achieves state-of-the-art performance on public benchmarks and on in-house, experimentally annotated phage proteins, which often have few known homologs. FAPM can also incorporate extra text prompts, such as taxonomy information, which improves both its predictive performance and its explainability. This approach offers a promising alternative to current protein annotation methods that rely on multiple sequence alignment. An online demo is available at https://huggingface.co/spaces/wenkai/FAPM_demo.
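
FAPM is described as a contrastive model that links protein sequences with natural language. As a rough illustration, assuming its contrastive objective resembles the symmetric image-text contrastive (ITC) loss used in BLIP-2, the sketch below scores a batch of paired protein and text embeddings so that matched pairs (same index) are pulled together and mismatched pairs pushed apart. The embeddings are hand-written toy vectors, `temperature=0.07` is a hypothetical default (a common initial value in CLIP-style training), and the sketch uses a single vector per protein, whereas BLIP-2 takes the maximum similarity over its query outputs.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_matrix(a: Matrix, b: Matrix) -> Matrix:
    """sim[i][j] = cosine similarity between a[i] and b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def diagonal_cross_entropy(logits: Matrix) -> float:
    """Mean of -log softmax(row)[i], treating item i as the positive for row i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def protein_text_contrastive_loss(protein_emb: Matrix, text_emb: Matrix, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE-style loss; the temperature default is a hypothetical choice."""
    sim = cosine_similarity_matrix(protein_emb, text_emb)
    logits = [[s / temperature for s in row] for row in sim]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-protein direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy embeddings: caption i is meant to describe protein i.
proteins = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]
print(protein_text_contrastive_loss(proteins, captions))        # aligned pairs
print(protein_text_contrastive_loss(proteins, captions[::-1]))  # mismatched pairs, larger loss
```

Reversing the caption order raises the loss; minimizing this quantity over many batches is what pushes a model of this kind toward a shared protein-text embedding space.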

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** [GitHub](https://github.com/xiangwenkai/FAPM)
- **Paper:** [bioRxiv](https://www.biorxiv.org/content/10.1101/2024.05.07.593067v2)
- **Demo:** [Hugging Face Space](https://huggingface.co/spaces/wenkai/FAPM_demo)

## Citation

**BibTeX:**

```bibtex
@article {Xiang2024.05.07.593067,
  author = {Xiang, Wenkai and Xiong, Zhaoping and Huan, Chen and Xiong, Jiacheng and Zhang, Wei and Fu, Zunyun and Zheng, Mingyue and Liu, Bing and Shi, Qian},
  title = {FAPM: Functional Annotation of Proteins using Multi-Modal Models Beyond Structural Modeling},
  elocation-id = {2024.05.07.593067},
  year = {2024},
  doi = {10.1101/2024.05.07.593067},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2024/07/03/2024.05.07.593067},
  eprint = {https://www.biorxiv.org/content/early/2024/07/03/2024.05.07.593067.full.pdf},
  journal = {bioRxiv}
}
```

## Model Card Authors

- Wenkai Xiang (xiangwenkai@lglab.ac.cn)
- Zhaoping Xiong (xiongzhaoping@protonunfold.com)

## Acknowledgement

- [ProtonUnfold Inc.](https://protonunfold.com)
- [Lingang Lab](https://www.lglab.ac.cn/)