Fill-Mask
Transformers
PyTorch
esm
pranamanam committed
Commit 906448b
1 Parent(s): 90c7dd4

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -3,7 +3,10 @@ license: cc-by-nc-nd-4.0
  ---
  **FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking**
  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/eR38p4VJhWJhwsqjZZdYp.png)
- Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, drive and sustain various cancers, particularly those impacting children. Unfortunately, due to their intrinsically disordered nature, large size, and lack of well-defined, druggable pockets, they have been historically challenging to target therapeutically: neither small molecule-based methods nor structure-based approaches for binder design are strong options for this class of molecules. Recently, protein language models (pLMs) have demonstrated success at representing protein sequences with information-rich embeddings, enabling downstream design applications from sequence alone. However, no current pLM has been trained on fusion oncoprotein sequences and thus may not produce optimal representations for these proteins. In this work, we introduce FusOn-pLM, a novel pLM that fine-tunes state-of-the-art ESM-2 embeddings on fusion oncoprotein sequences via masked language modeling (MLM). We specifically introduce a novel MLM strategy, employing a binding-site probability predictor to focus masking on key amino acid residues, thereby generating more optimal fusion oncoprotein-aware embeddings. Our model improves performance on both fusion oncoprotein-specific benchmarks and disorder prediction tasks in comparison to baseline ESM-2 representations, as well as manually-constructed biophysical embeddings, motivating downstream usage of FusOn-pLM embeddings for therapeutic design tasks targeting these fusions.
+ In this work, we introduce FusOn-pLM, a novel pLM that fine-tunes state-of-the-art ESM-2 embeddings on fusion oncoprotein sequences, those that drive a large portion of pediatric cancers but are heavily disordered and undruggable, via masked language modeling (MLM). We specifically introduce a novel MLM strategy, employing a binding-site probability predictor to focus masking on key amino acid residues, thereby generating more optimal fusion oncoprotein-aware embeddings. Our model improves performance on both fusion oncoprotein-specific benchmarks and disorder prediction tasks in comparison to baseline ESM-2 representations, as well as manually-constructed biophysical embeddings, motivating downstream usage of FusOn-pLM embeddings for therapeutic design tasks targeting these fusions.
+
+
+ # How to Use FusOn-pLM
 
  ```
  # Load model directly