---
license: mit
widget:
- text: "MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY"
---
## Label Semantics
* Label 0: Non-crystallizable (negative)
* Label 1: Crystallizable (positive)
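
The model is a binary classifier, so a prediction comes down to picking the higher-scoring of two label ids. Here is a minimal sketch of mapping raw scores to the labels above. It assumes the checkpoint loads as a sequence-classification model whose head emits one logit per label id, in id order (for example, the `logits` of a `transformers` sequence-classification output). The `decode_logits` helper and `ID2LABEL` dict are hypothetical and are not shipped with the model.

```python
import math
from typing import Sequence

# Label ids as listed under "Label Semantics" (hypothetical mapping helper).
ID2LABEL = {0: "Non-crystallizable", 1: "Crystallizable"}


def decode_logits(logits: Sequence[float]) -> tuple[str, float]:
    """Return (label name, softmax probability) for a pair of class logits.

    Assumption: logits[i] is the raw score for label id i.
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits, one per label id")
    # Numerically stable softmax: shift by the max before exponentiating.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Ties go to label 0 (non-crystallizable).
    label_id = 1 if probs[1] > probs[0] else 0
    return ID2LABEL[label_id], probs[label_id]


print(decode_logits([0.2, 1.5]))
```
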
## Dataset
1. [DeepCrystal Train](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/train.csv)
2. [DeepCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/test.csv)
3. [BCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/BCrystal_Balanced_Test_set)
4. [SP Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/SP_Final_set)
5. [TR Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/TR_Final_set)
## Model
### ESMCrystal_t6_8M_v1
ESMCrystal_t6_8M_v1 is a protein crystallization prediction model. It was fine-tuned, via transfer learning, from [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D), a 6-layer ESM-2 protein language model with 8M parameters. Given a protein sequence, it predicts whether the protein will crystallize. The model weights are [approx. 31.4 MB](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/pytorch_model.bin).
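
This rough consistency check relates the parameter count to the file size. It assumes the checkpoint stores 32-bit floats (4 bytes per parameter), that 31.4 MB means 31.4 × 10^6 bytes, and that serialization overhead can be ignored:

```latex
N_{\text{params}} \approx \frac{31.4 \times 10^{6}\ \text{bytes}}{4\ \text{bytes per parameter}} \approx 7.85 \times 10^{6}
```

The result is in line with the nominal 8M parameters of the base model.
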
## Accuracy
| Dataset          | Accuracy           |
|------------------|--------------------|
| DeepCrystal Test | 0.7913593256059009 |
| BCrystal Test    | 0.7811975377728035 |
| SP Test          | 0.6962025316455697 |
| TR Test          | 0.8191699604743083 |
## Comparison Table
| Dataset          | Count | Positives | Negatives | TP  | FP | FN  | TN  | Precision  | Recall (sensitivity) | F1         | Accuracy   | ROC-AUC | MCC         | Specificity |
|------------------|-------|-----------|-----------|-----|----|-----|-----|------------|----------------------|------------|------------|---------|-------------|-------------|
| DeepCrystal Test | 1898  | 898       | 1000      | 536 | 34 | 362 | 966 | 0.94035088 | 0.59688196           | 0.73024523 | 0.79135933 | 0.9467  | 0.613082660 | 0.966       |
| BCrystal Test    | 1787  | 891       | 896       | 531 | 31 | 360 | 865 | 0.94483986 | 0.5959596            | 0.73090158 | 0.78119754 | 0.9396  | 0.604504011 | 0.96540179  |
| SP Test          | 237   | 148       | 89        | 80  | 4  | 68  | 85  | 0.95238095 | 0.54054054           | 0.68965517 | 0.69620253 | 0.9328  | 0.501728679 | 0.95505618  |
| TR Test          | 1012  | 374       | 638       | 207 | 16 | 167 | 622 | 0.92825112 | 0.55347594           | 0.69346734 | 0.81916996 | 0.9562  | 0.615341231 | 0.97492163  |
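
Every column of the comparison table is a function of the four confusion-matrix counts. Below is a minimal sketch using the standard definitions, with crystallizable (label 1) as the positive class. The `Counts` and `binary_metrics` names are illustrative, and the sketch assumes every row and column of the matrix is non-empty. The BCrystal counts are read from its confusion matrix with rows as true classes and columns as predicted classes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts, with crystallizable (label 1) as the positive class."""

    tp: int  # crystallizable, predicted crystallizable
    fp: int  # non-crystallizable, predicted crystallizable
    fn: int  # crystallizable, predicted non-crystallizable
    tn: int  # non-crystallizable, predicted non-crystallizable


def binary_metrics(c: Counts) -> dict[str, float]:
    """Standard binary-classification metrics (assumes no empty row or column)."""
    n = c.tp + c.fp + c.fn + c.tn
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "precision": c.tp / (c.tp + c.fp),    # also called PPV
        "recall": c.tp / (c.tp + c.fn),       # also called sensitivity / TPR
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        "accuracy": (c.tp + c.tn) / n,
        # Matthews correlation coefficient.
        "mcc": (c.tp * c.tn - c.fp * c.fn) / denom,
    }


# BCrystal Test: rows of its confusion matrix are true classes, columns predicted.
bcrystal = Counts(tp=531, fp=31, fn=360, tn=865)
for name, value in binary_metrics(bcrystal).items():
    print(f"{name:>11}: {value:.8f}")
```

Precision and recall are computed from different margins (predicted positives versus true positives). Swapping FP and FN therefore swaps those two values, while F1, accuracy and MCC stay unchanged.
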
## Graphs
### ROC-AUC Curve
* DeepCrystal Test
![Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-test.png?raw=true)
* BCrystal Test
![BCrystal Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-BCtest.png?raw=true)
* SP Test
![SP Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-SPtest.png?raw=true)
* TR Test
![TR Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-TRtest.png?raw=true)
### PR-AUC Curve
* DeepCrystal Test
![Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-test.png?raw=true)
* BCrystal Test
![BCrystal Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-BCtest.png?raw=true)
* SP Test
![SP Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-SPtest.png?raw=true)
* TR Test
![TR Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-TRtest.png?raw=true)
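
The PR curves above are commonly summarised by a single number, average precision. This is a minimal sketch of that summary computed from true labels and predicted crystallizable scores. The `average_precision` name is illustrative, ties are broken by input order, and this is not necessarily how the plotted areas were computed.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Ranks samples by descending score and adds precision@k each time a positive
    is retrieved, then divides by the number of positives.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Python's sort is stable, so tied scores keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank  # precision at this cutoff
    return total / n_pos


print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # (1/1 + 2/3) / 2 ≈ 0.833
```
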
## Final scores
* on DeepCrystal test:
| | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.73 | 0.97 | 0.83 | 1000 |
| crystallizable | 0.94 | 0.60 | 0.73 | 898 |
| accuracy | | | 0.79 | 1898 |
| macro avg | 0.83 | 0.78 | 0.78 | 1898 |
| weighted avg | 0.83 | 0.79 | 0.78 | 1898 |
* on BCrystal test:
| | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.71 | 0.97 | 0.82 | 896 |
| crystallizable | 0.94 | 0.60 | 0.73 | 891 |
| accuracy | | | 0.78 | 1787 |
| macro avg | 0.83 | 0.78 | 0.77 | 1787 |
| weighted avg | 0.83 | 0.78 | 0.77 | 1787 |
* on SP test:
| | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.56 | 0.96 | 0.70 | 89 |
| crystallizable | 0.95 | 0.54 | 0.69 | 148 |
| accuracy | | | 0.70 | 237 |
| macro avg | 0.75 | 0.75 | 0.70 | 237 |
| weighted avg | 0.80 | 0.70 | 0.69 | 237 |
* on TR test:
| | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.79 | 0.97 | 0.87 | 638 |
| crystallizable | 0.93 | 0.55 | 0.69 | 374 |
| accuracy | | | 0.82 | 1012 |
| macro avg | 0.86 | 0.76 | 0.78 | 1012 |
| weighted avg | 0.84 | 0.82 | 0.81 | 1012 |
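
Each row of these reports can be recomputed from a 2 × 2 confusion matrix whose rows are true classes and whose columns are predicted classes. Below is a minimal sketch, checked against the SP test counts. The `classification_report` function here is illustrative and is not the tool that produced the tables.

```python
from typing import Sequence


def classification_report(
    cm: Sequence[Sequence[int]], names: Sequence[str]
) -> dict[str, dict[str, float]]:
    """Per-class precision/recall/F1 with accuracy, macro and weighted averages.

    cm[i][j] counts samples whose true class is names[i] and predicted class is names[j].
    """
    k = len(names)
    support = [sum(cm[i]) for i in range(k)]
    total = sum(support)
    report: dict[str, dict[str, float]] = {}
    for i, name in enumerate(names):
        predicted = sum(cm[row][i] for row in range(k))  # column total
        precision = cm[i][i] / predicted if predicted else 0.0
        recall = cm[i][i] / support[i] if support[i] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[name] = {"precision": precision, "recall": recall,
                        "f1-score": f1, "support": float(support[i])}
    report["accuracy"] = {"f1-score": sum(cm[i][i] for i in range(k)) / total,
                          "support": float(total)}
    # Macro averages weight classes equally; weighted averages weight by support.
    for avg, weights in (("macro avg", [1 / k] * k),
                         ("weighted avg", [s / total for s in support])):
        report[avg] = {
            metric: sum(w * report[n][metric] for w, n in zip(weights, names))
            for metric in ("precision", "recall", "f1-score")
        }
        report[avg]["support"] = float(total)
    return report


# SP test matrix: rows and columns ordered crystallizable, non-crystallizable.
sp = classification_report([[80, 68], [4, 85]], ["crystallizable", "non-crystallizable"])
for label, row in sp.items():
    print(f"{label:>18}", {metric: round(value, 2) for metric, value in row.items()})
```
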
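## Confusion matrix
In each matrix, rows are the true class and columns the predicted class, both ordered crystallizable (positive) then non-crystallizable (negative). For example, the BCrystal rows sum to its 891 crystallizable and 896 non-crystallizable sequences.
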
* on DeepCrystal test:
```
| 536 | 362 |
| 34 | 966 |
```
* on BCrystal test:
```
| 531 | 360 |
| 31 | 865 |
```
* on SP test:
```
| 80 | 68 |
| 4 | 85 |
```
* on TR test:
```
| 207 | 167 |
| 16 | 622 |
```
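
This is a minimal sketch of building a matrix in the layout used above from per-sequence label ids (1 = crystallizable, 0 = non-crystallizable). The `confusion_matrix` function and `ORDER` tuple are illustrative names, not part of the repository.

```python
from typing import Sequence

# Row/column order used by the matrices above: crystallizable (1) first.
ORDER = (1, 0)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> list[list[int]]:
    """Return [[TP, FN], [FP, TN]]: rows are true labels, columns predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(ORDER)}
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


print(confusion_matrix([1, 1, 1, 0, 0], [1, 0, 1, 1, 0]))  # [[2, 1], [1, 1]]
```

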
## Metrics
ROC-AUC:
* on DeepCrystal test: 0.9467594654788418
* on BCrystal test: 0.946546316337983
* on SP test: 0.9328120255086547
* on TR test: 0.9562804888270497
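
ROC-AUC is threshold-free. It equals the probability that a randomly chosen crystallizable sequence receives a higher score than a randomly chosen non-crystallizable one, so it can differ noticeably from the accuracy obtained at a single decision threshold. Below is a minimal sketch of this pairwise (Mann-Whitney) formulation. It assumes the scores are the model's predicted probability for label 1, and the `roc_auc` name is illustrative.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. Pairwise O(P * N), fine for a few thousand sequences.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 3 of 4 pairs ranked correctly: 0.75
```

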
Matthews correlation coefficient (MCC):
* on DeepCrystal test: 0.6130826598876417
* on BCrystal test: 0.6045040114572474
* on SP test: 0.5017286791304684
* on TR test: 0.6153412305503776
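
As a worked check of the MCC definition, read the SP test confusion matrix with rows as true classes and columns as predicted classes. This gives TP = 80, FN = 68, FP = 4 and TN = 85:

```latex
\mathrm{MCC}
= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
= \frac{80 \cdot 85 - 4 \cdot 68}{\sqrt{84 \cdot 148 \cdot 89 \cdot 153}}
\approx \frac{6528}{13011.0}
\approx 0.5017
```

The result matches the SP value listed above.
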
Specificity (true negative rate):
* on DeepCrystal test: 0.966
* on BCrystal test: 0.9654017857142857
* on SP test: 0.9550561797752809
* on TR test: 0.9749216300940439
Sensitivity (recall, true positive rate):
* on DeepCrystal test: 0.5968819599109132
* on BCrystal test: 0.5959595959595959
* on SP test: 0.5405405405405406
* on TR test: 0.553475935828877
Researchers:
* [Jayanth Kumar](https://jaykmr.com)
* [Kavya Jaykumar](https://www.linkedin.com/in/kavya-jayakumar-6390271b5/)
Credits:
* [Meta ESM-2](https://github.com/facebookresearch/esm)
* [Hugging Face](https://huggingface.co/jaykmr)
* [Paperspace Compute Cloud](https://www.paperspace.com/)