---
license: mit
widget:
- text: "MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY"
---

## Label Semantics

* Label 0: Non-crystallizable (Negative)
* Label 1: Crystallizable (Positive)

## Dataset

1. [DeepCrystal Train](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/train.csv)
2. [DeepCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/test.csv)
3. [BCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/BCrystal_Balanced_Test_set)
4. [SP Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/SP_Final_set)
5. [TR Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/TR_Final_set)

## Model

### ESMCrystal_t6_8M_v1

ESMCrystal_t6_8M_v1 is a state-of-the-art protein crystallization prediction model fine-tuned from [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D). It has 6 layers and 8M parameters, with a checkpoint size of [approx. 31.4 MB](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/pytorch_model.bin). The model uses transfer learning to predict whether an input protein sequence will crystallize.

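A minimal inference sketch follows. It assumes the repository can be loaded through the generic `AutoTokenizer` and `AutoModelForSequenceClassification` classes from `transformers` and that the output index matches the label semantics above (index 1 = crystallizable). Neither assumption is confirmed by this card, and the helper names `softmax` and `predict_crystallization` are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Label names follow the "Label Semantics" section of this card.
LABELS = {0: "non-crystallizable", 1: "crystallizable"}


def predict_crystallization(sequence, repo_id="jaykmr/ESMCrystal_t6_8M_v1"):
    """Return (label name, probability of crystallizable) for one sequence.

    Assumption: the repository loads through the Auto* classes below and
    exposes two output logits ordered as in LABELS.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[label_id], probs[1]
```

Calling `predict_crystallization(...)` with an amino-acid string downloads the checkpoint on first use, so it needs network access plus the `torch` and `transformers` packages.
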
## Accuracy

| Dataset          | Accuracy           |
|------------------|--------------------|
| DeepCrystal Test | 0.7913593256059009 |
| BCrystal Test    | 0.7811975377728035 |
| SP Test          | 0.6962025316455697 |
| TR Test          | 0.8191699604743083 |

## Comparison Table

Counts treat crystallizable (label 1) as the positive class, so TP + FN equals the number of positives and FP + TN equals the number of negatives.

| Dataset          | Count | Positives | Negatives | TP  | FP | FN  | TN  | Precision  | Recall     | F1         | Accuracy   | ROC AUC | MCC         | Sensitivity | Specificity |
|------------------|-------|-----------|-----------|-----|----|-----|-----|------------|------------|------------|------------|---------|-------------|-------------|-------------|
| DeepCrystal Test | 1898  | 898       | 1000      | 532 | 34 | 362 | 966 | 0.93992933 | 0.5950783  | 0.72876712 | 0.79091869 | 0.9467  | 0.611906376 | 0.5950783   | 0.966       |
| BCrystal Test    | 1787  | 891       | 896       | 531 | 31 | 360 | 865 | 0.94483986 | 0.5959596  | 0.73090158 | 0.78119754 | 0.9396  | 0.604504011 | 0.5959596   | 0.96540179  |
| SP Test          | 237   | 148       | 89        | 80  | 4  | 68  | 85  | 0.95238095 | 0.54054054 | 0.68965517 | 0.69620253 | 0.9328  | 0.501728679 | 0.54054054  | 0.95505618  |
| TR Test          | 1012  | 374       | 638       | 207 | 16 | 167 | 622 | 0.92825112 | 0.55347594 | 0.69346734 | 0.81916996 | 0.9562  | 0.615341231 | 0.55347594  | 0.97492163  |

## Graphs

### ROC-AUC Curve

* DeepCrystal Test

![Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-test.png?raw=true)

* BCrystal Test

![BCrystal Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-BCtest.png?raw=true)

* SP Test

![SP Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-SPtest.png?raw=true)

* TR Test

![TR Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-TRtest.png?raw=true)

### PR-AUC Curve

* DeepCrystal Test

![Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-test.png?raw=true)

* BCrystal Test

![BCrystal Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-BCtest.png?raw=true)

* SP Test

![SP Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-SPtest.png?raw=true)

* TR Test

![TR Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-TRtest.png?raw=true)

## Final scores

* on DeepCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.73      | 0.97   | 0.83     | 1000    |
| crystallizable     | 0.94      | 0.60   | 0.73     | 898     |
| accuracy           |           |        | 0.79     | 1898    |
| macro avg          | 0.83      | 0.78   | 0.78     | 1898    |
| weighted avg       | 0.83      | 0.79   | 0.78     | 1898    |

* on BCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.71      | 0.97   | 0.82     | 896     |
| crystallizable     | 0.94      | 0.60   | 0.73     | 891     |
| accuracy           |           |        | 0.78     | 1787    |
| macro avg          | 0.83      | 0.78   | 0.77     | 1787    |
| weighted avg       | 0.83      | 0.78   | 0.77     | 1787    |

* on SP test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.56      | 0.96   | 0.70     | 89      |
| crystallizable     | 0.95      | 0.54   | 0.69     | 148     |
| accuracy           |           |        | 0.70     | 237     |
| macro avg          | 0.75      | 0.75   | 0.70     | 237     |
| weighted avg       | 0.80      | 0.70   | 0.69     | 237     |

* on TR test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.79      | 0.97   | 0.87     | 638     |
| crystallizable     | 0.93      | 0.55   | 0.69     | 374     |
| accuracy           |           |        | 0.82     | 1012    |
| macro avg          | 0.86      | 0.76   | 0.78     | 1012    |
| weighted avg       | 0.84      | 0.82   | 0.81     | 1012    |

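The `macro avg` and `weighted avg` rows can be reproduced from the per-class rows: the macro average is the plain mean over the two classes, and the weighted average weights each class by its support. The sketch below recomputes the TR test precision averages; the function names are illustrative. Because the per-class inputs are already rounded to two decimals, recomputed values for other rows may differ from the reported ones by 0.01.

```python
def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes, weighted by the number of samples in each class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# TR test precision for (non-crystallizable, crystallizable), as reported above.
precision = [0.79, 0.93]
support = [638, 374]

print(round(macro_average(precision), 2))              # 0.86
print(round(weighted_average(precision, support), 2))  # 0.84
```
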
## Confusion matrix

In each matrix, rows are the actual classes (crystallizable first, then non-crystallizable) and columns are the predicted classes in the same order.

* on DeepCrystal test:

```
| 532 | 362 |
| 34  | 966 |
```

* on BCrystal test:

```
| 531 | 360 |
| 31  | 865 |
```

* on SP test:

```
| 80 | 68 |
| 4  | 85 |
```

* on TR test:

```
| 207 | 167 |
| 16  | 622 |
```

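The summary metrics on this card can be derived from the four confusion-matrix counts. The sketch below does this for the BCrystal test set. It assumes that the first matrix row holds the actual crystallizable sequences (531 + 360 = 891, matching the reported support), so TP = 531, FN = 360, FP = 31 and TN = 865. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Derive summary metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)        # positive predictive value
    recall = tp / (tp + fn)           # sensitivity
    specificity = tn / (tn + fp)      # true negative rate
    npv = tn / (tn + fn)              # negative predictive value
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# BCrystal test counts, read row-wise as (TP, FN) and (FP, TN).
bcrystal = binary_metrics(tp=531, fn=360, fp=31, tn=865)
for name, value in bcrystal.items():
    print(f"{name}: {value:.4f}")
```

Under this reading, the computed precision and recall for the crystallizable class round to 0.94 and 0.60, in line with the BCrystal classification report.
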
## Metrics

ROC AUC score:

* on DeepCrystal test: 0.9467594654788418
* on BCrystal test: 0.946546316337983
* on SP test: 0.9328120255086547
* on TR test: 0.9562804888270497

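The ROC score is independent of the decision threshold: it equals the probability that a randomly chosen crystallizable sequence receives a higher score than a randomly chosen non-crystallizable one, with ties counted as one half. The sketch below computes it by direct pairwise comparison on hypothetical toy data. It is not the evaluation code behind the numbers above, and the function name `roc_auc` is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from any test set on this card.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs are ordered correctly
```
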
Matthews correlation coefficient (MCC):

* on DeepCrystal test: 0.6130826598876417
* on BCrystal test: 0.6045040114572474
* on SP test: 0.5017286791304684
* on TR test: 0.6153412305503776

Specificity (true negative rate, TN / (TN + FP)):

* on DeepCrystal test: 0.966
* on BCrystal test: 0.9654017857142857
* on SP test: 0.9550561797752809
* on TR test: 0.9749216300940439

Sensitivity (recall, TP / (TP + FN)):

* on DeepCrystal test: 0.5968819599109132
* on BCrystal test: 0.5959595959595959
* on SP test: 0.5405405405405406
* on TR test: 0.553475935828877

Researchers:

* [Jayanth Kumar](https://jaykmr.com)
* [Kavya Jaykumar](https://www.linkedin.com/in/kavya-jayakumar-6390271b5/)

Credits:

* [Meta ESMFold2](https://github.com/facebookresearch/esm)
* [Hugging Face](https://huggingface.co/jaykmr)
* [Paperspace Compute Cloud](https://www.paperspace.com/)