---
license: mit
widget:
- text: "MERVAVVGVPMDLGANRRGVDMGPSALRYARLLEQLEDLGYTVEDLGDVPVSLARASRRRGRGLAYLEEIRAAALVLKERLAALPEGVFPIVLGGDHSLSMGSVAGAARGRRVGVVWVDAHADFNTPETSPSGNVHGMPLAVLSGLGHPRLTEVFRAVDPKDVVLVGVRSLDPGEKRLLKEAGVRVY"
---

## Label Semantics:

* Label 0: Non-crystallizable (Negative)
* Label 1: Crystallizable (Positive)

## Dataset

1. [DeepCrystal Train](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/train.csv)
2. [DeepCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Datasets/test.csv)
3. [BCrystal Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/BCrystal_Balanced_Test_set)
4. [SP Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/SP_Final_set)
5. [TR Test](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/tree/main/Datasets/TR_Final_set)

## Model

### ESMCrystal_t6_8M_v1

ESMCrystal_t6_8M_v1 is a state-of-the-art protein crystallization prediction model fine-tuned from [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D), which has 6 layers and 8M parameters. The checkpoint is [approx. 31.4MB](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/pytorch_model.bin). The model uses transfer learning to predict whether an input protein sequence will crystallize.

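The prediction step described above can be sketched as follows. This is a minimal sketch, not the authors' inference code: it assumes the checkpoint loads through the `transformers` auto classes with a two-class sequence classification head and that the repository ships its tokenizer files. The helper names `softmax`, `decode` and `predict` are hypothetical; the label mapping follows the Label Semantics section.

```python
import math

MODEL_ID = "jaykmr/ESMCrystal_t6_8M_v1"

# Label mapping taken from the "Label Semantics" section of this card.
ID2LABEL = {0: "Non-crystallizable", 1: "Crystallizable"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map the two class logits to a (label, probability) pair."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


def predict(sequence, model_id=MODEL_ID):
    """Download the checkpoint and classify one amino-acid sequence.

    Assumption: the checkpoint is compatible with
    AutoModelForSequenceClassification and outputs two logits.
    """
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

Calling `predict(...)` with a protein sequence such as the widget example in the front matter should return a tuple of the predicted label and its probability, provided `torch` and `transformers` are installed and the assumptions above hold.
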
## Accuracy:

| Dataset          | Accuracy           |
|------------------|--------------------|
| DeepCrystal Test | 0.7913593256059009 |
| BCrystal Test    | 0.7811975377728035 |
| SP Test          | 0.6962025316455697 |
| TR Test          | 0.8191699604743083 |

## Comparison Table:

| Dataset          | Count | Positives | Negatives | TP  | FP  | FN | TN  | Precision  | Recall     | F1         | Accuracy   | ROC AUC | Matthews Coefficient | PPV        | NPV        |
|------------------|-------|-----------|-----------|-----|-----|----|-----|------------|------------|------------|------------|---------|----------------------|------------|------------|
| DeepCrystal Test | 1898  | 898       | 1000      | 532 | 362 | 34 | 966 | 0.5950783  | 0.93992933 | 0.72876712 | 0.79091869 | 0.9467  | 0.611906376          | 0.5950783  | 0.966      |
| BCrystal Test    | 1787  | 891       | 896       | 531 | 360 | 31 | 865 | 0.5959596  | 0.94483986 | 0.73090158 | 0.78119754 | 0.9396  | 0.604504011          | 0.5959596  | 0.96540179 |
| SP Test          | 237   | 148       | 89        | 80  | 68  | 4  | 85  | 0.54054054 | 0.95238095 | 0.68965517 | 0.69620253 | 0.9328  | 0.501728679          | 0.54054054 | 0.95505618 |
| TR Test          | 1012  | 374       | 638       | 207 | 167 | 16 | 622 | 0.55347594 | 0.92825112 | 0.69346734 | 0.81916996 | 0.9562  | 0.615341231          | 0.55347594 | 0.97492163 |

## Graphs

### ROC-AUC Curve

* DeepCrystal Test

  ![Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-test.png?raw=true)

* BCrystal Test

  ![BCrystal Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-BCtest.png?raw=true)

* SP Test

  ![SP Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-SPtest.png?raw=true)

* TR Test

  ![TR Test ROC-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/ROC-final-TRtest.png?raw=true)

### PR-AUC Curve

* DeepCrystal Test

  ![Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-test.png?raw=true)

* BCrystal Test

  ![BCrystal Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-BCtest.png?raw=true)

* SP Test

  ![SP Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-SPtest.png?raw=true)

* TR Test

  ![TR Test PR-AUC Curve](https://huggingface.co/jaykmr/ESMCrystal_t6_8M_v1/blob/main/Graphs/PR-final-TRtest.png?raw=true)

## Final scores:

* on DeepCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.73      | 0.97   | 0.83     | 1000    |
| crystallizable     | 0.94      | 0.60   | 0.73     | 898     |
| accuracy           |           |        | 0.79     | 1898    |
| macro avg          | 0.83      | 0.78   | 0.78     | 1898    |
| weighted avg       | 0.83      | 0.79   | 0.78     | 1898    |

* on BCrystal test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.71      | 0.97   | 0.82     | 896     |
| crystallizable     | 0.94      | 0.60   | 0.73     | 891     |
| accuracy           |           |        | 0.78     | 1787    |
| macro avg          | 0.83      | 0.78   | 0.77     | 1787    |
| weighted avg       | 0.83      | 0.78   | 0.77     | 1787    |

* on SP test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.56      | 0.96   | 0.70     | 89      |
| crystallizable     | 0.95      | 0.54   | 0.69     | 148     |
| accuracy           |           |        | 0.70     | 237     |
| macro avg          | 0.75      | 0.75   | 0.70     | 237     |
| weighted avg       | 0.80      | 0.70   | 0.69     | 237     |

* on TR test:

|                    | precision | recall | f1-score | support |
|--------------------|-----------|--------|----------|---------|
| non-crystallizable | 0.79      | 0.97   | 0.87     | 638     |
| crystallizable     | 0.93      | 0.55   | 0.69     | 374     |
| accuracy           |           |        | 0.82     | 1012    |
| macro avg          | 0.86      | 0.76   | 0.78     | 1012    |
| weighted avg       | 0.84      | 0.82   | 0.81     | 1012    |

## Confusion matrix:

* on DeepCrystal test:

```
| 532 | 362 |
|  34 | 966 |
```

* on BCrystal test:

```
| 531 | 360 |
|  31 | 865 |
```

* on SP test:

```
| 80 | 68 |
|  4 | 85 |
```

* on TR test:

```
| 207 | 167 |
|  16 | 622 |
```

## Metrics

ROC AUC score:

* on DeepCrystal test: 0.9467594654788418
* on BCrystal test: 0.946546316337983
* on SP test: 0.9328120255086547
* on TR test: 0.9562804888270497

Matthews Coefficient:

* on DeepCrystal test: 0.6130826598876417
* on BCrystal test: 0.6045040114572474
* on SP test: 0.5017286791304684
* on TR test: 0.6153412305503776

NPV:

* on DeepCrystal test: 0.966
* on BCrystal test: 0.9654017857142857
* on SP test: 0.9550561797752809
* on TR test: 0.9749216300940439

PPV:

* on DeepCrystal test: 0.5968819599109132
* on BCrystal test: 0.5959595959595959
* on SP test: 0.5405405405405406
* on TR test: 0.553475935828877

Researchers:

* [Jayanth Kumar](https://jaykmr.com)
* [Kavya Jaykumar](https://www.linkedin.com/in/kavya-jayakumar-6390271b5/)

Credits:

* [Meta ESMFold2](https://github.com/facebookresearch/esm)
* [Huggingface](https://huggingface.co/jaykmr)
* [Paperspace Compute Cloud](https://www.paperspace.com/)
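
The accuracy, F1 and Matthews coefficient values in the comparison table can be recomputed from the four confusion matrices listed in this card. The sketch below is illustrative only: the names `CONFUSION` and `summarize` are hypothetical, and it assumes the top-left cell counts correctly predicted crystallizable sequences and the bottom-right cell counts correctly predicted non-crystallizable ones. These three metrics are unchanged if the two off-diagonal cells are swapped, so no assumption about which of them is FP or FN is needed.

```python
import math

# Cells in reading order: (top-left, top-right, bottom-left, bottom-right),
# copied from the "Confusion matrix" section.
CONFUSION = {
    "DeepCrystal Test": (532, 362, 34, 966),
    "BCrystal Test": (531, 360, 31, 865),
    "SP Test": (80, 68, 4, 85),
    "TR Test": (207, 167, 16, 622),
}


def summarize(tp, off_a, off_b, tn):
    """Return accuracy, positive-class F1 and Matthews coefficient.

    off_a and off_b are the two off-diagonal cells; all three metrics
    are symmetric in them.
    """
    total = tp + off_a + off_b + tn
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + off_a + off_b)
    denom = math.sqrt(
        (tp + off_a) * (tp + off_b) * (tn + off_a) * (tn + off_b)
    )
    mcc = (tp * tn - off_a * off_b) / denom
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


for name, cells in CONFUSION.items():
    scores = summarize(*cells)
    print(
        f"{name}: accuracy={scores['accuracy']:.4f} "
        f"f1={scores['f1']:.4f} mcc={scores['mcc']:.4f}"
    )
```

Precision, recall, PPV and NPV are left out on purpose, since they depend on which axis of the matrix holds the actual classes.
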