AmelieSchreiber committed cd1f4b4 (parent: a29fcb1)
Update README.md

README.md CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: token-classification

# ESM-2 for Binding Site Prediction

-This model is a finetuned version of the 35M-parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
and [here](https://huggingface.co/docs/transformers/model_doc/esm) for more details). The model was finetuned with LoRA for
the binary token classification task of predicting binding sites (and active sites) of protein sequences based on sequence alone.
The model may be underfit and undertrained; however, it still achieved better performance on the test set in terms of loss, accuracy,
@@ -38,15 +38,18 @@ This model was finetuned on ~549K protein sequences from the UniProt database. T
the following test metrics:

```
-(9 lines removed: old lines 41-49)
```

Increasing the dataset size from ~209K to ~549K protein sequences clearly improved performance on the test metrics.

# ESM-2 for Binding Site Prediction

+**This model is overfit (see below).** This model is a finetuned version of the 35M-parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
and [here](https://huggingface.co/docs/transformers/model_doc/esm) for more details). The model was finetuned with LoRA for
the binary token classification task of predicting binding sites (and active sites) of protein sequences based on sequence alone.
The model may be underfit and undertrained; however, it still achieved better performance on the test set in terms of loss, accuracy,

the following test metrics:

```
+({'accuracy': 0.9905461579981686,
+  'precision': 0.7695765003685506,
+  'recall': 0.9841352974610041,
+  'f1': 0.8637307441810476,
+  'auc': 0.9874413786006525,
+  'mcc': 0.8658850560635515},
+ {'accuracy': 0.9394282959813123,
+  'precision': 0.3662722265170941,
+  'recall': 0.8330231316088238,
+  'f1': 0.5088208423175958,
+  'auc': 0.8883078682492643,
+  'mcc': 0.5283098562376193})
```

Increasing the dataset size from ~209K to ~549K protein sequences clearly improved performance on the test metrics.
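The README reports token-level accuracy, precision, recall, F1, AUC and MCC for binary per-residue labels (binding site or not) but does not show how they were computed. The following is a minimal pure-Python sketch of the threshold-based metrics. It assumes 0/1 labels and predictions per residue, and it follows the common Hugging Face convention of labelling positions excluded from the loss (padding, special tokens) with `-100`. The function name `token_metrics` is hypothetical. AUC is omitted because it needs predicted probabilities rather than hard 0/1 predictions.

```python
import math
from typing import Dict, Sequence

# Assumption: ignored positions (padding, special tokens) carry label -100,
# the usual Hugging Face token-classification convention.
IGNORE_INDEX = -100


def token_metrics(
    labels: Sequence[Sequence[int]],
    preds: Sequence[Sequence[int]],
) -> Dict[str, float]:
    """Pool all residues across sequences and compute binary metrics (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for seq_labels, seq_preds in zip(labels, preds):
        for y, p in zip(seq_labels, seq_preds):
            if y == IGNORE_INDEX:
                continue  # padding / special tokens are not scored
            if y == 1 and p == 1:
                tp += 1
            elif y == 0 and p == 1:
                fp += 1
            elif y == 0 and p == 0:
                tn += 1
            else:  # y == 1 and p == 0
                fn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Two toy sequences; positions labelled -100 are skipped even if predicted as 1.
    labels = [[0, 1, 1, 0, -100], [1, 0, 0, -100, -100]]
    preds = [[0, 1, 0, 1, 1], [1, 0, 0, 0, 0]]
    # tp=2, fp=1, tn=3, fn=1 -> accuracy 5/7, precision = recall = f1 = 2/3, mcc 5/12
    print(token_metrics(labels, preds))
```

Pooling every residue before dividing (micro-averaging) keeps long proteins from being down-weighted. Whether the README's numbers were pooled this way or averaged per sequence is not stated.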
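Each metrics dictionary in the updated README can be sanity-checked, because F1 is the harmonic mean of precision and recall, F1 = 2PR/(P+R). The snippet below recomputes F1 from the reported precision and recall. The numbers are copied from the README, and `harmonic_f1` is only an illustrative helper name.

```python
import math

# Precision, recall and F1 copied from the two dictionaries in the updated README.
reported = [
    {"precision": 0.7695765003685506, "recall": 0.9841352974610041, "f1": 0.8637307441810476},
    {"precision": 0.3662722265170941, "recall": 0.8330231316088238, "f1": 0.5088208423175958},
]


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for m in reported:
    f1 = harmonic_f1(m["precision"], m["recall"])
    match = math.isclose(f1, m["f1"], rel_tol=1e-6)
    print(f"reported={m['f1']:.8f} recomputed={f1:.8f} match={match}")
```

Both recomputed values agree with the reported F1 to at least six decimal places. This fits the metrics having been computed from a single pooled confusion matrix, though the README does not say so.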