AmelieSchreiber committed
Commit cd1f4b4 · 1 Parent(s): a29fcb1

Update README.md

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -22,7 +22,7 @@ pipeline_tag: token-classification
 
  # ESM-2 for Binding Site Prediction
 
- This model is a finetuned version of the 35M parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
+ **This model is overfit (see below).** This model is a finetuned version of the 35M parameter `esm2_t12_35M_UR50D` ([see here](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
  and [here](https://huggingface.co/docs/transformers/model_doc/esm) for more details). The model was finetuned with LoRA for
  the binay token classification task of predicting binding sites (and active sites) of protein sequences based on sequence alone.
  The model may be underfit and undertrained, however it still achieved better performance on the test set in terms of loss, accuracy,
@@ -38,15 +38,18 @@ This model was finetuned on ~549K protein sequences from the UniProt database. T
  the following test metrics:
 
  ```
- Test: (Epoch 1)
- {'Training Loss': 0.037400,
- 'Validation Loss': 0.301413,
- 'accuracy': 0.939431,
- 'precision': 0.366282,
- 'recall': 0.833003,
- 'f1': 0.508826,
- 'auc': 0.888300,
- 'mcc': 0.528311})
+ ({'accuracy': 0.9905461579981686,
+ 'precision': 0.7695765003685506,
+ 'recall': 0.9841352974610041,
+ 'f1': 0.8637307441810476,
+ 'auc': 0.9874413786006525,
+ 'mcc': 0.8658850560635515},
+ {'accuracy': 0.9394282959813123,
+ 'precision': 0.3662722265170941,
+ 'recall': 0.8330231316088238,
+ 'f1': 0.5088208423175958,
+ 'auc': 0.8883078682492643,
+ 'mcc': 0.5283098562376193})
  ```
 
  The dataset size increase from ~209K protein sequences to ~549K clearly improved performance in terms of test metric.
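
The "overfit" note added in this commit can be checked against the two metric dicts it introduces. The sketch below assumes the first dict holds training-set metrics and the second holds test-set metrics; the commit does not label them, but the second dict matches the previously reported test metrics to about four decimal places. The names `train`, `test`, `gaps`, and `f1_from` are illustrative and do not come from the repository.

```python
# Metric values copied from the new README text in this commit.
# Assumption: first dict = training set, second dict = test set.
train = {
    "accuracy": 0.9905461579981686,
    "precision": 0.7695765003685506,
    "recall": 0.9841352974610041,
    "f1": 0.8637307441810476,
    "auc": 0.9874413786006525,
    "mcc": 0.8658850560635515,
}
test = {
    "accuracy": 0.9394282959813123,
    "precision": 0.3662722265170941,
    "recall": 0.8330231316088238,
    "f1": 0.5088208423175958,
    "auc": 0.8883078682492643,
    "mcc": 0.5283098562376193,
}


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Train-minus-test gap per metric; a large positive gap suggests overfitting.
gaps = {name: train[name] - test[name] for name in train}

# Print metrics sorted from largest gap to smallest.
for name, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{name:>9}: train={train[name]:.4f}  test={test[name]:.4f}  gap={gap:+.4f}")

# Sanity check: the reported F1 values agree with the reported precision/recall.
print("f1 (train, recomputed):", round(f1_from(train["precision"], train["recall"]), 6))
print("f1 (test, recomputed): ", round(f1_from(test["precision"], test["recall"]), 6))
```

Under that assumption, precision shows the widest gap (roughly 0.77 versus 0.37), while accuracy moves by only about 0.05. Accuracy can hide such a gap when one class dominates, which is typical when binding-site residues are a small fraction of all residues.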