AmelieSchreiber committed on
Commit 101ed72
1 Parent(s): d9babe5

Update README.md

Files changed (1): README.md +18 -0
```diff
@@ -21,6 +21,7 @@ tags:
 ---
 # ESM-2 for Binding Site Prediction
 
+**This model is overfit (see below)**
 This model *may be* close to SOTA compared to [these SOTA structural models](https://www.biorxiv.org/content/10.1101/2023.08.11.553028v1).
 One of the primary goals in training this model is to prove the viability of using simple, single sequence only protein language models
 for binary token classification tasks like predicting binding and active sites of protein sequences based on sequence alone. This project
@@ -43,6 +44,23 @@ dataset [found here](https://huggingface.co/datasets/AmelieSchreiber/binding_sit
 this model has a high recall, meaning it is likely to detect binding sites, but it has a precision score that is somewhat lower than the SOTA
 structural models mentioned above, meaning the model may return some false positives as well.
 
+## Overfitting Issues
+
+```python
+({'accuracy': 0.9908574638195745,
+ 'precision': 0.7748830511095647,
+ 'recall': 0.9862043939282111,
+ 'f1': 0.8678649909611492,
+ 'auc': 0.9886039823329382,
+ 'mcc': 0.8699396085712834},
+ {'accuracy': 0.9486280975482552,
+ 'precision': 0.40980984516603186,
+ 'recall': 0.827004864790918,
+ 'f1': 0.5480444772577421,
+ 'auc': 0.890196425388581,
+ 'mcc': 0.560633448203768})
+```
+
 
 ## Running Inference
 
```
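The "Overfitting Issues" section reports six standard binary classification metrics. The sketch below shows how they are commonly defined for per-residue (token-level) predictions. It assumes that labels (1 = binding site) and positive-class scores have already been flattened across sequences, with padding and special tokens removed. The function names `binary_token_metrics` and `roc_auc` are hypothetical, and this is not the script that produced the numbers in the diff. scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score` and `matthews_corrcoef` compute the same quantities.

```python
import math
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative labels")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def binary_token_metrics(labels: Sequence[int], scores: Sequence[float],
                         threshold: float = 0.5) -> Dict[str, float]:
    """Metrics for flattened per-residue binary labels (hypothetical helper)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    preds = [1 if s >= threshold else 0 for s in scores]
    # Confusion-matrix counts over all residues.
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)

    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "auc": roc_auc(labels, scores), "mcc": mcc}


if __name__ == "__main__":
    print(binary_token_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

For labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, this gives accuracy 0.75, precision 1.0, recall 0.5 and AUC 0.75. AUC is computed from the raw scores, while the other five metrics depend on the chosen threshold.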
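The commit does not label the two metric dictionaries. Assuming, as the "Overfitting Issues" heading suggests, that the first holds training-set metrics and the second held-out (test) metrics, the overfitting can be quantified as a per-metric gap. The sketch below uses only the numbers from the diff. The helper `generalization_gap` is hypothetical. It also prints `1 - precision`, the share of predicted binding residues that are false positives, which connects to the README's remark about false positives.

```python
# Metrics copied from the README diff.
# ASSUMPTION: first dict = training split, second = held-out split (not stated in the commit).
train = {"accuracy": 0.9908574638195745, "precision": 0.7748830511095647,
         "recall": 0.9862043939282111, "f1": 0.8678649909611492,
         "auc": 0.9886039823329382, "mcc": 0.8699396085712834}
test = {"accuracy": 0.9486280975482552, "precision": 0.40980984516603186,
        "recall": 0.827004864790918, "f1": 0.5480444772577421,
        "auc": 0.890196425388581, "mcc": 0.560633448203768}


def generalization_gap(train_m, test_m):
    """Train minus test for every metric that both dicts report."""
    return {k: train_m[k] - test_m[k] for k in train_m if k in test_m}


gap = generalization_gap(train, test)
# Print metrics from largest to smallest drop.
for name, delta in sorted(gap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>9}: train={train[name]:.3f} test={test[name]:.3f} gap={delta:+.3f}")

# Fraction of predicted binding residues that are not labeled binding sites.
print(f"false discovery rate (held-out): {1 - test['precision']:.3f}")
```

Under that assumption, precision shows the largest drop (about 0.77 to 0.41), and F1 and MCC fall by roughly 0.31 to 0.32. Accuracy falls by only about 0.04. This pattern is consistent with binding residues being a small fraction of all residues, which makes accuracy a weak signal of overfitting here.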