AmelieSchreiber committed on
Commit b73ac0e
1 Parent(s): 075cf92
Update README.md
README.md CHANGED
@@ -41,11 +41,21 @@ has learned to predict binding sites well (and that EvoProtGrad works as intende
## Training

+This model was trained on approximately 70,000 proteins with binding site and active site annotations in UniProt.
+This version uses a random 85/15 split that does not account for protein family or sequence
+similarity. Newer iterations of the model have been trained on larger datasets (over 200,000 proteins), split so that
+no protein family appears on both sides of the split. However, they appear to overfit much earlier and perform significantly worse
+on the reported metrics (precision, recall, and F1).
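
To make the difference between the two splitting strategies concrete, here is a minimal sketch using only the Python standard library. It is not the code used to train this model: the function names, the toy `(protein_id, family_id)` pairs, and the greedy family assignment are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def random_split(items, test_fraction=0.15, seed=0):
    """Shuffle items and cut them into two parts, ignoring any grouping."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def family_split(items, family_of, test_fraction=0.15, seed=0):
    """Assign whole families to one side so no family spans both parts."""
    by_family = defaultdict(list)
    for item in items:
        by_family[family_of(item)].append(item)
    families = sorted(by_family)
    random.Random(seed).shuffle(families)
    target = len(items) * test_fraction
    train, test = [], []
    for fam in families:
        # Fill the held-out part first, then send remaining families to training.
        (test if len(test) < target else train).extend(by_family[fam])
    return train, test


# Hypothetical toy data: 200 proteins spread evenly over 20 families.
proteins = [(f"P{i:03d}", f"FAM{i % 20}") for i in range(200)]

rand_train, rand_test = random_split(proteins)
fam_train, fam_test = family_split(proteins, family_of=lambda p: p[1])

# Families that appear on both sides of each split.
shared_random = {f for _, f in rand_train} & {f for _, f in rand_test}
shared_family = {f for _, f in fam_train} & {f for _, f in fam_test}

print(len(rand_train), len(rand_test), len(shared_random))
print(len(fam_train), len(fam_test), len(shared_family))
```

With a random split, related proteins can land on both sides, so held-out scores can look better than they would on unseen families. A family-aware split removes that overlap, which may be one reason the newer iterations score lower on the same metrics.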
+
+Training metrics for the model, in the form of `trainer_state.json`, can be
+[found here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_general_binding_sites_v2/blob/main/trainer_state.json).
+
```
epoch 3:
Training Loss  Validation Loss  Precision  Recall    F1        Auc
0.031100       0.074720         0.684798   0.966856  0.801743  0.980853
```
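
As a quick consistency check on the epoch 3 row, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two reported values. The snippet below is an illustrative sketch; the helper name `f1_score` is our own and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the epoch 3 row of the metrics table.
precision, recall, reported_f1 = 0.684798, 0.966856, 0.801743

computed_f1 = f1_score(precision, recall)
print(f"computed F1 = {computed_f1:.6f}, reported F1 = {reported_f1:.6f}")
```

The recomputed value agrees with the reported F1 to within rounding. High recall with lower precision suggests the model labels positives generously at this checkpoint.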
+The hyperparameters are:

```
wandb: lr: 0.0004977045729600779