AmelieSchreiber committed on
Commit
b73ac0e
1 Parent(s): 075cf92

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -41,11 +41,21 @@ has learned to predict binding sites well (and that EvoProtGrad works as intende
 
  ## Training
 
+ This model was trained on approximately 70,000 proteins with binding site and active site annotations in UniProt.
+ For this version, the data was split randomly 85/15, without accounting for protein family or sequence
+ similarity. Newer iterations of the model have been trained on larger datasets (over 200,000 proteins), with splits chosen so that
+ no protein family appears in more than one split. However, these models seem to overfit much earlier and perform significantly worse
+ on the reported metrics (precision, recall, and F1).
+
+ Training metrics for the model, in the form of `trainer_state.json`, can be
+ [found here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_general_binding_sites_v2/blob/main/trainer_state.json).
+
  ```
  epoch 3:
  Training Loss   Validation Loss   Precision   Recall     F1         AUC
  0.031100        0.074720          0.684798    0.966856   0.801743   0.980853
  ```
+ The hyperparameters are:
 
  ```
  wandb: lr: 0.0004977045729600779
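The README contrasts a plain random 85/15 split with a split in which no protein family is shared between splits. The difference can be sketched as follows. This is a minimal illustration, not the author's data pipeline: the `(accession, family)` records, the `family_of` accessor, and the helper names `random_split` and `family_split` are all hypothetical, and the family labels stand in for whatever annotation (for example Pfam or UniProt family) a real pipeline would use.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], test_frac: float = 0.15,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Random train/test split that ignores family and sequence similarity."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def family_split(items: Sequence[T], family_of: Callable[[T], Hashable],
                 test_frac: float = 0.15,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Group-aware split: every member of a family lands on the same side.

    Whole families are moved into the test set until it holds at least
    test_frac of the items, so the final ratio is only approximately
    85/15 when families differ in size.
    """
    groups = defaultdict(list)
    for item in items:
        groups[family_of(item)].append(item)
    families = sorted(groups, key=repr)  # deterministic order before shuffling
    random.Random(seed).shuffle(families)

    target = test_frac * len(items)
    train: List[T] = []
    test: List[T] = []
    for fam in families:
        # Fill the test set one whole family at the time; the rest is training data.
        if len(test) < target:
            test.extend(groups[fam])
        else:
            train.extend(groups[fam])
    return train, test


# Hypothetical toy data: (accession, family) pairs spread over 7 families.
proteins = [(f"P{i}", f"fam{i % 7}") for i in range(40)]
train_set, test_set = family_split(proteins, family_of=lambda p: p[1])
print(len(train_set), len(test_set), sorted({p[1] for p in test_set}))
```

With a random split, close homologs can end up on both sides, which can make held-out metrics look better than they would on unseen families; the grouped split removes that overlap at the cost of a less exact ratio. For real data, scikit-learn's `GroupShuffleSplit` provides group-based splitting along the same lines.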
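As a quick consistency check on the epoch-3 table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so it can be recomputed from the other two columns. The helper name `f1_score` below is illustrative and is not taken from the training code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0  # conventional value when both are zero
    return 2 * precision * recall / (precision + recall)


# Epoch-3 precision and recall from the metrics table.
precision, recall = 0.684798, 0.966856
print(round(f1_score(precision, recall), 6))  # prints 0.801743
```

The recomputed value matches the reported F1 of 0.801743. Because it is a harmonic mean, F1 sits closer to the lower of the two inputs, here precision.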