adarshxs committed on
Commit
bce5b0e
1 Parent(s): 465e398

Update README.md

  1. README.md +6 -0
README.md CHANGED
@@ -96,6 +96,12 @@ special_tokens:
  The model achieves the following loss:
  - Loss: 1.3647

+ The loss exploded after a couple hundred steps. As suggested by [winglian](https://x.com/winglian/status/1740776666744700941?s=20), we set the following values in the config file:
+ ```
+ adam_epsilon: 0.00001
+ max_grad_norm: 1.0
+ ```
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
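
For readers unfamiliar with the two settings added in this commit, here is a minimal pure-Python sketch of what they typically control: `max_grad_norm` caps the global L2 norm of the gradients, and `adam_epsilon` is the constant added to the denominator of the Adam update. The helper names `clip_by_global_norm` and `adam_step_size` are hypothetical and written only for illustration. This is not the trainer's actual code, and the comparison value `1e-8` is an assumption, not a value stated in the README.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their combined L2 norm is at most max_norm.

    Simplified illustration; real trainers apply this across all
    parameter tensors rather than a flat list of floats.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


def adam_step_size(m_hat, v_hat, lr, eps):
    """Magnitude of one Adam update from bias-corrected moments."""
    return lr * abs(m_hat) / (math.sqrt(v_hat) + eps)


# max_grad_norm: 1.0 -> a gradient with norm 5.0 is scaled down to norm 1.0.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(f"norm before: {norm_before:.3f}, after: {norm_after:.3f}")

# adam_epsilon: when the second-moment estimate is tiny, a larger epsilon
# dominates the denominator and damps the update.
tiny_m, tiny_v, lr = 1e-8, 1e-16, 1.0
step_small_eps = adam_step_size(tiny_m, tiny_v, lr, eps=1e-8)  # assumed comparison value
step_large_eps = adam_step_size(tiny_m, tiny_v, lr, eps=1e-5)  # value set in this commit
print(f"eps=1e-8 step: {step_small_eps:.6f}, eps=1e-5 step: {step_large_eps:.6f}")
```

Both settings bound the size of a single parameter update, which is one plausible reason they help when the loss diverges early in training.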