Aidan Mannion committed
Commit d6a5e7a · 1 Parent(s): 4f0ac9b
Update README.md
Files changed (1):
  1. README.md +1 -0
README.md CHANGED
@@ -62,6 +62,7 @@ Experiments on general-domain data suggest that, given its specialised training
 - linear learning rate schedule with 10,770 warmup steps
 - effective batch size 1500 (15 sequences per batch x 100 gradient accumulation steps)
 - MLM masking probability 0.15
+
 **Training regime:** The model was trained with fp16 non-mixed precision, using the AdamW optimizer with default parameters.
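For readers who want to try these settings, here is a minimal sketch of how the listed hyperparameters map onto the Hugging Face `Trainer` API in a standard masked-language-modelling setup. The checkpoint name, corpus, and output directory are placeholders, not taken from the commit, and `model.half()` is one interpretation of "fp16 non-mixed precision" (the usual `fp16=True` flag would enable mixed precision instead).

```python
# Hedged sketch: mapping the hyperparameters in the diff above onto the
# Hugging Face Trainer API. Checkpoint name, texts, and output directory
# are placeholders.
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("some-base-checkpoint")  # placeholder
model = AutoModelForMaskedLM.from_pretrained("some-base-checkpoint")  # placeholder

# "fp16 non-mixed precision": cast every weight to fp16 rather than using
# automatic mixed precision (an assumption about how training was run).
model = model.half()

# MLM masking probability 0.15, as stated in the README
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

texts = ["placeholder sentence one.", "placeholder sentence two."]  # placeholder corpus
train_dataset = Dataset.from_dict(dict(tokenizer(texts, truncation=True)))

training_args = TrainingArguments(
    output_dir="mlm-out",             # placeholder
    per_device_train_batch_size=15,   # 15 sequences per batch
    gradient_accumulation_steps=100,  # effective batch size 15 x 100 = 1,500
    lr_scheduler_type="linear",       # linear learning rate schedule
    warmup_steps=10_770,              # 10,770 warmup steps
    optim="adamw_torch",              # AdamW with default parameters
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```

Note that 15 sequences per batch times 100 gradient-accumulation steps reproduces the stated effective batch size of 1,500 on a single device; with multiple GPUs, the per-device batch size or accumulation steps would need rescaling to keep the same effective batch size.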