Tijmen2 committed
Commit 991a2f2
Parent: ad6c9ce

Update README.md

Files changed (1)
  1. README.md +18 -2
README.md CHANGED
@@ -57,10 +57,25 @@ _reliability_. While many of its answers are factually accurate, some are not. T

  ### Training hyperparameters

-
+ The following hyperparameters were used during continued pretraining:
+ - learning_rate: 1e-05
+ - max_grad_norm: 3.0
+ - train_batch_size: 4
+ - eval_batch_size: 4
+ - seed: 701
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - total_train_batch_size: 16
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 100
+ - num_epochs: 3.0
+ - weight_decay: 1e-04

  The following hyperparameters were used during QA tuning:
  - learning_rate: 2e-06
+ - max_grad_norm: 3.0
  - train_batch_size: 4
  - eval_batch_size: 4
  - seed: 702
@@ -71,4 +86,5 @@ The following hyperparameters were used during QA tuning:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 100
- - num_epochs: 2.0
+ - num_epochs: 2.0
+ - weight_decay: 0.0
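
For readers who want to reproduce these settings, the sketch below maps the two hyperparameter lists in the diff onto `transformers.TrainingArguments`. This is a minimal illustration, not the repository's actual training script: the diff does not show how training was launched, so the argument names (`per_device_train_batch_size`, `warmup_steps`, `num_train_epochs`, ...) and the `output_dir` values are assumptions based on the standard Trainer + accelerate setup that this auto-generated README format typically comes from.

```python
# A minimal sketch, assuming the hyperparameters above correspond to a standard
# Hugging Face Trainer / accelerate run. Argument names and output_dir paths are
# assumptions; the repository's actual training code is not shown in the diff.
from transformers import TrainingArguments

# Continued pretraining (first list in the diff above).
continued_pretraining_args = TrainingArguments(
    output_dir="continued-pretraining",  # hypothetical path, not from the README
    learning_rate=1e-5,
    max_grad_norm=3.0,
    per_device_train_batch_size=4,   # 4 devices x 4 = total_train_batch_size 16
    per_device_eval_batch_size=4,    # 4 devices x 4 = total_eval_batch_size 16
    seed=701,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=3.0,
    weight_decay=1e-4,
)

# QA tuning (second list in the diff above); it differs only in learning rate,
# seed, scheduler type, epoch count, and weight decay.
qa_tuning_args = TrainingArguments(
    output_dir="qa-tuning",          # hypothetical path, not from the README
    learning_rate=2e-6,
    max_grad_norm=3.0,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=702,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=100,
    num_train_epochs=2.0,
    weight_decay=0.0,
)
```

The `distributed_type: multi-GPU` and `num_devices: 4` entries describe the accelerate launch configuration rather than `TrainingArguments`; with 4 devices, the per-device batch size of 4 yields the reported total train/eval batch size of 16.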