UltimoUno committed on
Commit 785485d
1 Parent(s): 1b2ee40

Uploaded checkpoint-15000

Files changed (5)
  1. model.safetensors +1 -1
  2. optimizer.pt +1 -1
  3. rng_state.pth +1 -1
  4. scheduler.pt +1 -1
  5. trainer_state.json +3513 -5
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:836ce753b3bf04df5088ee72fbe7eec7e5481ca07a9b87bc1635248f074523ff
+ oid sha256:383f4fae36db9da6980db3312b06ab3f4ce131795c57a92818ad3b4ba86b07ff
  size 2692969128
optimizer.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:593996da216fef5aca8d1f8227825fd76d939317697680dbabb44cd9fe3e290e
+ oid sha256:d370718701079151a6ee36cb10ead87c85c0451e71c56c5203d02a5abf8f5cd0
  size 5386075202
rng_state.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:91b6cfff436e44ce1aac34c0deddcf1312e002c1c8fac244f4391c78862bccf7
+ oid sha256:ee3528bf0ace792176d57cac1ea8e325db1e81a8856e3e8a6e53688b51f9516e
  size 14244
scheduler.pt CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7877465fbcaa8e4e37466bd29d76bc3cf901595be41519caaff7b7e37912421b
+ oid sha256:0cf4bd40b0e3062c56584d972e9743cac19669a0283ba7de8c76540e6d58df00
  size 1064
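Each of the four binary files above is committed as a Git LFS pointer: a `version` line, an `oid` line with the SHA-256 of the real object, and a `size` line in bytes. In this commit only the `oid` lines change; every size stays the same, which is consistent with the same tensors being re-saved with new values. The following is a minimal sketch, assuming the standard pointer layout shown here, of parsing such a pointer and checking a local file against it. The function names are illustrative, not part of any Git LFS or Hugging Face API.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer (one "key value" pair per line) into a dict.

    Surrounding whitespace on each line is ignored, so pointer text copied
    out of a diff view (with its indentation) still parses.
    """
    fields = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the pointer's byte size and SHA-256 digest."""
    # Cheap check first: a size mismatch means there is no need to hash.
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks so multi-GB files such as model.safetensors need little memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]


# The new scheduler.pt pointer from this commit, as shown in the diff.
NEW_SCHEDULER_POINTER = """
  version https://git-lfs.github.com/spec/v1
  oid sha256:0cf4bd40b0e3062c56584d972e9743cac19669a0283ba7de8c76540e6d58df00
  size 1064
"""

if __name__ == "__main__":
    ptr = parse_lfs_pointer(NEW_SCHEDULER_POINTER)
    print(ptr["size"], ptr["sha256"][:12])
```

Calling `file_matches_pointer` on a locally downloaded copy of a file (the path is up to you) would then confirm whether it is the object this commit points to.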
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
- "best_metric": 1.3570239543914795,
3
- "best_model_checkpoint": "runs/deepseek_20240422-210351/checkpoint-10000",
4
- "epoch": 0.25,
5
  "eval_steps": 5000,
6
- "global_step": 10000,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -7023,6 +7023,3514 @@
7023
  "eval_samples_per_second": 16.869,
7024
  "eval_steps_per_second": 16.869,
7025
  "step": 10000
7026
  }
7027
  ],
7028
  "logging_steps": 10,
@@ -7030,7 +10538,7 @@
7030
  "num_input_tokens_seen": 0,
7031
  "num_train_epochs": 1,
7032
  "save_steps": 5000,
7033
- "total_flos": 1.5733698330624e+17,
7034
  "train_batch_size": 1,
7035
  "trial_name": null,
7036
  "trial_params": null
 
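The unchanged settings in this file (`eval_steps` and `save_steps` of 5000, `logging_steps` of 10) account for the cadence in the diff: a `log_history` entry every 10 steps and an evaluation plus checkpoint every 5000 steps, hence `checkpoint-15000` following `checkpoint-10000`. The sketch below models that arithmetic with plain modulo checks, which simplifies how the Trainer actually decides when to act. Two values are assumptions: the 40000 steps per epoch is inferred from the `epoch`/`global_step` pairs (0.25 at 10000, 0.375 at 15000), and the lower-is-better comparison assumes the tracked metric is a loss, which fits `best_metric` dropping from 1.3570 to 1.3379.

```python
# Settings copied from trainer_state.json in this diff.
EVAL_STEPS = 5000
SAVE_STEPS = 5000
LOGGING_STEPS = 10

# Inferred, not stored in the file: "epoch" is 0.25 at global_step 10000 and
# 0.375 at global_step 15000, so one epoch spans 40000 optimizer steps.
STEPS_PER_EPOCH = 40000


def events_at(step: int) -> list:
    """Periodic actions that fire at `step` under a simple every-N-steps rule."""
    events = []
    if step % LOGGING_STEPS == 0:
        events.append("log")
    if step % EVAL_STEPS == 0:
        events.append("eval")
    if step % SAVE_STEPS == 0:
        events.append("save")
    return events


def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


def logged_epoch(step: int) -> float:
    """Epoch as it appears in log_history, rounded to two decimals."""
    return round(epoch_at(step), 2)


def improves(new: float, best: float, greater_is_better: bool = False) -> bool:
    """Whether `new` replaces `best` as the best metric (lower is better by default)."""
    return new > best if greater_is_better else new < best


def checkpoint_dir(run_dir: str, step: int) -> str:
    """Checkpoint folder name in the pattern used by best_model_checkpoint."""
    return f"{run_dir}/checkpoint-{step}"
```

With these definitions, `logged_epoch(11400)` gives 0.28 and `logged_epoch(11410)` gives 0.29, matching the two-decimal `epoch` values in the log.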
1
  {
2
+ "best_metric": 1.3379485607147217,
3
+ "best_model_checkpoint": "runs/deepseek_20240422-210351/checkpoint-15000",
4
+ "epoch": 0.375,
5
  "eval_steps": 5000,
6
+ "global_step": 15000,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
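The `learning_rate` values in the added `log_history` entries fall by the same amount every 10 steps and land on round numbers such as 1.32e-05 at step 10530 and 1.28e-05 at step 11120. They are consistent with a linear warmup-then-decay schedule (the shape produced by `transformers.get_linear_schedule_with_warmup`) whose decay line reaches zero at step 30000. The entries only pin down that decay line; a peak of 2e-05 with 500 warmup steps is simply the round-number combination that lies on it. None of these three numbers appears in this diff, so treat them as fitted assumptions. The sketch checks the fit against a few logged points.

```python
# Fitted to the logged values; none of these is stated in this diff.
PEAK_LR = 2e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 30000

# (step, learning_rate) pairs copied from log_history in this diff.
LOGGED_LR = {
    10010: 1.3552542372881357e-05,
    10530: 1.3200000000000002e-05,
    11120: 1.2800000000000001e-05,
    11500: 1.2542372881355932e-05,
}


def linear_warmup_decay(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear ramp from 0 to `peak` over `warmup` steps, then linear decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    return peak * max(0.0, (total - step) / max(1, total - warmup))


def max_relative_error(logged=LOGGED_LR):
    """Largest relative gap between the hypothesised schedule and the logged values."""
    return max(abs(linear_warmup_decay(s) - lr) / lr for s, lr in logged.items())


if __name__ == "__main__":
    print(f"max relative error: {max_relative_error():.2e}")
```

The fitted 30000-step horizon does not match the 40000 steps per epoch implied by the `epoch` field; the log alone does not say why.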
7023
  "eval_samples_per_second": 16.869,
7024
  "eval_steps_per_second": 16.869,
7025
  "step": 10000
7026
+ },
7027
+ {
7028
+ "epoch": 0.25,
7029
+ "grad_norm": 20.75,
7030
+ "learning_rate": 1.3552542372881357e-05,
7031
+ "loss": 1.3801,
7032
+ "step": 10010
7033
+ },
7034
+ {
7035
+ "epoch": 0.25,
7036
+ "grad_norm": 12.875,
7037
+ "learning_rate": 1.3545762711864408e-05,
7038
+ "loss": 1.3604,
7039
+ "step": 10020
7040
+ },
7041
+ {
7042
+ "epoch": 0.25,
7043
+ "grad_norm": 10.9375,
7044
+ "learning_rate": 1.3538983050847458e-05,
7045
+ "loss": 1.489,
7046
+ "step": 10030
7047
+ },
7048
+ {
7049
+ "epoch": 0.25,
7050
+ "grad_norm": 10.6875,
7051
+ "learning_rate": 1.353220338983051e-05,
7052
+ "loss": 1.5012,
7053
+ "step": 10040
7054
+ },
7055
+ {
7056
+ "epoch": 0.25,
7057
+ "grad_norm": 21.625,
7058
+ "learning_rate": 1.352542372881356e-05,
7059
+ "loss": 1.4174,
7060
+ "step": 10050
7061
+ },
7062
+ {
7063
+ "epoch": 0.25,
7064
+ "grad_norm": 28.875,
7065
+ "learning_rate": 1.3518644067796612e-05,
7066
+ "loss": 1.2765,
7067
+ "step": 10060
7068
+ },
7069
+ {
7070
+ "epoch": 0.25,
7071
+ "grad_norm": 23.875,
7072
+ "learning_rate": 1.3511864406779662e-05,
7073
+ "loss": 1.4869,
7074
+ "step": 10070
7075
+ },
7076
+ {
7077
+ "epoch": 0.25,
7078
+ "grad_norm": 20.0,
7079
+ "learning_rate": 1.3505084745762713e-05,
7080
+ "loss": 1.2636,
7081
+ "step": 10080
7082
+ },
7083
+ {
7084
+ "epoch": 0.25,
7085
+ "grad_norm": 12.4375,
7086
+ "learning_rate": 1.3498305084745764e-05,
7087
+ "loss": 1.4241,
7088
+ "step": 10090
7089
+ },
7090
+ {
7091
+ "epoch": 0.25,
7092
+ "grad_norm": 25.5,
7093
+ "learning_rate": 1.3491525423728816e-05,
7094
+ "loss": 1.3722,
7095
+ "step": 10100
7096
+ },
7097
+ {
7098
+ "epoch": 0.25,
7099
+ "grad_norm": 21.5,
7100
+ "learning_rate": 1.3484745762711866e-05,
7101
+ "loss": 1.3611,
7102
+ "step": 10110
7103
+ },
7104
+ {
7105
+ "epoch": 0.25,
7106
+ "grad_norm": 14.375,
7107
+ "learning_rate": 1.3477966101694917e-05,
7108
+ "loss": 1.4016,
7109
+ "step": 10120
7110
+ },
7111
+ {
7112
+ "epoch": 0.25,
7113
+ "grad_norm": 11.3125,
7114
+ "learning_rate": 1.3471186440677968e-05,
7115
+ "loss": 1.3755,
7116
+ "step": 10130
7117
+ },
7118
+ {
7119
+ "epoch": 0.25,
7120
+ "grad_norm": 15.125,
7121
+ "learning_rate": 1.346440677966102e-05,
7122
+ "loss": 1.3698,
7123
+ "step": 10140
7124
+ },
7125
+ {
7126
+ "epoch": 0.25,
7127
+ "grad_norm": 23.875,
7128
+ "learning_rate": 1.345762711864407e-05,
7129
+ "loss": 1.3596,
7130
+ "step": 10150
7131
+ },
7132
+ {
7133
+ "epoch": 0.25,
7134
+ "grad_norm": 11.9375,
7135
+ "learning_rate": 1.3450847457627121e-05,
7136
+ "loss": 1.3765,
7137
+ "step": 10160
7138
+ },
7139
+ {
7140
+ "epoch": 0.25,
7141
+ "grad_norm": 33.0,
7142
+ "learning_rate": 1.3444067796610169e-05,
7143
+ "loss": 1.3891,
7144
+ "step": 10170
7145
+ },
7146
+ {
7147
+ "epoch": 0.25,
7148
+ "grad_norm": 17.25,
7149
+ "learning_rate": 1.343728813559322e-05,
7150
+ "loss": 1.3707,
7151
+ "step": 10180
7152
+ },
7153
+ {
7154
+ "epoch": 0.25,
7155
+ "grad_norm": 14.0625,
7156
+ "learning_rate": 1.3430508474576272e-05,
7157
+ "loss": 1.3348,
7158
+ "step": 10190
7159
+ },
7160
+ {
7161
+ "epoch": 0.26,
7162
+ "grad_norm": 30.625,
7163
+ "learning_rate": 1.3423728813559323e-05,
7164
+ "loss": 1.4954,
7165
+ "step": 10200
7166
+ },
7167
+ {
7168
+ "epoch": 0.26,
7169
+ "grad_norm": 7.8125,
7170
+ "learning_rate": 1.3416949152542373e-05,
7171
+ "loss": 1.4154,
7172
+ "step": 10210
7173
+ },
7174
+ {
7175
+ "epoch": 0.26,
7176
+ "grad_norm": 29.25,
7177
+ "learning_rate": 1.3410169491525424e-05,
7178
+ "loss": 1.5622,
7179
+ "step": 10220
7180
+ },
7181
+ {
7182
+ "epoch": 0.26,
7183
+ "grad_norm": 6.34375,
7184
+ "learning_rate": 1.3403389830508476e-05,
7185
+ "loss": 1.4377,
7186
+ "step": 10230
7187
+ },
7188
+ {
7189
+ "epoch": 0.26,
7190
+ "grad_norm": 31.875,
7191
+ "learning_rate": 1.3396610169491527e-05,
7192
+ "loss": 1.3888,
7193
+ "step": 10240
7194
+ },
7195
+ {
7196
+ "epoch": 0.26,
7197
+ "grad_norm": 41.5,
7198
+ "learning_rate": 1.3389830508474577e-05,
7199
+ "loss": 1.4047,
7200
+ "step": 10250
7201
+ },
7202
+ {
7203
+ "epoch": 0.26,
7204
+ "grad_norm": 20.0,
7205
+ "learning_rate": 1.3383050847457628e-05,
7206
+ "loss": 1.3593,
7207
+ "step": 10260
7208
+ },
7209
+ {
7210
+ "epoch": 0.26,
7211
+ "grad_norm": 15.6875,
7212
+ "learning_rate": 1.337627118644068e-05,
7213
+ "loss": 1.4452,
7214
+ "step": 10270
7215
+ },
7216
+ {
7217
+ "epoch": 0.26,
7218
+ "grad_norm": 8.8125,
7219
+ "learning_rate": 1.3369491525423731e-05,
7220
+ "loss": 1.4375,
7221
+ "step": 10280
7222
+ },
7223
+ {
7224
+ "epoch": 0.26,
7225
+ "grad_norm": 55.0,
7226
+ "learning_rate": 1.336271186440678e-05,
7227
+ "loss": 1.2869,
7228
+ "step": 10290
7229
+ },
7230
+ {
7231
+ "epoch": 0.26,
7232
+ "grad_norm": 17.0,
7233
+ "learning_rate": 1.3355932203389832e-05,
7234
+ "loss": 1.3051,
7235
+ "step": 10300
7236
+ },
7237
+ {
7238
+ "epoch": 0.26,
7239
+ "grad_norm": 6.71875,
7240
+ "learning_rate": 1.3349152542372883e-05,
7241
+ "loss": 1.4207,
7242
+ "step": 10310
7243
+ },
7244
+ {
7245
+ "epoch": 0.26,
7246
+ "grad_norm": 27.375,
7247
+ "learning_rate": 1.3342372881355933e-05,
7248
+ "loss": 1.5179,
7249
+ "step": 10320
7250
+ },
7251
+ {
7252
+ "epoch": 0.26,
7253
+ "grad_norm": 22.0,
7254
+ "learning_rate": 1.3335593220338985e-05,
7255
+ "loss": 1.3529,
7256
+ "step": 10330
7257
+ },
7258
+ {
7259
+ "epoch": 0.26,
7260
+ "grad_norm": 27.125,
7261
+ "learning_rate": 1.3328813559322036e-05,
7262
+ "loss": 1.2933,
7263
+ "step": 10340
7264
+ },
7265
+ {
7266
+ "epoch": 0.26,
7267
+ "grad_norm": 15.375,
7268
+ "learning_rate": 1.3322033898305087e-05,
7269
+ "loss": 1.4026,
7270
+ "step": 10350
7271
+ },
7272
+ {
7273
+ "epoch": 0.26,
7274
+ "grad_norm": 20.0,
7275
+ "learning_rate": 1.3315254237288137e-05,
7276
+ "loss": 1.455,
7277
+ "step": 10360
7278
+ },
7279
+ {
7280
+ "epoch": 0.26,
7281
+ "grad_norm": 25.75,
7282
+ "learning_rate": 1.3308474576271187e-05,
7283
+ "loss": 1.4478,
7284
+ "step": 10370
7285
+ },
7286
+ {
7287
+ "epoch": 0.26,
7288
+ "grad_norm": 17.875,
7289
+ "learning_rate": 1.3301694915254238e-05,
7290
+ "loss": 1.4811,
7291
+ "step": 10380
7292
+ },
7293
+ {
7294
+ "epoch": 0.26,
7295
+ "grad_norm": 23.0,
7296
+ "learning_rate": 1.3294915254237288e-05,
7297
+ "loss": 1.2626,
7298
+ "step": 10390
7299
+ },
7300
+ {
7301
+ "epoch": 0.26,
7302
+ "grad_norm": 24.375,
7303
+ "learning_rate": 1.328813559322034e-05,
7304
+ "loss": 1.4401,
7305
+ "step": 10400
7306
+ },
7307
+ {
7308
+ "epoch": 0.26,
7309
+ "grad_norm": 15.5625,
7310
+ "learning_rate": 1.328135593220339e-05,
7311
+ "loss": 1.3587,
7312
+ "step": 10410
7313
+ },
7314
+ {
7315
+ "epoch": 0.26,
7316
+ "grad_norm": 43.75,
7317
+ "learning_rate": 1.3274576271186442e-05,
7318
+ "loss": 1.3697,
7319
+ "step": 10420
7320
+ },
7321
+ {
7322
+ "epoch": 0.26,
7323
+ "grad_norm": 16.375,
7324
+ "learning_rate": 1.3267796610169492e-05,
7325
+ "loss": 1.5165,
7326
+ "step": 10430
7327
+ },
7328
+ {
7329
+ "epoch": 0.26,
7330
+ "grad_norm": 26.25,
7331
+ "learning_rate": 1.3261016949152543e-05,
7332
+ "loss": 1.3134,
7333
+ "step": 10440
7334
+ },
7335
+ {
7336
+ "epoch": 0.26,
7337
+ "grad_norm": 29.375,
7338
+ "learning_rate": 1.3254237288135595e-05,
7339
+ "loss": 1.5107,
7340
+ "step": 10450
7341
+ },
7342
+ {
7343
+ "epoch": 0.26,
7344
+ "grad_norm": 15.0625,
7345
+ "learning_rate": 1.3247457627118644e-05,
7346
+ "loss": 1.4568,
7347
+ "step": 10460
7348
+ },
7349
+ {
7350
+ "epoch": 0.26,
7351
+ "grad_norm": 19.625,
7352
+ "learning_rate": 1.3240677966101696e-05,
7353
+ "loss": 1.3214,
7354
+ "step": 10470
7355
+ },
7356
+ {
7357
+ "epoch": 0.26,
7358
+ "grad_norm": 8.8125,
7359
+ "learning_rate": 1.3233898305084747e-05,
7360
+ "loss": 1.1839,
7361
+ "step": 10480
7362
+ },
7363
+ {
7364
+ "epoch": 0.26,
7365
+ "grad_norm": 12.0,
7366
+ "learning_rate": 1.3227118644067798e-05,
7367
+ "loss": 1.4022,
7368
+ "step": 10490
7369
+ },
7370
+ {
7371
+ "epoch": 0.26,
7372
+ "grad_norm": 14.5,
7373
+ "learning_rate": 1.3220338983050848e-05,
7374
+ "loss": 1.3453,
7375
+ "step": 10500
7376
+ },
7377
+ {
7378
+ "epoch": 0.26,
7379
+ "grad_norm": 12.625,
7380
+ "learning_rate": 1.32135593220339e-05,
7381
+ "loss": 1.6298,
7382
+ "step": 10510
7383
+ },
7384
+ {
7385
+ "epoch": 0.26,
7386
+ "grad_norm": 13.625,
7387
+ "learning_rate": 1.3206779661016951e-05,
7388
+ "loss": 1.4567,
7389
+ "step": 10520
7390
+ },
7391
+ {
7392
+ "epoch": 0.26,
7393
+ "grad_norm": 23.75,
7394
+ "learning_rate": 1.3200000000000002e-05,
7395
+ "loss": 1.5548,
7396
+ "step": 10530
7397
+ },
7398
+ {
7399
+ "epoch": 0.26,
7400
+ "grad_norm": 54.5,
7401
+ "learning_rate": 1.3193220338983052e-05,
7402
+ "loss": 1.4111,
7403
+ "step": 10540
7404
+ },
7405
+ {
7406
+ "epoch": 0.26,
7407
+ "grad_norm": 13.0625,
7408
+ "learning_rate": 1.3186440677966103e-05,
7409
+ "loss": 1.433,
7410
+ "step": 10550
7411
+ },
7412
+ {
7413
+ "epoch": 0.26,
7414
+ "grad_norm": 17.375,
7415
+ "learning_rate": 1.3179661016949155e-05,
7416
+ "loss": 1.5689,
7417
+ "step": 10560
7418
+ },
7419
+ {
7420
+ "epoch": 0.26,
7421
+ "grad_norm": 20.25,
7422
+ "learning_rate": 1.3172881355932206e-05,
7423
+ "loss": 1.3946,
7424
+ "step": 10570
7425
+ },
7426
+ {
7427
+ "epoch": 0.26,
7428
+ "grad_norm": 15.5,
7429
+ "learning_rate": 1.3166101694915254e-05,
7430
+ "loss": 1.4482,
7431
+ "step": 10580
7432
+ },
7433
+ {
7434
+ "epoch": 0.26,
7435
+ "grad_norm": 9.875,
7436
+ "learning_rate": 1.3159322033898306e-05,
7437
+ "loss": 1.4798,
7438
+ "step": 10590
7439
+ },
7440
+ {
7441
+ "epoch": 0.27,
7442
+ "grad_norm": 24.875,
7443
+ "learning_rate": 1.3152542372881355e-05,
7444
+ "loss": 1.4339,
7445
+ "step": 10600
7446
+ },
7447
+ {
7448
+ "epoch": 0.27,
7449
+ "grad_norm": 27.875,
7450
+ "learning_rate": 1.3145762711864407e-05,
7451
+ "loss": 1.3277,
7452
+ "step": 10610
7453
+ },
7454
+ {
7455
+ "epoch": 0.27,
7456
+ "grad_norm": 15.4375,
7457
+ "learning_rate": 1.3138983050847458e-05,
7458
+ "loss": 1.362,
7459
+ "step": 10620
7460
+ },
7461
+ {
7462
+ "epoch": 0.27,
7463
+ "grad_norm": 26.875,
7464
+ "learning_rate": 1.313220338983051e-05,
7465
+ "loss": 1.3184,
7466
+ "step": 10630
7467
+ },
7468
+ {
7469
+ "epoch": 0.27,
7470
+ "grad_norm": 10.25,
7471
+ "learning_rate": 1.312542372881356e-05,
7472
+ "loss": 1.5107,
7473
+ "step": 10640
7474
+ },
7475
+ {
7476
+ "epoch": 0.27,
7477
+ "grad_norm": 15.0625,
7478
+ "learning_rate": 1.311864406779661e-05,
7479
+ "loss": 1.1496,
7480
+ "step": 10650
7481
+ },
7482
+ {
7483
+ "epoch": 0.27,
7484
+ "grad_norm": 14.5,
7485
+ "learning_rate": 1.3111864406779662e-05,
7486
+ "loss": 1.4018,
7487
+ "step": 10660
7488
+ },
7489
+ {
7490
+ "epoch": 0.27,
7491
+ "grad_norm": 38.25,
7492
+ "learning_rate": 1.3105084745762714e-05,
7493
+ "loss": 1.5046,
7494
+ "step": 10670
7495
+ },
7496
+ {
7497
+ "epoch": 0.27,
7498
+ "grad_norm": 16.25,
7499
+ "learning_rate": 1.3098305084745763e-05,
7500
+ "loss": 1.3867,
7501
+ "step": 10680
7502
+ },
7503
+ {
7504
+ "epoch": 0.27,
7505
+ "grad_norm": 9.125,
7506
+ "learning_rate": 1.3091525423728815e-05,
7507
+ "loss": 1.3284,
7508
+ "step": 10690
7509
+ },
7510
+ {
7511
+ "epoch": 0.27,
7512
+ "grad_norm": 27.75,
7513
+ "learning_rate": 1.3084745762711866e-05,
7514
+ "loss": 1.4511,
7515
+ "step": 10700
7516
+ },
7517
+ {
7518
+ "epoch": 0.27,
7519
+ "grad_norm": 11.0625,
7520
+ "learning_rate": 1.3077966101694917e-05,
7521
+ "loss": 1.4621,
7522
+ "step": 10710
7523
+ },
7524
+ {
7525
+ "epoch": 0.27,
7526
+ "grad_norm": 17.875,
7527
+ "learning_rate": 1.3071186440677967e-05,
7528
+ "loss": 1.3628,
7529
+ "step": 10720
7530
+ },
7531
+ {
7532
+ "epoch": 0.27,
7533
+ "grad_norm": 19.25,
7534
+ "learning_rate": 1.3064406779661019e-05,
7535
+ "loss": 1.3811,
7536
+ "step": 10730
7537
+ },
7538
+ {
7539
+ "epoch": 0.27,
7540
+ "grad_norm": 17.125,
7541
+ "learning_rate": 1.305762711864407e-05,
7542
+ "loss": 1.283,
7543
+ "step": 10740
7544
+ },
7545
+ {
7546
+ "epoch": 0.27,
7547
+ "grad_norm": 7.78125,
7548
+ "learning_rate": 1.305084745762712e-05,
7549
+ "loss": 1.3398,
7550
+ "step": 10750
7551
+ },
7552
+ {
7553
+ "epoch": 0.27,
7554
+ "grad_norm": 43.5,
7555
+ "learning_rate": 1.3044067796610171e-05,
7556
+ "loss": 1.3942,
7557
+ "step": 10760
7558
+ },
7559
+ {
7560
+ "epoch": 0.27,
7561
+ "grad_norm": 14.75,
7562
+ "learning_rate": 1.3037288135593222e-05,
7563
+ "loss": 1.407,
7564
+ "step": 10770
7565
+ },
7566
+ {
7567
+ "epoch": 0.27,
7568
+ "grad_norm": 19.625,
7569
+ "learning_rate": 1.3030508474576274e-05,
7570
+ "loss": 1.3883,
7571
+ "step": 10780
7572
+ },
7573
+ {
7574
+ "epoch": 0.27,
7575
+ "grad_norm": 22.125,
7576
+ "learning_rate": 1.3023728813559322e-05,
7577
+ "loss": 1.4614,
7578
+ "step": 10790
7579
+ },
7580
+ {
7581
+ "epoch": 0.27,
7582
+ "grad_norm": 9.75,
7583
+ "learning_rate": 1.3016949152542373e-05,
7584
+ "loss": 1.4048,
7585
+ "step": 10800
7586
+ },
7587
+ {
7588
+ "epoch": 0.27,
7589
+ "grad_norm": 16.625,
7590
+ "learning_rate": 1.3010169491525425e-05,
7591
+ "loss": 1.5016,
7592
+ "step": 10810
7593
+ },
7594
+ {
7595
+ "epoch": 0.27,
7596
+ "grad_norm": 16.75,
7597
+ "learning_rate": 1.3003389830508474e-05,
7598
+ "loss": 1.562,
7599
+ "step": 10820
7600
+ },
7601
+ {
7602
+ "epoch": 0.27,
7603
+ "grad_norm": 39.0,
7604
+ "learning_rate": 1.2996610169491526e-05,
7605
+ "loss": 1.3651,
7606
+ "step": 10830
7607
+ },
7608
+ {
7609
+ "epoch": 0.27,
7610
+ "grad_norm": 18.75,
7611
+ "learning_rate": 1.2989830508474577e-05,
7612
+ "loss": 1.4262,
7613
+ "step": 10840
7614
+ },
7615
+ {
7616
+ "epoch": 0.27,
7617
+ "grad_norm": 9.75,
7618
+ "learning_rate": 1.2983050847457629e-05,
7619
+ "loss": 1.3998,
7620
+ "step": 10850
7621
+ },
7622
+ {
7623
+ "epoch": 0.27,
7624
+ "grad_norm": 19.625,
7625
+ "learning_rate": 1.2976271186440678e-05,
7626
+ "loss": 1.3164,
7627
+ "step": 10860
7628
+ },
7629
+ {
7630
+ "epoch": 0.27,
7631
+ "grad_norm": 18.125,
7632
+ "learning_rate": 1.296949152542373e-05,
7633
+ "loss": 1.2888,
7634
+ "step": 10870
7635
+ },
7636
+ {
7637
+ "epoch": 0.27,
7638
+ "grad_norm": 8.5625,
7639
+ "learning_rate": 1.2962711864406781e-05,
7640
+ "loss": 1.3567,
7641
+ "step": 10880
7642
+ },
7643
+ {
7644
+ "epoch": 0.27,
7645
+ "grad_norm": 18.75,
7646
+ "learning_rate": 1.295593220338983e-05,
7647
+ "loss": 1.5165,
7648
+ "step": 10890
7649
+ },
7650
+ {
7651
+ "epoch": 0.27,
7652
+ "grad_norm": 11.125,
7653
+ "learning_rate": 1.2949152542372882e-05,
7654
+ "loss": 1.4366,
7655
+ "step": 10900
7656
+ },
7657
+ {
7658
+ "epoch": 0.27,
7659
+ "grad_norm": 14.875,
7660
+ "learning_rate": 1.2942372881355934e-05,
7661
+ "loss": 1.3693,
7662
+ "step": 10910
7663
+ },
7664
+ {
7665
+ "epoch": 0.27,
7666
+ "grad_norm": 12.625,
7667
+ "learning_rate": 1.2935593220338985e-05,
7668
+ "loss": 1.4838,
7669
+ "step": 10920
7670
+ },
7671
+ {
7672
+ "epoch": 0.27,
7673
+ "grad_norm": 11.375,
7674
+ "learning_rate": 1.2928813559322035e-05,
7675
+ "loss": 1.2853,
7676
+ "step": 10930
7677
+ },
7678
+ {
7679
+ "epoch": 0.27,
7680
+ "grad_norm": 17.375,
7681
+ "learning_rate": 1.2922033898305086e-05,
7682
+ "loss": 1.3973,
7683
+ "step": 10940
7684
+ },
7685
+ {
7686
+ "epoch": 0.27,
7687
+ "grad_norm": 26.75,
7688
+ "learning_rate": 1.2915254237288137e-05,
7689
+ "loss": 1.3605,
7690
+ "step": 10950
7691
+ },
7692
+ {
7693
+ "epoch": 0.27,
7694
+ "grad_norm": 23.5,
7695
+ "learning_rate": 1.2908474576271189e-05,
7696
+ "loss": 1.5397,
7697
+ "step": 10960
7698
+ },
7699
+ {
7700
+ "epoch": 0.27,
7701
+ "grad_norm": 34.5,
7702
+ "learning_rate": 1.2901694915254239e-05,
7703
+ "loss": 1.2128,
7704
+ "step": 10970
7705
+ },
7706
+ {
7707
+ "epoch": 0.27,
7708
+ "grad_norm": 15.0625,
7709
+ "learning_rate": 1.289491525423729e-05,
7710
+ "loss": 1.4024,
7711
+ "step": 10980
7712
+ },
7713
+ {
7714
+ "epoch": 0.27,
7715
+ "grad_norm": 13.75,
7716
+ "learning_rate": 1.2888135593220341e-05,
7717
+ "loss": 1.448,
7718
+ "step": 10990
7719
+ },
7720
+ {
7721
+ "epoch": 0.28,
7722
+ "grad_norm": 10.5,
7723
+ "learning_rate": 1.288135593220339e-05,
7724
+ "loss": 1.564,
7725
+ "step": 11000
7726
+ },
7727
+ {
7728
+ "epoch": 0.28,
7729
+ "grad_norm": 22.0,
7730
+ "learning_rate": 1.287457627118644e-05,
7731
+ "loss": 1.4604,
7732
+ "step": 11010
7733
+ },
7734
+ {
7735
+ "epoch": 0.28,
7736
+ "grad_norm": 33.25,
7737
+ "learning_rate": 1.2867796610169492e-05,
7738
+ "loss": 1.4426,
7739
+ "step": 11020
7740
+ },
7741
+ {
7742
+ "epoch": 0.28,
7743
+ "grad_norm": 18.25,
7744
+ "learning_rate": 1.2861016949152542e-05,
7745
+ "loss": 1.4848,
7746
+ "step": 11030
7747
+ },
7748
+ {
7749
+ "epoch": 0.28,
7750
+ "grad_norm": 28.75,
7751
+ "learning_rate": 1.2854237288135593e-05,
7752
+ "loss": 1.5427,
7753
+ "step": 11040
7754
+ },
7755
+ {
7756
+ "epoch": 0.28,
7757
+ "grad_norm": 28.25,
7758
+ "learning_rate": 1.2847457627118645e-05,
7759
+ "loss": 1.4344,
7760
+ "step": 11050
7761
+ },
7762
+ {
7763
+ "epoch": 0.28,
7764
+ "grad_norm": 12.5625,
7765
+ "learning_rate": 1.2840677966101696e-05,
7766
+ "loss": 1.3914,
7767
+ "step": 11060
7768
+ },
7769
+ {
7770
+ "epoch": 0.28,
7771
+ "grad_norm": 9.875,
7772
+ "learning_rate": 1.2833898305084746e-05,
7773
+ "loss": 1.3298,
7774
+ "step": 11070
7775
+ },
7776
+ {
7777
+ "epoch": 0.28,
7778
+ "grad_norm": 9.75,
7779
+ "learning_rate": 1.2827118644067797e-05,
7780
+ "loss": 1.5821,
7781
+ "step": 11080
7782
+ },
7783
+ {
7784
+ "epoch": 0.28,
7785
+ "grad_norm": 7.8125,
7786
+ "learning_rate": 1.2820338983050849e-05,
7787
+ "loss": 1.3951,
7788
+ "step": 11090
7789
+ },
7790
+ {
7791
+ "epoch": 0.28,
7792
+ "grad_norm": 9.4375,
7793
+ "learning_rate": 1.28135593220339e-05,
7794
+ "loss": 1.5711,
7795
+ "step": 11100
7796
+ },
7797
+ {
7798
+ "epoch": 0.28,
7799
+ "grad_norm": 16.5,
7800
+ "learning_rate": 1.280677966101695e-05,
7801
+ "loss": 1.3264,
7802
+ "step": 11110
7803
+ },
7804
+ {
7805
+ "epoch": 0.28,
7806
+ "grad_norm": 33.25,
7807
+ "learning_rate": 1.2800000000000001e-05,
7808
+ "loss": 1.2913,
7809
+ "step": 11120
7810
+ },
7811
+ {
7812
+ "epoch": 0.28,
7813
+ "grad_norm": 23.875,
7814
+ "learning_rate": 1.2793220338983053e-05,
7815
+ "loss": 1.4656,
7816
+ "step": 11130
7817
+ },
7818
+ {
7819
+ "epoch": 0.28,
7820
+ "grad_norm": 42.75,
7821
+ "learning_rate": 1.2786440677966104e-05,
7822
+ "loss": 1.3079,
7823
+ "step": 11140
7824
+ },
7825
+ {
7826
+ "epoch": 0.28,
7827
+ "grad_norm": 21.875,
7828
+ "learning_rate": 1.2779661016949154e-05,
7829
+ "loss": 1.3085,
7830
+ "step": 11150
7831
+ },
7832
+ {
7833
+ "epoch": 0.28,
7834
+ "grad_norm": 13.75,
7835
+ "learning_rate": 1.2772881355932205e-05,
7836
+ "loss": 1.2513,
7837
+ "step": 11160
7838
+ },
7839
+ {
7840
+ "epoch": 0.28,
7841
+ "grad_norm": 25.25,
7842
+ "learning_rate": 1.2766101694915256e-05,
7843
+ "loss": 1.2082,
7844
+ "step": 11170
7845
+ },
7846
+ {
7847
+ "epoch": 0.28,
7848
+ "grad_norm": 13.0625,
7849
+ "learning_rate": 1.2759322033898308e-05,
7850
+ "loss": 1.466,
7851
+ "step": 11180
7852
+ },
7853
+ {
7854
+ "epoch": 0.28,
7855
+ "grad_norm": 20.75,
7856
+ "learning_rate": 1.2752542372881358e-05,
7857
+ "loss": 1.2071,
7858
+ "step": 11190
7859
+ },
7860
+ {
7861
+ "epoch": 0.28,
7862
+ "grad_norm": 9.1875,
7863
+ "learning_rate": 1.2745762711864407e-05,
7864
+ "loss": 1.3974,
7865
+ "step": 11200
7866
+ },
7867
+ {
7868
+ "epoch": 0.28,
7869
+ "grad_norm": 33.0,
7870
+ "learning_rate": 1.2738983050847457e-05,
7871
+ "loss": 1.3897,
7872
+ "step": 11210
7873
+ },
7874
+ {
7875
+ "epoch": 0.28,
7876
+ "grad_norm": 25.625,
7877
+ "learning_rate": 1.2732203389830508e-05,
7878
+ "loss": 1.2995,
7879
+ "step": 11220
7880
+ },
7881
+ {
7882
+ "epoch": 0.28,
7883
+ "grad_norm": 24.0,
7884
+ "learning_rate": 1.272542372881356e-05,
7885
+ "loss": 1.5626,
7886
+ "step": 11230
7887
+ },
7888
+ {
7889
+ "epoch": 0.28,
7890
+ "grad_norm": 33.25,
7891
+ "learning_rate": 1.2718644067796611e-05,
7892
+ "loss": 1.4729,
7893
+ "step": 11240
7894
+ },
7895
+ {
7896
+ "epoch": 0.28,
7897
+ "grad_norm": 34.75,
7898
+ "learning_rate": 1.2711864406779661e-05,
7899
+ "loss": 1.3542,
7900
+ "step": 11250
7901
+ },
7902
+ {
7903
+ "epoch": 0.28,
7904
+ "grad_norm": 12.875,
7905
+ "learning_rate": 1.2705084745762712e-05,
7906
+ "loss": 1.3378,
7907
+ "step": 11260
7908
+ },
7909
+ {
7910
+ "epoch": 0.28,
7911
+ "grad_norm": 46.25,
7912
+ "learning_rate": 1.2698305084745764e-05,
7913
+ "loss": 1.6005,
7914
+ "step": 11270
7915
+ },
7916
+ {
7917
+ "epoch": 0.28,
7918
+ "grad_norm": 25.75,
7919
+ "learning_rate": 1.2691525423728815e-05,
7920
+ "loss": 1.3225,
7921
+ "step": 11280
7922
+ },
7923
+ {
7924
+ "epoch": 0.28,
7925
+ "grad_norm": 34.0,
7926
+ "learning_rate": 1.2684745762711865e-05,
7927
+ "loss": 1.251,
7928
+ "step": 11290
7929
+ },
7930
+ {
7931
+ "epoch": 0.28,
7932
+ "grad_norm": 10.625,
7933
+ "learning_rate": 1.2677966101694916e-05,
7934
+ "loss": 1.3167,
7935
+ "step": 11300
7936
+ },
7937
+ {
7938
+ "epoch": 0.28,
7939
+ "grad_norm": 15.375,
7940
+ "learning_rate": 1.2671186440677968e-05,
7941
+ "loss": 1.2932,
7942
+ "step": 11310
7943
+ },
7944
+ {
7945
+ "epoch": 0.28,
7946
+ "grad_norm": 16.375,
7947
+ "learning_rate": 1.2664406779661019e-05,
7948
+ "loss": 1.3602,
7949
+ "step": 11320
7950
+ },
7951
+ {
7952
+ "epoch": 0.28,
7953
+ "grad_norm": 26.5,
7954
+ "learning_rate": 1.2657627118644069e-05,
7955
+ "loss": 1.4041,
7956
+ "step": 11330
7957
+ },
7958
+ {
7959
+ "epoch": 0.28,
7960
+ "grad_norm": 61.5,
7961
+ "learning_rate": 1.265084745762712e-05,
7962
+ "loss": 1.4216,
7963
+ "step": 11340
7964
+ },
7965
+ {
7966
+ "epoch": 0.28,
7967
+ "grad_norm": 21.0,
7968
+ "learning_rate": 1.2644067796610171e-05,
7969
+ "loss": 1.3234,
7970
+ "step": 11350
7971
+ },
7972
+ {
7973
+ "epoch": 0.28,
7974
+ "grad_norm": 10.25,
7975
+ "learning_rate": 1.2637288135593221e-05,
7976
+ "loss": 1.3886,
7977
+ "step": 11360
7978
+ },
7979
+ {
7980
+ "epoch": 0.28,
7981
+ "grad_norm": 28.0,
7982
+ "learning_rate": 1.2630508474576273e-05,
7983
+ "loss": 1.4435,
7984
+ "step": 11370
7985
+ },
7986
+ {
7987
+ "epoch": 0.28,
7988
+ "grad_norm": 11.375,
7989
+ "learning_rate": 1.2623728813559324e-05,
7990
+ "loss": 1.429,
7991
+ "step": 11380
7992
+ },
7993
+ {
7994
+ "epoch": 0.28,
7995
+ "grad_norm": 20.25,
7996
+ "learning_rate": 1.2616949152542375e-05,
7997
+ "loss": 1.3044,
7998
+ "step": 11390
7999
+ },
8000
+ {
8001
+ "epoch": 0.28,
8002
+ "grad_norm": 35.25,
8003
+ "learning_rate": 1.2610169491525425e-05,
8004
+ "loss": 1.1607,
8005
+ "step": 11400
8006
+ },
8007
+ {
8008
+ "epoch": 0.29,
8009
+ "grad_norm": 18.75,
8010
+ "learning_rate": 1.2603389830508475e-05,
8011
+ "loss": 1.5203,
8012
+ "step": 11410
8013
+ },
8014
+ {
8015
+ "epoch": 0.29,
8016
+ "grad_norm": 19.0,
8017
+ "learning_rate": 1.2596610169491526e-05,
8018
+ "loss": 1.3805,
8019
+ "step": 11420
8020
+ },
8021
+ {
8022
+ "epoch": 0.29,
8023
+ "grad_norm": 27.625,
8024
+ "learning_rate": 1.2589830508474576e-05,
8025
+ "loss": 1.4146,
8026
+ "step": 11430
8027
+ },
8028
+ {
8029
+ "epoch": 0.29,
8030
+ "grad_norm": 7.6875,
8031
+ "learning_rate": 1.2583050847457627e-05,
8032
+ "loss": 1.3669,
8033
+ "step": 11440
8034
+ },
8035
+ {
8036
+ "epoch": 0.29,
8037
+ "grad_norm": 35.75,
8038
+ "learning_rate": 1.2576271186440679e-05,
8039
+ "loss": 1.3515,
8040
+ "step": 11450
8041
+ },
8042
+ {
8043
+ "epoch": 0.29,
8044
+ "grad_norm": 12.6875,
8045
+ "learning_rate": 1.256949152542373e-05,
8046
+ "loss": 1.2908,
8047
+ "step": 11460
8048
+ },
8049
+ {
8050
+ "epoch": 0.29,
8051
+ "grad_norm": 13.625,
8052
+ "learning_rate": 1.256271186440678e-05,
8053
+ "loss": 1.3585,
8054
+ "step": 11470
8055
+ },
8056
+ {
8057
+ "epoch": 0.29,
8058
+ "grad_norm": 19.125,
8059
+ "learning_rate": 1.2555932203389831e-05,
8060
+ "loss": 1.3873,
8061
+ "step": 11480
8062
+ },
8063
+ {
8064
+ "epoch": 0.29,
8065
+ "grad_norm": 18.5,
8066
+ "learning_rate": 1.2549152542372883e-05,
8067
+ "loss": 1.3207,
8068
+ "step": 11490
8069
+ },
8070
+ {
8071
+ "epoch": 0.29,
8072
+ "grad_norm": 21.125,
8073
+ "learning_rate": 1.2542372881355932e-05,
8074
+ "loss": 1.4772,
8075
+ "step": 11500
8076
+ },
8077
+ {
8078
+ "epoch": 0.29,
8079
+ "grad_norm": 7.4375,
8080
+ "learning_rate": 1.2535593220338984e-05,
8081
+ "loss": 1.496,
8082
+ "step": 11510
8083
+ },
8084
+ {
8085
+ "epoch": 0.29,
8086
+ "grad_norm": 15.6875,
8087
+ "learning_rate": 1.2528813559322035e-05,
8088
+ "loss": 1.2367,
8089
+ "step": 11520
8090
+ },
8091
+ {
8092
+ "epoch": 0.29,
8093
+ "grad_norm": 20.375,
8094
+ "learning_rate": 1.2522033898305087e-05,
8095
+ "loss": 1.4323,
8096
+ "step": 11530
8097
+ },
8098
+ {
8099
+ "epoch": 0.29,
8100
+ "grad_norm": 15.3125,
8101
+ "learning_rate": 1.2515254237288136e-05,
8102
+ "loss": 1.2777,
8103
+ "step": 11540
8104
+ },
8105
+ {
8106
+ "epoch": 0.29,
8107
+ "grad_norm": 17.25,
8108
+ "learning_rate": 1.2508474576271188e-05,
8109
+ "loss": 1.2263,
8110
+ "step": 11550
8111
+ },
8112
+ {
8113
+ "epoch": 0.29,
8114
+ "grad_norm": 10.1875,
8115
+ "learning_rate": 1.2501694915254239e-05,
8116
+ "loss": 1.5418,
8117
+ "step": 11560
8118
+ },
8119
+ {
8120
+ "epoch": 0.29,
8121
+ "grad_norm": 22.875,
8122
+ "learning_rate": 1.249491525423729e-05,
8123
+ "loss": 1.4742,
8124
+ "step": 11570
8125
+ },
8126
+ {
8127
+ "epoch": 0.29,
8128
+ "grad_norm": 37.5,
8129
+ "learning_rate": 1.248813559322034e-05,
8130
+ "loss": 1.38,
8131
+ "step": 11580
8132
+ },
8133
+ {
8134
+ "epoch": 0.29,
8135
+ "grad_norm": 7.15625,
8136
+ "learning_rate": 1.2481355932203392e-05,
8137
+ "loss": 1.2842,
8138
+ "step": 11590
8139
+ },
8140
+ {
8141
+ "epoch": 0.29,
8142
+ "grad_norm": 34.5,
8143
+ "learning_rate": 1.2474576271186443e-05,
8144
+ "loss": 1.2457,
8145
+ "step": 11600
8146
+ },
8147
+ {
8148
+ "epoch": 0.29,
8149
+ "grad_norm": 13.375,
8150
+ "learning_rate": 1.2467796610169494e-05,
8151
+ "loss": 1.3445,
8152
+ "step": 11610
8153
+ },
8154
+ {
8155
+ "epoch": 0.29,
8156
+ "grad_norm": 13.875,
8157
+ "learning_rate": 1.2461016949152542e-05,
8158
+ "loss": 1.3287,
8159
+ "step": 11620
8160
+ },
8161
+ {
8162
+ "epoch": 0.29,
8163
+ "grad_norm": 16.625,
8164
+ "learning_rate": 1.2454237288135594e-05,
8165
+ "loss": 1.4088,
8166
+ "step": 11630
8167
+ },
8168
+ {
8169
+ "epoch": 0.29,
8170
+ "grad_norm": 17.375,
8171
+ "learning_rate": 1.2447457627118643e-05,
8172
+ "loss": 1.2976,
8173
+ "step": 11640
8174
+ },
8175
+ {
8176
+ "epoch": 0.29,
8177
+ "grad_norm": 29.375,
8178
+ "learning_rate": 1.2440677966101695e-05,
8179
+ "loss": 1.257,
8180
+ "step": 11650
8181
+ },
8182
+ {
8183
+ "epoch": 0.29,
8184
+ "grad_norm": 15.4375,
8185
+ "learning_rate": 1.2433898305084746e-05,
8186
+ "loss": 1.4099,
8187
+ "step": 11660
8188
+ },
8189
+ {
8190
+ "epoch": 0.29,
8191
+ "grad_norm": 19.5,
8192
+ "learning_rate": 1.2427118644067798e-05,
8193
+ "loss": 1.3072,
8194
+ "step": 11670
8195
+ },
8196
+ {
8197
+ "epoch": 0.29,
8198
+ "grad_norm": 18.375,
8199
+ "learning_rate": 1.2420338983050847e-05,
8200
+ "loss": 1.3438,
8201
+ "step": 11680
8202
+ },
8203
+ {
8204
+ "epoch": 0.29,
8205
+ "grad_norm": 12.125,
8206
+ "learning_rate": 1.2413559322033899e-05,
8207
+ "loss": 1.4044,
8208
+ "step": 11690
8209
+ },
8210
+ {
8211
+ "epoch": 0.29,
8212
+ "grad_norm": 20.25,
8213
+ "learning_rate": 1.240677966101695e-05,
8214
+ "loss": 1.3873,
8215
+ "step": 11700
8216
+ },
8217
+ {
8218
+ "epoch": 0.29,
8219
+ "grad_norm": 20.25,
8220
+ "learning_rate": 1.2400000000000002e-05,
8221
+ "loss": 1.3976,
8222
+ "step": 11710
8223
+ },
8224
+ {
8225
+ "epoch": 0.29,
8226
+ "grad_norm": 25.125,
8227
+ "learning_rate": 1.2393220338983051e-05,
8228
+ "loss": 1.5033,
8229
+ "step": 11720
8230
+ },
8231
+ {
8232
+ "epoch": 0.29,
8233
+ "grad_norm": 37.0,
8234
+ "learning_rate": 1.2386440677966103e-05,
8235
+ "loss": 1.2516,
8236
+ "step": 11730
8237
+ },
8238
+ {
8239
+ "epoch": 0.29,
8240
+ "grad_norm": 18.625,
8241
+ "learning_rate": 1.2379661016949154e-05,
8242
+ "loss": 1.2863,
8243
+ "step": 11740
8244
+ },
8245
+ {
8246
+ "epoch": 0.29,
8247
+ "grad_norm": 29.5,
8248
+ "learning_rate": 1.2372881355932205e-05,
8249
+ "loss": 1.4199,
8250
+ "step": 11750
8251
+ },
8252
+ {
8253
+ "epoch": 0.29,
8254
+ "grad_norm": 14.8125,
8255
+ "learning_rate": 1.2366101694915255e-05,
8256
+ "loss": 1.5511,
8257
+ "step": 11760
8258
+ },
8259
+ {
8260
+ "epoch": 0.29,
8261
+ "grad_norm": 31.125,
8262
+ "learning_rate": 1.2359322033898307e-05,
8263
+ "loss": 1.5902,
8264
+ "step": 11770
8265
+ },
8266
+ {
8267
+ "epoch": 0.29,
8268
+ "grad_norm": 9.125,
8269
+ "learning_rate": 1.2352542372881358e-05,
8270
+ "loss": 1.5406,
8271
+ "step": 11780
8272
+ },
8273
+ {
8274
+ "epoch": 0.29,
8275
+ "grad_norm": 15.0625,
8276
+ "learning_rate": 1.234576271186441e-05,
8277
+ "loss": 1.4011,
8278
+ "step": 11790
8279
+ },
8280
+ {
8281
+ "epoch": 0.29,
8282
+ "grad_norm": 15.0625,
8283
+ "learning_rate": 1.2338983050847459e-05,
8284
+ "loss": 1.2501,
8285
+ "step": 11800
8286
+ },
8287
+ {
8288
+ "epoch": 0.3,
8289
+ "grad_norm": 15.9375,
8290
+ "learning_rate": 1.233220338983051e-05,
8291
+ "loss": 1.3155,
8292
+ "step": 11810
8293
+ },
8294
+ {
8295
+ "epoch": 0.3,
8296
+ "grad_norm": 7.28125,
8297
+ "learning_rate": 1.2325423728813562e-05,
8298
+ "loss": 1.3896,
8299
+ "step": 11820
8300
+ },
8301
+ {
8302
+ "epoch": 0.3,
8303
+ "grad_norm": 29.25,
8304
+ "learning_rate": 1.231864406779661e-05,
8305
+ "loss": 1.6491,
8306
+ "step": 11830
8307
+ },
8308
+ {
8309
+ "epoch": 0.3,
8310
+ "grad_norm": 15.875,
8311
+ "learning_rate": 1.2311864406779661e-05,
8312
+ "loss": 1.3143,
8313
+ "step": 11840
8314
+ },
8315
+ {
8316
+ "epoch": 0.3,
8317
+ "grad_norm": 26.75,
8318
+ "learning_rate": 1.2305084745762713e-05,
8319
+ "loss": 1.3963,
8320
+ "step": 11850
8321
+ },
8322
+ {
8323
+ "epoch": 0.3,
8324
+ "grad_norm": 38.5,
8325
+ "learning_rate": 1.2298305084745762e-05,
8326
+ "loss": 1.3003,
8327
+ "step": 11860
8328
+ },
8329
+ {
8330
+ "epoch": 0.3,
8331
+ "grad_norm": 15.0,
8332
+ "learning_rate": 1.2291525423728814e-05,
8333
+ "loss": 1.51,
8334
+ "step": 11870
8335
+ },
8336
+ {
8337
+ "epoch": 0.3,
8338
+ "grad_norm": 23.5,
8339
+ "learning_rate": 1.2284745762711865e-05,
8340
+ "loss": 1.2961,
8341
+ "step": 11880
8342
+ },
8343
+ {
8344
+ "epoch": 0.3,
8345
+ "grad_norm": 12.5,
8346
+ "learning_rate": 1.2277966101694917e-05,
8347
+ "loss": 1.2989,
8348
+ "step": 11890
8349
+ },
8350
+ {
8351
+ "epoch": 0.3,
8352
+ "grad_norm": 32.0,
8353
+ "learning_rate": 1.2271186440677966e-05,
8354
+ "loss": 1.3863,
8355
+ "step": 11900
8356
+ },
8357
+ {
8358
+ "epoch": 0.3,
8359
+ "grad_norm": 22.625,
8360
+ "learning_rate": 1.2264406779661018e-05,
8361
+ "loss": 1.6525,
8362
+ "step": 11910
8363
+ },
8364
+ {
8365
+ "epoch": 0.3,
8366
+ "grad_norm": 64.0,
8367
+ "learning_rate": 1.2257627118644069e-05,
8368
+ "loss": 1.1806,
8369
+ "step": 11920
8370
+ },
8371
+ {
8372
+ "epoch": 0.3,
8373
+ "grad_norm": 12.0625,
8374
+ "learning_rate": 1.225084745762712e-05,
8375
+ "loss": 1.4508,
8376
+ "step": 11930
8377
+ },
8378
+ {
8379
+ "epoch": 0.3,
8380
+ "grad_norm": 29.125,
8381
+ "learning_rate": 1.224406779661017e-05,
8382
+ "loss": 1.5203,
8383
+ "step": 11940
8384
+ },
8385
+ {
8386
+ "epoch": 0.3,
8387
+ "grad_norm": 8.6875,
8388
+ "learning_rate": 1.2237288135593222e-05,
8389
+ "loss": 1.5215,
8390
+ "step": 11950
8391
+ },
8392
+ {
8393
+ "epoch": 0.3,
8394
+ "grad_norm": 7.40625,
8395
+ "learning_rate": 1.2230508474576273e-05,
8396
+ "loss": 1.2804,
8397
+ "step": 11960
8398
+ },
8399
+ {
8400
+ "epoch": 0.3,
8401
+ "grad_norm": 52.75,
8402
+ "learning_rate": 1.2223728813559323e-05,
8403
+ "loss": 1.3751,
8404
+ "step": 11970
8405
+ },
8406
+ {
8407
+ "epoch": 0.3,
8408
+ "grad_norm": 13.0,
8409
+ "learning_rate": 1.2216949152542374e-05,
8410
+ "loss": 1.2444,
8411
+ "step": 11980
8412
+ },
8413
+ {
8414
+ "epoch": 0.3,
8415
+ "grad_norm": 39.5,
8416
+ "learning_rate": 1.2210169491525426e-05,
8417
+ "loss": 1.2927,
8418
+ "step": 11990
8419
+ },
8420
+ {
8421
+ "epoch": 0.3,
8422
+ "grad_norm": 12.5625,
8423
+ "learning_rate": 1.2203389830508477e-05,
8424
+ "loss": 1.4383,
8425
+ "step": 12000
8426
+ },
8427
+ {
8428
+ "epoch": 0.3,
8429
+ "grad_norm": 41.5,
8430
+ "learning_rate": 1.2196610169491527e-05,
8431
+ "loss": 1.4106,
8432
+ "step": 12010
8433
+ },
8434
+ {
8435
+ "epoch": 0.3,
8436
+ "grad_norm": 16.5,
8437
+ "learning_rate": 1.2189830508474578e-05,
8438
+ "loss": 1.3155,
8439
+ "step": 12020
8440
+ },
8441
+ {
8442
+ "epoch": 0.3,
8443
+ "grad_norm": 13.9375,
8444
+ "learning_rate": 1.2183050847457628e-05,
8445
+ "loss": 1.3191,
8446
+ "step": 12030
8447
+ },
8448
+ {
8449
+ "epoch": 0.3,
8450
+ "grad_norm": 13.9375,
8451
+ "learning_rate": 1.2176271186440677e-05,
8452
+ "loss": 1.393,
8453
+ "step": 12040
8454
+ },
8455
+ {
8456
+ "epoch": 0.3,
8457
+ "grad_norm": 46.25,
8458
+ "learning_rate": 1.2169491525423729e-05,
8459
+ "loss": 1.3792,
8460
+ "step": 12050
8461
+ },
8462
+ {
8463
+ "epoch": 0.3,
8464
+ "grad_norm": 60.75,
8465
+ "learning_rate": 1.216271186440678e-05,
8466
+ "loss": 1.3197,
8467
+ "step": 12060
8468
+ },
8469
+ {
8470
+ "epoch": 0.3,
8471
+ "grad_norm": 27.875,
8472
+ "learning_rate": 1.2155932203389832e-05,
8473
+ "loss": 1.3858,
8474
+ "step": 12070
8475
+ },
8476
+ {
8477
+ "epoch": 0.3,
8478
+ "grad_norm": 42.25,
8479
+ "learning_rate": 1.2149152542372881e-05,
8480
+ "loss": 1.4581,
8481
+ "step": 12080
8482
+ },
8483
+ {
8484
+ "epoch": 0.3,
8485
+ "grad_norm": 16.75,
8486
+ "learning_rate": 1.2142372881355933e-05,
8487
+ "loss": 1.4428,
8488
+ "step": 12090
8489
+ },
8490
+ {
8491
+ "epoch": 0.3,
8492
+ "grad_norm": 40.75,
8493
+ "learning_rate": 1.2135593220338984e-05,
8494
+ "loss": 1.2876,
8495
+ "step": 12100
8496
+ },
8497
+ {
8498
+ "epoch": 0.3,
8499
+ "grad_norm": 34.75,
8500
+ "learning_rate": 1.2128813559322034e-05,
8501
+ "loss": 1.295,
8502
+ "step": 12110
8503
+ },
8504
+ {
8505
+ "epoch": 0.3,
8506
+ "grad_norm": 31.625,
8507
+ "learning_rate": 1.2122033898305085e-05,
8508
+ "loss": 1.4799,
8509
+ "step": 12120
8510
+ },
8511
+ {
8512
+ "epoch": 0.3,
8513
+ "grad_norm": 23.5,
8514
+ "learning_rate": 1.2115254237288137e-05,
8515
+ "loss": 1.3044,
8516
+ "step": 12130
8517
+ },
8518
+ {
8519
+ "epoch": 0.3,
8520
+ "grad_norm": 11.3125,
8521
+ "learning_rate": 1.2108474576271188e-05,
8522
+ "loss": 1.3421,
8523
+ "step": 12140
8524
+ },
8525
+ {
8526
+ "epoch": 0.3,
8527
+ "grad_norm": 17.875,
8528
+ "learning_rate": 1.2101694915254238e-05,
8529
+ "loss": 1.2581,
8530
+ "step": 12150
8531
+ },
8532
+ {
8533
+ "epoch": 0.3,
8534
+ "grad_norm": 32.75,
8535
+ "learning_rate": 1.209491525423729e-05,
8536
+ "loss": 1.4785,
8537
+ "step": 12160
8538
+ },
8539
+ {
8540
+ "epoch": 0.3,
8541
+ "grad_norm": 28.5,
8542
+ "learning_rate": 1.208813559322034e-05,
8543
+ "loss": 1.4118,
8544
+ "step": 12170
8545
+ },
8546
+ {
8547
+ "epoch": 0.3,
8548
+ "grad_norm": 11.875,
8549
+ "learning_rate": 1.2081355932203392e-05,
8550
+ "loss": 1.2336,
8551
+ "step": 12180
8552
+ },
8553
+ {
8554
+ "epoch": 0.3,
8555
+ "grad_norm": 23.75,
8556
+ "learning_rate": 1.2074576271186442e-05,
8557
+ "loss": 1.2431,
8558
+ "step": 12190
8559
+ },
8560
+ {
8561
+ "epoch": 0.3,
8562
+ "grad_norm": 8.4375,
8563
+ "learning_rate": 1.2067796610169493e-05,
8564
+ "loss": 1.3331,
8565
+ "step": 12200
8566
+ },
8567
+ {
8568
+ "epoch": 0.31,
8569
+ "grad_norm": 23.125,
8570
+ "learning_rate": 1.2061016949152544e-05,
8571
+ "loss": 1.5897,
8572
+ "step": 12210
8573
+ },
8574
+ {
8575
+ "epoch": 0.31,
8576
+ "grad_norm": 29.0,
8577
+ "learning_rate": 1.2054237288135596e-05,
8578
+ "loss": 1.4431,
8579
+ "step": 12220
8580
+ },
8581
+ {
8582
+ "epoch": 0.31,
8583
+ "grad_norm": 12.75,
8584
+ "learning_rate": 1.2047457627118646e-05,
8585
+ "loss": 1.207,
8586
+ "step": 12230
8587
+ },
8588
+ {
8589
+ "epoch": 0.31,
8590
+ "grad_norm": 17.75,
8591
+ "learning_rate": 1.2040677966101695e-05,
8592
+ "loss": 1.318,
8593
+ "step": 12240
8594
+ },
8595
+ {
8596
+ "epoch": 0.31,
8597
+ "grad_norm": 14.6875,
8598
+ "learning_rate": 1.2033898305084745e-05,
8599
+ "loss": 1.3912,
8600
+ "step": 12250
8601
+ },
8602
+ {
8603
+ "epoch": 0.31,
8604
+ "grad_norm": 26.0,
8605
+ "learning_rate": 1.2027118644067796e-05,
8606
+ "loss": 1.356,
8607
+ "step": 12260
8608
+ },
8609
+ {
8610
+ "epoch": 0.31,
8611
+ "grad_norm": 28.875,
8612
+ "learning_rate": 1.2020338983050848e-05,
8613
+ "loss": 1.4116,
8614
+ "step": 12270
8615
+ },
8616
+ {
8617
+ "epoch": 0.31,
8618
+ "grad_norm": 8.875,
8619
+ "learning_rate": 1.20135593220339e-05,
8620
+ "loss": 1.3749,
8621
+ "step": 12280
8622
+ },
8623
+ {
8624
+ "epoch": 0.31,
8625
+ "grad_norm": 14.1875,
8626
+ "learning_rate": 1.2006779661016949e-05,
8627
+ "loss": 1.4265,
8628
+ "step": 12290
8629
+ },
8630
+ {
8631
+ "epoch": 0.31,
8632
+ "grad_norm": 9.9375,
8633
+ "learning_rate": 1.2e-05,
8634
+ "loss": 1.4168,
8635
+ "step": 12300
8636
+ },
8637
+ {
8638
+ "epoch": 0.31,
8639
+ "grad_norm": 30.375,
8640
+ "learning_rate": 1.1993220338983052e-05,
8641
+ "loss": 1.3941,
8642
+ "step": 12310
8643
+ },
8644
+ {
8645
+ "epoch": 0.31,
8646
+ "grad_norm": 21.375,
8647
+ "learning_rate": 1.1986440677966103e-05,
8648
+ "loss": 1.3206,
8649
+ "step": 12320
8650
+ },
8651
+ {
8652
+ "epoch": 0.31,
8653
+ "grad_norm": 30.5,
8654
+ "learning_rate": 1.1979661016949153e-05,
8655
+ "loss": 1.4332,
8656
+ "step": 12330
8657
+ },
8658
+ {
8659
+ "epoch": 0.31,
8660
+ "grad_norm": 11.9375,
8661
+ "learning_rate": 1.1972881355932204e-05,
8662
+ "loss": 1.3988,
8663
+ "step": 12340
8664
+ },
8665
+ {
8666
+ "epoch": 0.31,
8667
+ "grad_norm": 19.625,
8668
+ "learning_rate": 1.1966101694915256e-05,
8669
+ "loss": 1.5356,
8670
+ "step": 12350
8671
+ },
8672
+ {
8673
+ "epoch": 0.31,
8674
+ "grad_norm": 24.375,
8675
+ "learning_rate": 1.1959322033898307e-05,
8676
+ "loss": 1.371,
8677
+ "step": 12360
8678
+ },
8679
+ {
8680
+ "epoch": 0.31,
8681
+ "grad_norm": 21.75,
8682
+ "learning_rate": 1.1952542372881357e-05,
8683
+ "loss": 1.6957,
8684
+ "step": 12370
8685
+ },
8686
+ {
8687
+ "epoch": 0.31,
8688
+ "grad_norm": 28.375,
8689
+ "learning_rate": 1.1945762711864408e-05,
8690
+ "loss": 1.4933,
8691
+ "step": 12380
8692
+ },
8693
+ {
8694
+ "epoch": 0.31,
8695
+ "grad_norm": 16.5,
8696
+ "learning_rate": 1.193898305084746e-05,
8697
+ "loss": 1.3463,
8698
+ "step": 12390
8699
+ },
8700
+ {
8701
+ "epoch": 0.31,
8702
+ "grad_norm": 15.4375,
8703
+ "learning_rate": 1.1932203389830511e-05,
8704
+ "loss": 1.431,
8705
+ "step": 12400
8706
+ },
8707
+ {
8708
+ "epoch": 0.31,
8709
+ "grad_norm": 21.875,
8710
+ "learning_rate": 1.192542372881356e-05,
8711
+ "loss": 1.3883,
8712
+ "step": 12410
8713
+ },
8714
+ {
8715
+ "epoch": 0.31,
8716
+ "grad_norm": 16.375,
8717
+ "learning_rate": 1.1918644067796612e-05,
8718
+ "loss": 1.4184,
8719
+ "step": 12420
8720
+ },
8721
+ {
8722
+ "epoch": 0.31,
8723
+ "grad_norm": 18.0,
8724
+ "learning_rate": 1.1911864406779663e-05,
8725
+ "loss": 1.3175,
8726
+ "step": 12430
8727
+ },
8728
+ {
8729
+ "epoch": 0.31,
8730
+ "grad_norm": 17.625,
8731
+ "learning_rate": 1.1905084745762713e-05,
8732
+ "loss": 1.3652,
8733
+ "step": 12440
8734
+ },
8735
+ {
8736
+ "epoch": 0.31,
8737
+ "grad_norm": 19.0,
8738
+ "learning_rate": 1.1898305084745763e-05,
8739
+ "loss": 1.494,
8740
+ "step": 12450
8741
+ },
8742
+ {
8743
+ "epoch": 0.31,
8744
+ "grad_norm": 31.875,
8745
+ "learning_rate": 1.1891525423728814e-05,
8746
+ "loss": 1.4232,
8747
+ "step": 12460
8748
+ },
8749
+ {
8750
+ "epoch": 0.31,
8751
+ "grad_norm": 39.5,
8752
+ "learning_rate": 1.1884745762711864e-05,
8753
+ "loss": 1.3255,
8754
+ "step": 12470
8755
+ },
8756
+ {
8757
+ "epoch": 0.31,
8758
+ "grad_norm": 53.75,
8759
+ "learning_rate": 1.1877966101694915e-05,
8760
+ "loss": 1.3136,
8761
+ "step": 12480
8762
+ },
8763
+ {
8764
+ "epoch": 0.31,
8765
+ "grad_norm": 31.625,
8766
+ "learning_rate": 1.1871186440677967e-05,
8767
+ "loss": 1.3727,
8768
+ "step": 12490
8769
+ },
8770
+ {
8771
+ "epoch": 0.31,
8772
+ "grad_norm": 11.25,
8773
+ "learning_rate": 1.1864406779661018e-05,
8774
+ "loss": 1.1973,
8775
+ "step": 12500
8776
+ },
8777
+ {
8778
+ "epoch": 0.31,
8779
+ "grad_norm": 19.125,
8780
+ "learning_rate": 1.1857627118644068e-05,
8781
+ "loss": 1.5978,
8782
+ "step": 12510
8783
+ },
8784
+ {
8785
+ "epoch": 0.31,
8786
+ "grad_norm": 10.8125,
8787
+ "learning_rate": 1.185084745762712e-05,
8788
+ "loss": 1.3858,
8789
+ "step": 12520
8790
+ },
8791
+ {
8792
+ "epoch": 0.31,
8793
+ "grad_norm": 16.875,
8794
+ "learning_rate": 1.184406779661017e-05,
8795
+ "loss": 1.4182,
8796
+ "step": 12530
8797
+ },
8798
+ {
8799
+ "epoch": 0.31,
8800
+ "grad_norm": 25.0,
8801
+ "learning_rate": 1.183728813559322e-05,
8802
+ "loss": 1.1735,
8803
+ "step": 12540
8804
+ },
8805
+ {
8806
+ "epoch": 0.31,
8807
+ "grad_norm": 30.375,
8808
+ "learning_rate": 1.1830508474576272e-05,
8809
+ "loss": 1.2487,
8810
+ "step": 12550
8811
+ },
8812
+ {
8813
+ "epoch": 0.31,
8814
+ "grad_norm": 40.75,
8815
+ "learning_rate": 1.1823728813559323e-05,
8816
+ "loss": 1.3385,
8817
+ "step": 12560
8818
+ },
8819
+ {
8820
+ "epoch": 0.31,
8821
+ "grad_norm": 11.25,
8822
+ "learning_rate": 1.1816949152542375e-05,
8823
+ "loss": 1.3932,
8824
+ "step": 12570
8825
+ },
8826
+ {
8827
+ "epoch": 0.31,
8828
+ "grad_norm": 12.875,
8829
+ "learning_rate": 1.1810169491525424e-05,
8830
+ "loss": 1.3053,
8831
+ "step": 12580
8832
+ },
8833
+ {
8834
+ "epoch": 0.31,
8835
+ "grad_norm": 16.0,
8836
+ "learning_rate": 1.1803389830508476e-05,
8837
+ "loss": 1.1677,
8838
+ "step": 12590
8839
+ },
8840
+ {
8841
+ "epoch": 0.32,
8842
+ "grad_norm": 33.0,
8843
+ "learning_rate": 1.1796610169491527e-05,
8844
+ "loss": 1.3547,
8845
+ "step": 12600
8846
+ },
8847
+ {
8848
+ "epoch": 0.32,
8849
+ "grad_norm": 13.25,
8850
+ "learning_rate": 1.1789830508474578e-05,
8851
+ "loss": 1.4628,
8852
+ "step": 12610
8853
+ },
8854
+ {
8855
+ "epoch": 0.32,
8856
+ "grad_norm": 36.5,
8857
+ "learning_rate": 1.1783050847457628e-05,
8858
+ "loss": 1.3457,
8859
+ "step": 12620
8860
+ },
8861
+ {
8862
+ "epoch": 0.32,
8863
+ "grad_norm": 77.0,
8864
+ "learning_rate": 1.177627118644068e-05,
8865
+ "loss": 1.4649,
8866
+ "step": 12630
8867
+ },
8868
+ {
8869
+ "epoch": 0.32,
8870
+ "grad_norm": 26.25,
8871
+ "learning_rate": 1.1769491525423731e-05,
8872
+ "loss": 1.6308,
8873
+ "step": 12640
8874
+ },
8875
+ {
8876
+ "epoch": 0.32,
8877
+ "grad_norm": 21.125,
8878
+ "learning_rate": 1.1762711864406782e-05,
8879
+ "loss": 1.315,
8880
+ "step": 12650
8881
+ },
8882
+ {
8883
+ "epoch": 0.32,
8884
+ "grad_norm": 18.5,
8885
+ "learning_rate": 1.175593220338983e-05,
8886
+ "loss": 1.3999,
8887
+ "step": 12660
8888
+ },
8889
+ {
8890
+ "epoch": 0.32,
8891
+ "grad_norm": 12.4375,
8892
+ "learning_rate": 1.1749152542372882e-05,
8893
+ "loss": 1.3569,
8894
+ "step": 12670
8895
+ },
8896
+ {
8897
+ "epoch": 0.32,
8898
+ "grad_norm": 50.0,
8899
+ "learning_rate": 1.1742372881355931e-05,
8900
+ "loss": 1.2881,
8901
+ "step": 12680
8902
+ },
8903
+ {
8904
+ "epoch": 0.32,
8905
+ "grad_norm": 14.3125,
8906
+ "learning_rate": 1.1735593220338983e-05,
8907
+ "loss": 1.5333,
8908
+ "step": 12690
8909
+ },
8910
+ {
8911
+ "epoch": 0.32,
8912
+ "grad_norm": 32.0,
8913
+ "learning_rate": 1.1728813559322034e-05,
8914
+ "loss": 1.4207,
8915
+ "step": 12700
8916
+ },
8917
+ {
8918
+ "epoch": 0.32,
8919
+ "grad_norm": 19.5,
8920
+ "learning_rate": 1.1722033898305086e-05,
8921
+ "loss": 1.3351,
8922
+ "step": 12710
8923
+ },
8924
+ {
8925
+ "epoch": 0.32,
8926
+ "grad_norm": 66.0,
8927
+ "learning_rate": 1.1715254237288135e-05,
8928
+ "loss": 1.3974,
8929
+ "step": 12720
8930
+ },
8931
+ {
8932
+ "epoch": 0.32,
8933
+ "grad_norm": 17.375,
8934
+ "learning_rate": 1.1708474576271187e-05,
8935
+ "loss": 1.3413,
8936
+ "step": 12730
8937
+ },
8938
+ {
8939
+ "epoch": 0.32,
8940
+ "grad_norm": 17.5,
8941
+ "learning_rate": 1.1701694915254238e-05,
8942
+ "loss": 1.4573,
8943
+ "step": 12740
8944
+ },
8945
+ {
8946
+ "epoch": 0.32,
8947
+ "grad_norm": 17.25,
8948
+ "learning_rate": 1.169491525423729e-05,
8949
+ "loss": 1.4507,
8950
+ "step": 12750
8951
+ },
8952
+ {
8953
+ "epoch": 0.32,
8954
+ "grad_norm": 23.5,
8955
+ "learning_rate": 1.168813559322034e-05,
8956
+ "loss": 1.4594,
8957
+ "step": 12760
8958
+ },
8959
+ {
8960
+ "epoch": 0.32,
8961
+ "grad_norm": 27.25,
8962
+ "learning_rate": 1.168135593220339e-05,
8963
+ "loss": 1.3051,
8964
+ "step": 12770
8965
+ },
8966
+ {
8967
+ "epoch": 0.32,
8968
+ "grad_norm": 14.8125,
8969
+ "learning_rate": 1.1674576271186442e-05,
8970
+ "loss": 1.4516,
8971
+ "step": 12780
8972
+ },
8973
+ {
8974
+ "epoch": 0.32,
8975
+ "grad_norm": 12.1875,
8976
+ "learning_rate": 1.1667796610169494e-05,
8977
+ "loss": 1.3318,
8978
+ "step": 12790
8979
+ },
8980
+ {
8981
+ "epoch": 0.32,
8982
+ "grad_norm": 12.0625,
8983
+ "learning_rate": 1.1661016949152543e-05,
8984
+ "loss": 1.5477,
8985
+ "step": 12800
8986
+ },
8987
+ {
8988
+ "epoch": 0.32,
8989
+ "grad_norm": 40.75,
8990
+ "learning_rate": 1.1654237288135595e-05,
8991
+ "loss": 1.2766,
8992
+ "step": 12810
8993
+ },
8994
+ {
8995
+ "epoch": 0.32,
8996
+ "grad_norm": 18.0,
8997
+ "learning_rate": 1.1647457627118646e-05,
8998
+ "loss": 1.2983,
8999
+ "step": 12820
9000
+ },
9001
+ {
9002
+ "epoch": 0.32,
9003
+ "grad_norm": 18.25,
9004
+ "learning_rate": 1.1640677966101697e-05,
9005
+ "loss": 1.2756,
9006
+ "step": 12830
9007
+ },
9008
+ {
9009
+ "epoch": 0.32,
9010
+ "grad_norm": 20.625,
9011
+ "learning_rate": 1.1633898305084747e-05,
9012
+ "loss": 1.4396,
9013
+ "step": 12840
9014
+ },
9015
+ {
9016
+ "epoch": 0.32,
9017
+ "grad_norm": 28.0,
9018
+ "learning_rate": 1.1627118644067799e-05,
9019
+ "loss": 1.3779,
9020
+ "step": 12850
9021
+ },
9022
+ {
9023
+ "epoch": 0.32,
9024
+ "grad_norm": 13.1875,
9025
+ "learning_rate": 1.162033898305085e-05,
9026
+ "loss": 1.4271,
9027
+ "step": 12860
9028
+ },
9029
+ {
9030
+ "epoch": 0.32,
9031
+ "grad_norm": 21.875,
9032
+ "learning_rate": 1.1613559322033898e-05,
9033
+ "loss": 1.2491,
9034
+ "step": 12870
9035
+ },
9036
+ {
9037
+ "epoch": 0.32,
9038
+ "grad_norm": 43.25,
9039
+ "learning_rate": 1.160677966101695e-05,
9040
+ "loss": 1.4746,
9041
+ "step": 12880
9042
+ },
9043
+ {
9044
+ "epoch": 0.32,
9045
+ "grad_norm": 51.75,
9046
+ "learning_rate": 1.16e-05,
9047
+ "loss": 1.4252,
9048
+ "step": 12890
9049
+ },
9050
+ {
9051
+ "epoch": 0.32,
9052
+ "grad_norm": 19.25,
9053
+ "learning_rate": 1.159322033898305e-05,
9054
+ "loss": 1.2963,
9055
+ "step": 12900
9056
+ },
9057
+ {
9058
+ "epoch": 0.32,
9059
+ "grad_norm": 14.4375,
9060
+ "learning_rate": 1.1586440677966102e-05,
9061
+ "loss": 1.3186,
9062
+ "step": 12910
9063
+ },
9064
+ {
9065
+ "epoch": 0.32,
9066
+ "grad_norm": 13.0625,
9067
+ "learning_rate": 1.1579661016949153e-05,
9068
+ "loss": 1.316,
9069
+ "step": 12920
9070
+ },
9071
+ {
9072
+ "epoch": 0.32,
9073
+ "grad_norm": 18.875,
9074
+ "learning_rate": 1.1572881355932205e-05,
9075
+ "loss": 1.3342,
9076
+ "step": 12930
9077
+ },
9078
+ {
9079
+ "epoch": 0.32,
9080
+ "grad_norm": 10.125,
9081
+ "learning_rate": 1.1566101694915254e-05,
9082
+ "loss": 1.1205,
9083
+ "step": 12940
9084
+ },
9085
+ {
9086
+ "epoch": 0.32,
9087
+ "grad_norm": 23.875,
9088
+ "learning_rate": 1.1559322033898306e-05,
9089
+ "loss": 1.4295,
9090
+ "step": 12950
9091
+ },
9092
+ {
9093
+ "epoch": 0.32,
9094
+ "grad_norm": 29.5,
9095
+ "learning_rate": 1.1552542372881357e-05,
9096
+ "loss": 1.3836,
9097
+ "step": 12960
9098
+ },
9099
+ {
9100
+ "epoch": 0.32,
9101
+ "grad_norm": 24.5,
9102
+ "learning_rate": 1.1545762711864409e-05,
9103
+ "loss": 1.2477,
9104
+ "step": 12970
9105
+ },
9106
+ {
9107
+ "epoch": 0.32,
9108
+ "grad_norm": 15.4375,
9109
+ "learning_rate": 1.1538983050847458e-05,
9110
+ "loss": 1.3418,
9111
+ "step": 12980
9112
+ },
9113
+ {
9114
+ "epoch": 0.32,
9115
+ "grad_norm": 30.375,
9116
+ "learning_rate": 1.153220338983051e-05,
9117
+ "loss": 1.3886,
9118
+ "step": 12990
9119
+ },
9120
+ {
9121
+ "epoch": 0.33,
9122
+ "grad_norm": 29.5,
9123
+ "learning_rate": 1.1525423728813561e-05,
9124
+ "loss": 1.4092,
9125
+ "step": 13000
9126
+ },
9127
+ {
9128
+ "epoch": 0.33,
9129
+ "grad_norm": 29.125,
9130
+ "learning_rate": 1.151864406779661e-05,
9131
+ "loss": 1.5852,
9132
+ "step": 13010
9133
+ },
9134
+ {
9135
+ "epoch": 0.33,
9136
+ "grad_norm": 18.875,
9137
+ "learning_rate": 1.1511864406779662e-05,
9138
+ "loss": 1.3838,
9139
+ "step": 13020
9140
+ },
9141
+ {
9142
+ "epoch": 0.33,
9143
+ "grad_norm": 17.0,
9144
+ "learning_rate": 1.1505084745762714e-05,
9145
+ "loss": 1.3919,
9146
+ "step": 13030
9147
+ },
9148
+ {
9149
+ "epoch": 0.33,
9150
+ "grad_norm": 14.125,
9151
+ "learning_rate": 1.1498305084745765e-05,
9152
+ "loss": 1.3076,
9153
+ "step": 13040
9154
+ },
9155
+ {
9156
+ "epoch": 0.33,
9157
+ "grad_norm": 10.0625,
9158
+ "learning_rate": 1.1491525423728815e-05,
9159
+ "loss": 1.339,
9160
+ "step": 13050
9161
+ },
9162
+ {
9163
+ "epoch": 0.33,
9164
+ "grad_norm": 14.25,
9165
+ "learning_rate": 1.1484745762711866e-05,
9166
+ "loss": 1.4315,
9167
+ "step": 13060
9168
+ },
9169
+ {
9170
+ "epoch": 0.33,
9171
+ "grad_norm": 8.6875,
9172
+ "learning_rate": 1.1477966101694916e-05,
9173
+ "loss": 1.3238,
9174
+ "step": 13070
9175
+ },
9176
+ {
9177
+ "epoch": 0.33,
9178
+ "grad_norm": 26.0,
9179
+ "learning_rate": 1.1471186440677965e-05,
9180
+ "loss": 1.3567,
9181
+ "step": 13080
9182
+ },
9183
+ {
9184
+ "epoch": 0.33,
9185
+ "grad_norm": 44.0,
9186
+ "learning_rate": 1.1464406779661017e-05,
9187
+ "loss": 1.2877,
9188
+ "step": 13090
9189
+ },
9190
+ {
9191
+ "epoch": 0.33,
9192
+ "grad_norm": 30.375,
9193
+ "learning_rate": 1.1457627118644068e-05,
9194
+ "loss": 1.3856,
9195
+ "step": 13100
9196
+ },
9197
+ {
9198
+ "epoch": 0.33,
9199
+ "grad_norm": 20.875,
9200
+ "learning_rate": 1.145084745762712e-05,
9201
+ "loss": 1.3871,
9202
+ "step": 13110
9203
+ },
9204
+ {
9205
+ "epoch": 0.33,
9206
+ "grad_norm": 22.625,
9207
+ "learning_rate": 1.144406779661017e-05,
9208
+ "loss": 1.3937,
9209
+ "step": 13120
9210
+ },
9211
+ {
9212
+ "epoch": 0.33,
9213
+ "grad_norm": 19.625,
9214
+ "learning_rate": 1.143728813559322e-05,
9215
+ "loss": 1.4812,
9216
+ "step": 13130
9217
+ },
9218
+ {
9219
+ "epoch": 0.33,
9220
+ "grad_norm": 68.5,
9221
+ "learning_rate": 1.1430508474576272e-05,
9222
+ "loss": 1.6372,
9223
+ "step": 13140
9224
+ },
9225
+ {
9226
+ "epoch": 0.33,
9227
+ "grad_norm": 29.75,
9228
+ "learning_rate": 1.1423728813559322e-05,
9229
+ "loss": 1.3724,
9230
+ "step": 13150
9231
+ },
9232
+ {
9233
+ "epoch": 0.33,
9234
+ "grad_norm": 14.25,
9235
+ "learning_rate": 1.1416949152542373e-05,
9236
+ "loss": 1.2524,
9237
+ "step": 13160
9238
+ },
9239
+ {
9240
+ "epoch": 0.33,
9241
+ "grad_norm": 12.375,
9242
+ "learning_rate": 1.1410169491525425e-05,
9243
+ "loss": 1.5147,
9244
+ "step": 13170
9245
+ },
9246
+ {
9247
+ "epoch": 0.33,
9248
+ "grad_norm": 59.75,
9249
+ "learning_rate": 1.1403389830508476e-05,
9250
+ "loss": 1.4632,
9251
+ "step": 13180
9252
+ },
9253
+ {
9254
+ "epoch": 0.33,
9255
+ "grad_norm": 22.125,
9256
+ "learning_rate": 1.1396610169491526e-05,
9257
+ "loss": 1.3722,
9258
+ "step": 13190
9259
+ },
9260
+ {
9261
+ "epoch": 0.33,
9262
+ "grad_norm": 20.5,
9263
+ "learning_rate": 1.1389830508474577e-05,
9264
+ "loss": 1.349,
9265
+ "step": 13200
9266
+ },
9267
+ {
9268
+ "epoch": 0.33,
9269
+ "grad_norm": 17.125,
9270
+ "learning_rate": 1.1383050847457629e-05,
9271
+ "loss": 1.3975,
9272
+ "step": 13210
9273
+ },
9274
+ {
9275
+ "epoch": 0.33,
9276
+ "grad_norm": 32.5,
9277
+ "learning_rate": 1.137627118644068e-05,
9278
+ "loss": 1.4174,
9279
+ "step": 13220
9280
+ },
9281
+ {
9282
+ "epoch": 0.33,
9283
+ "grad_norm": 11.3125,
9284
+ "learning_rate": 1.136949152542373e-05,
9285
+ "loss": 1.2684,
9286
+ "step": 13230
9287
+ },
9288
+ {
9289
+ "epoch": 0.33,
9290
+ "grad_norm": 22.375,
9291
+ "learning_rate": 1.1362711864406781e-05,
9292
+ "loss": 1.3821,
9293
+ "step": 13240
9294
+ },
9295
+ {
9296
+ "epoch": 0.33,
9297
+ "grad_norm": 15.6875,
9298
+ "learning_rate": 1.1355932203389833e-05,
9299
+ "loss": 1.3647,
9300
+ "step": 13250
9301
+ },
9302
+ {
9303
+ "epoch": 0.33,
9304
+ "grad_norm": 9.1875,
9305
+ "learning_rate": 1.1349152542372884e-05,
9306
+ "loss": 1.3544,
9307
+ "step": 13260
9308
+ },
9309
+ {
9310
+ "epoch": 0.33,
9311
+ "grad_norm": 27.375,
9312
+ "learning_rate": 1.1342372881355934e-05,
9313
+ "loss": 1.2971,
9314
+ "step": 13270
9315
+ },
9316
+ {
9317
+ "epoch": 0.33,
9318
+ "grad_norm": 38.75,
9319
+ "learning_rate": 1.1335593220338983e-05,
9320
+ "loss": 1.5632,
9321
+ "step": 13280
9322
+ },
9323
+ {
9324
+ "epoch": 0.33,
9325
+ "grad_norm": 36.0,
9326
+ "learning_rate": 1.1328813559322033e-05,
9327
+ "loss": 1.5077,
9328
+ "step": 13290
9329
+ },
9330
+ {
9331
+ "epoch": 0.33,
9332
+ "grad_norm": 31.0,
9333
+ "learning_rate": 1.1322033898305084e-05,
9334
+ "loss": 1.1637,
9335
+ "step": 13300
9336
+ },
9337
+ {
9338
+ "epoch": 0.33,
9339
+ "grad_norm": 29.25,
9340
+ "learning_rate": 1.1315254237288136e-05,
9341
+ "loss": 1.4482,
9342
+ "step": 13310
9343
+ },
9344
+ {
9345
+ "epoch": 0.33,
9346
+ "grad_norm": 14.0,
9347
+ "learning_rate": 1.1308474576271187e-05,
9348
+ "loss": 1.3316,
9349
+ "step": 13320
9350
+ },
9351
+ {
9352
+ "epoch": 0.33,
9353
+ "grad_norm": 26.375,
9354
+ "learning_rate": 1.1301694915254237e-05,
9355
+ "loss": 1.3351,
9356
+ "step": 13330
9357
+ },
9358
+ {
9359
+ "epoch": 0.33,
9360
+ "grad_norm": 27.0,
9361
+ "learning_rate": 1.1294915254237288e-05,
9362
+ "loss": 1.4783,
9363
+ "step": 13340
9364
+ },
9365
+ {
9366
+ "epoch": 0.33,
9367
+ "grad_norm": 28.5,
9368
+ "learning_rate": 1.128813559322034e-05,
9369
+ "loss": 1.4635,
9370
+ "step": 13350
9371
+ },
9372
+ {
9373
+ "epoch": 0.33,
9374
+ "grad_norm": 28.625,
9375
+ "learning_rate": 1.1281355932203391e-05,
9376
+ "loss": 1.2941,
9377
+ "step": 13360
9378
+ },
9379
+ {
9380
+ "epoch": 0.33,
9381
+ "grad_norm": 21.25,
9382
+ "learning_rate": 1.1274576271186441e-05,
9383
+ "loss": 1.4337,
9384
+ "step": 13370
9385
+ },
9386
+ {
9387
+ "epoch": 0.33,
9388
+ "grad_norm": 9.25,
9389
+ "learning_rate": 1.1267796610169492e-05,
9390
+ "loss": 1.3615,
9391
+ "step": 13380
9392
+ },
9393
+ {
9394
+ "epoch": 0.33,
9395
+ "grad_norm": 30.5,
9396
+ "learning_rate": 1.1261016949152544e-05,
9397
+ "loss": 1.4474,
9398
+ "step": 13390
9399
+ },
9400
+ {
9401
+ "epoch": 0.34,
9402
+ "grad_norm": 29.5,
9403
+ "learning_rate": 1.1254237288135595e-05,
9404
+ "loss": 1.512,
9405
+ "step": 13400
9406
+ },
9407
+ {
9408
+ "epoch": 0.34,
9409
+ "grad_norm": 16.375,
9410
+ "learning_rate": 1.1247457627118645e-05,
9411
+ "loss": 1.3367,
9412
+ "step": 13410
9413
+ },
9414
+ {
9415
+ "epoch": 0.34,
9416
+ "grad_norm": 17.25,
9417
+ "learning_rate": 1.1240677966101696e-05,
9418
+ "loss": 1.3797,
9419
+ "step": 13420
9420
+ },
9421
+ {
9422
+ "epoch": 0.34,
9423
+ "grad_norm": 21.75,
9424
+ "learning_rate": 1.1233898305084748e-05,
9425
+ "loss": 1.3935,
9426
+ "step": 13430
9427
+ },
9428
+ {
9429
+ "epoch": 0.34,
9430
+ "grad_norm": 48.0,
9431
+ "learning_rate": 1.1227118644067799e-05,
9432
+ "loss": 1.2878,
9433
+ "step": 13440
9434
+ },
9435
+ {
9436
+ "epoch": 0.34,
9437
+ "grad_norm": 12.3125,
9438
+ "learning_rate": 1.1220338983050849e-05,
9439
+ "loss": 1.1581,
9440
+ "step": 13450
9441
+ },
9442
+ {
9443
+ "epoch": 0.34,
9444
+ "grad_norm": 17.75,
9445
+ "learning_rate": 1.12135593220339e-05,
9446
+ "loss": 1.4249,
9447
+ "step": 13460
9448
+ },
9449
+ {
9450
+ "epoch": 0.34,
9451
+ "grad_norm": 10.0,
9452
+ "learning_rate": 1.1206779661016951e-05,
9453
+ "loss": 1.3671,
9454
+ "step": 13470
9455
+ },
9456
+ {
9457
+ "epoch": 0.34,
9458
+ "grad_norm": 9.6875,
9459
+ "learning_rate": 1.1200000000000001e-05,
9460
+ "loss": 1.4198,
9461
+ "step": 13480
9462
+ },
9463
+ {
9464
+ "epoch": 0.34,
9465
+ "grad_norm": 25.5,
9466
+ "learning_rate": 1.1193220338983051e-05,
9467
+ "loss": 1.3878,
9468
+ "step": 13490
9469
+ },
9470
+ {
9471
+ "epoch": 0.34,
9472
+ "grad_norm": 26.75,
9473
+ "learning_rate": 1.1186440677966102e-05,
9474
+ "loss": 1.158,
9475
+ "step": 13500
9476
+ },
9477
+ {
9478
+ "epoch": 0.34,
9479
+ "grad_norm": 31.375,
9480
+ "learning_rate": 1.1179661016949152e-05,
9481
+ "loss": 1.3105,
9482
+ "step": 13510
9483
+ },
9484
+ {
9485
+ "epoch": 0.34,
9486
+ "grad_norm": 15.75,
9487
+ "learning_rate": 1.1172881355932203e-05,
9488
+ "loss": 1.5006,
9489
+ "step": 13520
9490
+ },
9491
+ {
9492
+ "epoch": 0.34,
9493
+ "grad_norm": 72.0,
9494
+ "learning_rate": 1.1166101694915255e-05,
9495
+ "loss": 1.4191,
9496
+ "step": 13530
9497
+ },
9498
+ {
9499
+ "epoch": 0.34,
9500
+ "grad_norm": 20.875,
9501
+ "learning_rate": 1.1159322033898306e-05,
9502
+ "loss": 1.3523,
9503
+ "step": 13540
9504
+ },
9505
+ {
9506
+ "epoch": 0.34,
9507
+ "grad_norm": 22.25,
9508
+ "learning_rate": 1.1152542372881356e-05,
9509
+ "loss": 1.3726,
9510
+ "step": 13550
9511
+ },
9512
+ {
9513
+ "epoch": 0.34,
9514
+ "grad_norm": 48.25,
9515
+ "learning_rate": 1.1145762711864407e-05,
9516
+ "loss": 1.3706,
9517
+ "step": 13560
9518
+ },
9519
+ {
9520
+ "epoch": 0.34,
9521
+ "grad_norm": 39.25,
9522
+ "learning_rate": 1.1138983050847459e-05,
9523
+ "loss": 1.3988,
9524
+ "step": 13570
9525
+ },
9526
+ {
9527
+ "epoch": 0.34,
9528
+ "grad_norm": 34.0,
9529
+ "learning_rate": 1.113220338983051e-05,
9530
+ "loss": 1.36,
9531
+ "step": 13580
9532
+ },
9533
+ {
9534
+ "epoch": 0.34,
9535
+ "grad_norm": 18.25,
9536
+ "learning_rate": 1.112542372881356e-05,
9537
+ "loss": 1.5547,
9538
+ "step": 13590
9539
+ },
9540
+ {
9541
+ "epoch": 0.34,
9542
+ "grad_norm": 27.875,
9543
+ "learning_rate": 1.1118644067796611e-05,
9544
+ "loss": 1.2315,
9545
+ "step": 13600
9546
+ },
9547
+ {
9548
+ "epoch": 0.34,
9549
+ "grad_norm": 26.375,
9550
+ "learning_rate": 1.1111864406779663e-05,
9551
+ "loss": 1.1956,
9552
+ "step": 13610
9553
+ },
9554
+ {
9555
+ "epoch": 0.34,
9556
+ "grad_norm": 13.375,
9557
+ "learning_rate": 1.1105084745762712e-05,
9558
+ "loss": 1.2934,
9559
+ "step": 13620
9560
+ },
9561
+ {
9562
+ "epoch": 0.34,
9563
+ "grad_norm": 26.625,
9564
+ "learning_rate": 1.1098305084745764e-05,
9565
+ "loss": 1.416,
9566
+ "step": 13630
9567
+ },
9568
+ {
9569
+ "epoch": 0.34,
9570
+ "grad_norm": 17.875,
9571
+ "learning_rate": 1.1091525423728815e-05,
9572
+ "loss": 1.2588,
9573
+ "step": 13640
9574
+ },
9575
+ {
9576
+ "epoch": 0.34,
9577
+ "grad_norm": 10.5625,
9578
+ "learning_rate": 1.1084745762711867e-05,
9579
+ "loss": 1.3671,
9580
+ "step": 13650
9581
+ },
9582
+ {
9583
+ "epoch": 0.34,
9584
+ "grad_norm": 14.3125,
9585
+ "learning_rate": 1.1077966101694916e-05,
9586
+ "loss": 1.4697,
9587
+ "step": 13660
9588
+ },
9589
+ {
9590
+ "epoch": 0.34,
9591
+ "grad_norm": 18.625,
9592
+ "learning_rate": 1.1071186440677968e-05,
9593
+ "loss": 1.2121,
9594
+ "step": 13670
9595
+ },
9596
+ {
9597
+ "epoch": 0.34,
9598
+ "grad_norm": 21.25,
9599
+ "learning_rate": 1.1064406779661019e-05,
9600
+ "loss": 1.2724,
9601
+ "step": 13680
9602
+ },
9603
+ {
9604
+ "epoch": 0.34,
9605
+ "grad_norm": 21.75,
9606
+ "learning_rate": 1.105762711864407e-05,
9607
+ "loss": 1.3785,
9608
+ "step": 13690
9609
+ },
9610
+ {
9611
+ "epoch": 0.34,
9612
+ "grad_norm": 14.75,
9613
+ "learning_rate": 1.1050847457627118e-05,
9614
+ "loss": 1.5127,
9615
+ "step": 13700
9616
+ },
9617
+ {
9618
+ "epoch": 0.34,
9619
+ "grad_norm": 26.375,
9620
+ "learning_rate": 1.104406779661017e-05,
9621
+ "loss": 1.5337,
9622
+ "step": 13710
9623
+ },
9624
+ {
9625
+ "epoch": 0.34,
9626
+ "grad_norm": 18.125,
9627
+ "learning_rate": 1.1037288135593221e-05,
9628
+ "loss": 1.4343,
9629
+ "step": 13720
9630
+ },
9631
+ {
9632
+ "epoch": 0.34,
9633
+ "grad_norm": 31.0,
9634
+ "learning_rate": 1.1030508474576271e-05,
9635
+ "loss": 1.4872,
9636
+ "step": 13730
9637
+ },
9638
+ {
9639
+ "epoch": 0.34,
9640
+ "grad_norm": 24.125,
9641
+ "learning_rate": 1.1023728813559322e-05,
9642
+ "loss": 1.3495,
9643
+ "step": 13740
9644
+ },
9645
+ {
9646
+ "epoch": 0.34,
9647
+ "grad_norm": 12.125,
9648
+ "learning_rate": 1.1016949152542374e-05,
9649
+ "loss": 1.2934,
9650
+ "step": 13750
9651
+ },
9652
+ {
9653
+ "epoch": 0.34,
9654
+ "grad_norm": 34.25,
9655
+ "learning_rate": 1.1010169491525423e-05,
9656
+ "loss": 1.4117,
9657
+ "step": 13760
9658
+ },
9659
+ {
9660
+ "epoch": 0.34,
9661
+ "grad_norm": 19.625,
9662
+ "learning_rate": 1.1003389830508475e-05,
9663
+ "loss": 1.5345,
9664
+ "step": 13770
9665
+ },
9666
+ {
9667
+ "epoch": 0.34,
9668
+ "grad_norm": 21.125,
9669
+ "learning_rate": 1.0996610169491526e-05,
9670
+ "loss": 1.4747,
9671
+ "step": 13780
9672
+ },
9673
+ {
9674
+ "epoch": 0.34,
9675
+ "grad_norm": 28.5,
9676
+ "learning_rate": 1.0989830508474578e-05,
9677
+ "loss": 1.3429,
9678
+ "step": 13790
9679
+ },
9680
+ {
9681
+ "epoch": 0.34,
9682
+ "grad_norm": 15.4375,
9683
+ "learning_rate": 1.0983050847457627e-05,
9684
+ "loss": 1.0869,
9685
+ "step": 13800
9686
+ },
9687
+ {
9688
+ "epoch": 0.35,
9689
+ "grad_norm": 35.25,
9690
+ "learning_rate": 1.0976271186440679e-05,
9691
+ "loss": 1.5203,
9692
+ "step": 13810
9693
+ },
9694
+ {
9695
+ "epoch": 0.35,
9696
+ "grad_norm": 55.25,
9697
+ "learning_rate": 1.096949152542373e-05,
9698
+ "loss": 1.3117,
9699
+ "step": 13820
9700
+ },
9701
+ {
9702
+ "epoch": 0.35,
9703
+ "grad_norm": 14.3125,
9704
+ "learning_rate": 1.0962711864406782e-05,
9705
+ "loss": 1.4044,
9706
+ "step": 13830
9707
+ },
9708
+ {
9709
+ "epoch": 0.35,
9710
+ "grad_norm": 15.125,
9711
+ "learning_rate": 1.0955932203389831e-05,
9712
+ "loss": 1.5186,
9713
+ "step": 13840
9714
+ },
9715
+ {
9716
+ "epoch": 0.35,
9717
+ "grad_norm": 20.5,
9718
+ "learning_rate": 1.0949152542372883e-05,
9719
+ "loss": 1.251,
9720
+ "step": 13850
9721
+ },
9722
+ {
9723
+ "epoch": 0.35,
9724
+ "grad_norm": 42.75,
9725
+ "learning_rate": 1.0942372881355934e-05,
9726
+ "loss": 1.3674,
9727
+ "step": 13860
9728
+ },
9729
+ {
9730
+ "epoch": 0.35,
9731
+ "grad_norm": 12.0,
9732
+ "learning_rate": 1.0935593220338985e-05,
9733
+ "loss": 1.4548,
9734
+ "step": 13870
9735
+ },
9736
+ {
9737
+ "epoch": 0.35,
9738
+ "grad_norm": 18.5,
9739
+ "learning_rate": 1.0928813559322035e-05,
9740
+ "loss": 1.2881,
9741
+ "step": 13880
9742
+ },
9743
+ {
9744
+ "epoch": 0.35,
9745
+ "grad_norm": 47.0,
9746
+ "learning_rate": 1.0922033898305087e-05,
9747
+ "loss": 1.4235,
9748
+ "step": 13890
9749
+ },
9750
+ {
9751
+ "epoch": 0.35,
9752
+ "grad_norm": 15.9375,
9753
+ "learning_rate": 1.0915254237288135e-05,
9754
+ "loss": 1.267,
9755
+ "step": 13900
9756
+ },
9757
+ {
9758
+ "epoch": 0.35,
9759
+ "grad_norm": 30.75,
9760
+ "learning_rate": 1.0908474576271186e-05,
9761
+ "loss": 1.2051,
9762
+ "step": 13910
9763
+ },
9764
+ {
9765
+ "epoch": 0.35,
9766
+ "grad_norm": 21.625,
9767
+ "learning_rate": 1.0901694915254237e-05,
9768
+ "loss": 1.4902,
9769
+ "step": 13920
9770
+ },
9771
+ {
9772
+ "epoch": 0.35,
9773
+ "grad_norm": 19.0,
9774
+ "learning_rate": 1.0894915254237289e-05,
9775
+ "loss": 1.0872,
9776
+ "step": 13930
9777
+ },
9778
+ {
9779
+ "epoch": 0.35,
9780
+ "grad_norm": 48.25,
9781
+ "learning_rate": 1.0888135593220339e-05,
9782
+ "loss": 1.3268,
9783
+ "step": 13940
9784
+ },
9785
+ {
9786
+ "epoch": 0.35,
9787
+ "grad_norm": 20.0,
9788
+ "learning_rate": 1.088135593220339e-05,
9789
+ "loss": 1.3066,
9790
+ "step": 13950
9791
+ },
9792
+ {
9793
+ "epoch": 0.35,
9794
+ "grad_norm": 12.1875,
9795
+ "learning_rate": 1.0874576271186441e-05,
9796
+ "loss": 1.3713,
9797
+ "step": 13960
9798
+ },
9799
+ {
9800
+ "epoch": 0.35,
9801
+ "grad_norm": 21.375,
9802
+ "learning_rate": 1.0867796610169493e-05,
9803
+ "loss": 1.4468,
9804
+ "step": 13970
9805
+ },
9806
+ {
9807
+ "epoch": 0.35,
9808
+ "grad_norm": 17.0,
9809
+ "learning_rate": 1.0861016949152542e-05,
9810
+ "loss": 1.4413,
9811
+ "step": 13980
9812
+ },
9813
+ {
9814
+ "epoch": 0.35,
9815
+ "grad_norm": 21.0,
9816
+ "learning_rate": 1.0854237288135594e-05,
9817
+ "loss": 1.3061,
9818
+ "step": 13990
9819
+ },
9820
+ {
9821
+ "epoch": 0.35,
9822
+ "grad_norm": 20.875,
9823
+ "learning_rate": 1.0847457627118645e-05,
9824
+ "loss": 1.4503,
9825
+ "step": 14000
9826
+ },
9827
+ {
9828
+ "epoch": 0.35,
9829
+ "grad_norm": 15.125,
9830
+ "learning_rate": 1.0840677966101697e-05,
9831
+ "loss": 1.2215,
9832
+ "step": 14010
9833
+ },
9834
+ {
9835
+ "epoch": 0.35,
9836
+ "grad_norm": 24.625,
9837
+ "learning_rate": 1.0833898305084746e-05,
9838
+ "loss": 1.4355,
9839
+ "step": 14020
9840
+ },
9841
+ {
9842
+ "epoch": 0.35,
9843
+ "grad_norm": 13.8125,
9844
+ "learning_rate": 1.0827118644067798e-05,
9845
+ "loss": 1.4657,
9846
+ "step": 14030
9847
+ },
9848
+ {
9849
+ "epoch": 0.35,
9850
+ "grad_norm": 26.25,
9851
+ "learning_rate": 1.0820338983050849e-05,
9852
+ "loss": 1.268,
9853
+ "step": 14040
9854
+ },
9855
+ {
9856
+ "epoch": 0.35,
9857
+ "grad_norm": 20.5,
9858
+ "learning_rate": 1.08135593220339e-05,
9859
+ "loss": 1.306,
9860
+ "step": 14050
9861
+ },
9862
+ {
9863
+ "epoch": 0.35,
9864
+ "grad_norm": 12.8125,
9865
+ "learning_rate": 1.080677966101695e-05,
9866
+ "loss": 1.4965,
9867
+ "step": 14060
9868
+ },
9869
+ {
9870
+ "epoch": 0.35,
9871
+ "grad_norm": 18.5,
9872
+ "learning_rate": 1.0800000000000002e-05,
9873
+ "loss": 1.3629,
9874
+ "step": 14070
9875
+ },
9876
+ {
9877
+ "epoch": 0.35,
9878
+ "grad_norm": 32.0,
9879
+ "learning_rate": 1.0793220338983053e-05,
9880
+ "loss": 1.2364,
9881
+ "step": 14080
9882
+ },
9883
+ {
9884
+ "epoch": 0.35,
9885
+ "grad_norm": 14.4375,
9886
+ "learning_rate": 1.0786440677966103e-05,
9887
+ "loss": 1.334,
9888
+ "step": 14090
9889
+ },
9890
+ {
9891
+ "epoch": 0.35,
9892
+ "grad_norm": 11.875,
9893
+ "learning_rate": 1.0779661016949154e-05,
9894
+ "loss": 1.4108,
9895
+ "step": 14100
9896
+ },
9897
+ {
9898
+ "epoch": 0.35,
9899
+ "grad_norm": 35.5,
9900
+ "learning_rate": 1.0772881355932204e-05,
9901
+ "loss": 1.4491,
9902
+ "step": 14110
9903
+ },
9904
+ {
9905
+ "epoch": 0.35,
9906
+ "grad_norm": 16.25,
9907
+ "learning_rate": 1.0766101694915254e-05,
9908
+ "loss": 1.4305,
9909
+ "step": 14120
9910
+ },
9911
+ {
9912
+ "epoch": 0.35,
9913
+ "grad_norm": 22.75,
9914
+ "learning_rate": 1.0759322033898305e-05,
9915
+ "loss": 1.324,
9916
+ "step": 14130
9917
+ },
9918
+ {
9919
+ "epoch": 0.35,
9920
+ "grad_norm": 16.625,
9921
+ "learning_rate": 1.0752542372881356e-05,
9922
+ "loss": 1.1912,
9923
+ "step": 14140
9924
+ },
9925
+ {
9926
+ "epoch": 0.35,
9927
+ "grad_norm": 24.875,
9928
+ "learning_rate": 1.0745762711864408e-05,
9929
+ "loss": 1.3567,
9930
+ "step": 14150
9931
+ },
9932
+ {
9933
+ "epoch": 0.35,
9934
+ "grad_norm": 15.25,
9935
+ "learning_rate": 1.0738983050847457e-05,
9936
+ "loss": 1.3639,
9937
+ "step": 14160
9938
+ },
9939
+ {
9940
+ "epoch": 0.35,
9941
+ "grad_norm": 11.5625,
9942
+ "learning_rate": 1.0732203389830509e-05,
9943
+ "loss": 1.4455,
9944
+ "step": 14170
9945
+ },
9946
+ {
9947
+ "epoch": 0.35,
9948
+ "grad_norm": 13.4375,
9949
+ "learning_rate": 1.072542372881356e-05,
9950
+ "loss": 1.4245,
9951
+ "step": 14180
9952
+ },
9953
+ {
9954
+ "epoch": 0.35,
9955
+ "grad_norm": 12.9375,
9956
+ "learning_rate": 1.0718644067796612e-05,
9957
+ "loss": 1.303,
9958
+ "step": 14190
9959
+ },
9960
+ {
9961
+ "epoch": 0.35,
9962
+ "grad_norm": 22.125,
9963
+ "learning_rate": 1.0711864406779661e-05,
9964
+ "loss": 1.3533,
9965
+ "step": 14200
9966
+ },
9967
+ {
9968
+ "epoch": 0.36,
9969
+ "grad_norm": 13.9375,
9970
+ "learning_rate": 1.0705084745762713e-05,
9971
+ "loss": 1.3825,
9972
+ "step": 14210
9973
+ },
9974
+ {
9975
+ "epoch": 0.36,
9976
+ "grad_norm": 30.875,
9977
+ "learning_rate": 1.0698305084745764e-05,
9978
+ "loss": 1.3004,
9979
+ "step": 14220
9980
+ },
9981
+ {
9982
+ "epoch": 0.36,
9983
+ "grad_norm": 22.625,
9984
+ "learning_rate": 1.0691525423728814e-05,
9985
+ "loss": 1.3967,
9986
+ "step": 14230
9987
+ },
9988
+ {
9989
+ "epoch": 0.36,
9990
+ "grad_norm": 13.125,
9991
+ "learning_rate": 1.0684745762711865e-05,
9992
+ "loss": 1.2226,
9993
+ "step": 14240
9994
+ },
9995
+ {
9996
+ "epoch": 0.36,
9997
+ "grad_norm": 33.25,
9998
+ "learning_rate": 1.0677966101694917e-05,
9999
+ "loss": 1.3159,
10000
+ "step": 14250
10001
+ },
10002
+ {
10003
+ "epoch": 0.36,
10004
+ "grad_norm": 20.0,
10005
+ "learning_rate": 1.0671186440677968e-05,
10006
+ "loss": 1.239,
10007
+ "step": 14260
10008
+ },
10009
+ {
10010
+ "epoch": 0.36,
10011
+ "grad_norm": 16.25,
10012
+ "learning_rate": 1.0664406779661018e-05,
10013
+ "loss": 1.3726,
10014
+ "step": 14270
10015
+ },
10016
+ {
10017
+ "epoch": 0.36,
10018
+ "grad_norm": 16.625,
10019
+ "learning_rate": 1.065762711864407e-05,
10020
+ "loss": 1.3982,
10021
+ "step": 14280
10022
+ },
10023
+ {
10024
+ "epoch": 0.36,
10025
+ "grad_norm": 23.875,
10026
+ "learning_rate": 1.065084745762712e-05,
10027
+ "loss": 1.4308,
10028
+ "step": 14290
10029
+ },
10030
+ {
10031
+ "epoch": 0.36,
10032
+ "grad_norm": 15.0625,
10033
+ "learning_rate": 1.0644067796610172e-05,
10034
+ "loss": 1.299,
10035
+ "step": 14300
10036
+ },
10037
+ {
10038
+ "epoch": 0.36,
10039
+ "grad_norm": 40.5,
10040
+ "learning_rate": 1.0637288135593222e-05,
10041
+ "loss": 1.3387,
10042
+ "step": 14310
10043
+ },
10044
+ {
10045
+ "epoch": 0.36,
10046
+ "grad_norm": 29.375,
10047
+ "learning_rate": 1.0630508474576271e-05,
10048
+ "loss": 1.5603,
10049
+ "step": 14320
10050
+ },
10051
+ {
10052
+ "epoch": 0.36,
10053
+ "grad_norm": 28.0,
10054
+ "learning_rate": 1.0623728813559323e-05,
10055
+ "loss": 1.4149,
10056
+ "step": 14330
10057
+ },
10058
+ {
10059
+ "epoch": 0.36,
10060
+ "grad_norm": 51.5,
10061
+ "learning_rate": 1.0616949152542373e-05,
10062
+ "loss": 1.4769,
10063
+ "step": 14340
10064
+ },
10065
+ {
10066
+ "epoch": 0.36,
10067
+ "grad_norm": 18.125,
10068
+ "learning_rate": 1.0610169491525424e-05,
10069
+ "loss": 1.5026,
10070
+ "step": 14350
10071
+ },
10072
+ {
10073
+ "epoch": 0.36,
10074
+ "grad_norm": 30.5,
10075
+ "learning_rate": 1.0603389830508475e-05,
10076
+ "loss": 1.2405,
10077
+ "step": 14360
10078
+ },
10079
+ {
10080
+ "epoch": 0.36,
10081
+ "grad_norm": 19.625,
10082
+ "learning_rate": 1.0596610169491525e-05,
10083
+ "loss": 1.3363,
10084
+ "step": 14370
10085
+ },
10086
+ {
10087
+ "epoch": 0.36,
10088
+ "grad_norm": 20.75,
10089
+ "learning_rate": 1.0589830508474576e-05,
10090
+ "loss": 1.4854,
10091
+ "step": 14380
10092
+ },
10093
+ {
10094
+ "epoch": 0.36,
10095
+ "grad_norm": 17.875,
10096
+ "learning_rate": 1.0583050847457628e-05,
10097
+ "loss": 1.5275,
10098
+ "step": 14390
10099
+ },
10100
+ {
10101
+ "epoch": 0.36,
10102
+ "grad_norm": 20.625,
10103
+ "learning_rate": 1.057627118644068e-05,
10104
+ "loss": 1.3683,
10105
+ "step": 14400
10106
+ },
10107
+ {
10108
+ "epoch": 0.36,
10109
+ "grad_norm": 27.0,
10110
+ "learning_rate": 1.0569491525423729e-05,
10111
+ "loss": 1.3453,
10112
+ "step": 14410
10113
+ },
10114
+ {
10115
+ "epoch": 0.36,
10116
+ "grad_norm": 15.9375,
10117
+ "learning_rate": 1.056271186440678e-05,
10118
+ "loss": 1.4344,
10119
+ "step": 14420
10120
+ },
10121
+ {
10122
+ "epoch": 0.36,
10123
+ "grad_norm": 20.5,
10124
+ "learning_rate": 1.0555932203389832e-05,
10125
+ "loss": 1.4687,
10126
+ "step": 14430
10127
+ },
10128
+ {
10129
+ "epoch": 0.36,
10130
+ "grad_norm": 43.5,
10131
+ "learning_rate": 1.0549152542372883e-05,
10132
+ "loss": 1.6406,
10133
+ "step": 14440
10134
+ },
10135
+ {
10136
+ "epoch": 0.36,
10137
+ "grad_norm": 21.25,
10138
+ "learning_rate": 1.0542372881355933e-05,
10139
+ "loss": 1.4751,
10140
+ "step": 14450
10141
+ },
10142
+ {
10143
+ "epoch": 0.36,
10144
+ "grad_norm": 11.5625,
10145
+ "learning_rate": 1.0535593220338984e-05,
10146
+ "loss": 1.3908,
10147
+ "step": 14460
10148
+ },
10149
+ {
10150
+ "epoch": 0.36,
10151
+ "grad_norm": 33.25,
10152
+ "learning_rate": 1.0528813559322036e-05,
10153
+ "loss": 1.3654,
10154
+ "step": 14470
10155
+ },
10156
+ {
10157
+ "epoch": 0.36,
10158
+ "grad_norm": 12.5625,
10159
+ "learning_rate": 1.0522033898305087e-05,
10160
+ "loss": 1.3358,
10161
+ "step": 14480
10162
+ },
10163
+ {
10164
+ "epoch": 0.36,
10165
+ "grad_norm": 12.5,
10166
+ "learning_rate": 1.0515254237288137e-05,
10167
+ "loss": 1.2108,
10168
+ "step": 14490
10169
+ },
10170
+ {
10171
+ "epoch": 0.36,
10172
+ "grad_norm": 8.625,
10173
+ "learning_rate": 1.0508474576271188e-05,
10174
+ "loss": 1.483,
10175
+ "step": 14500
10176
+ },
10177
+ {
10178
+ "epoch": 0.36,
10179
+ "grad_norm": 25.75,
10180
+ "learning_rate": 1.050169491525424e-05,
10181
+ "loss": 1.4243,
10182
+ "step": 14510
10183
+ },
10184
+ {
10185
+ "epoch": 0.36,
10186
+ "grad_norm": 12.75,
10187
+ "learning_rate": 1.049491525423729e-05,
10188
+ "loss": 1.4495,
10189
+ "step": 14520
10190
+ },
10191
+ {
10192
+ "epoch": 0.36,
10193
+ "grad_norm": 26.375,
10194
+ "learning_rate": 1.0488135593220339e-05,
10195
+ "loss": 1.3261,
10196
+ "step": 14530
10197
+ },
10198
+ {
10199
+ "epoch": 0.36,
10200
+ "grad_norm": 17.0,
10201
+ "learning_rate": 1.048135593220339e-05,
10202
+ "loss": 1.3303,
10203
+ "step": 14540
10204
+ },
10205
+ {
10206
+ "epoch": 0.36,
10207
+ "grad_norm": 16.625,
10208
+ "learning_rate": 1.047457627118644e-05,
10209
+ "loss": 1.2757,
10210
+ "step": 14550
10211
+ },
10212
+ {
10213
+ "epoch": 0.36,
10214
+ "grad_norm": 41.0,
10215
+ "learning_rate": 1.0467796610169491e-05,
10216
+ "loss": 1.2599,
10217
+ "step": 14560
10218
+ },
10219
+ {
10220
+ "epoch": 0.36,
10221
+ "grad_norm": 18.25,
10222
+ "learning_rate": 1.0461016949152543e-05,
10223
+ "loss": 1.3105,
10224
+ "step": 14570
10225
+ },
10226
+ {
10227
+ "epoch": 0.36,
10228
+ "grad_norm": 29.375,
10229
+ "learning_rate": 1.0454237288135594e-05,
10230
+ "loss": 1.4942,
10231
+ "step": 14580
10232
+ },
10233
+ {
10234
+ "epoch": 0.36,
10235
+ "grad_norm": 15.8125,
10236
+ "learning_rate": 1.0447457627118644e-05,
10237
+ "loss": 1.5753,
10238
+ "step": 14590
10239
+ },
10240
+ {
10241
+ "epoch": 0.36,
10242
+ "grad_norm": 27.0,
10243
+ "learning_rate": 1.0440677966101695e-05,
10244
+ "loss": 1.4115,
10245
+ "step": 14600
10246
+ },
10247
+ {
10248
+ "epoch": 0.37,
10249
+ "grad_norm": 32.25,
10250
+ "learning_rate": 1.0433898305084747e-05,
10251
+ "loss": 1.3518,
10252
+ "step": 14610
10253
+ },
10254
+ {
10255
+ "epoch": 0.37,
10256
+ "grad_norm": 29.625,
10257
+ "learning_rate": 1.0427118644067798e-05,
10258
+ "loss": 1.522,
10259
+ "step": 14620
10260
+ },
10261
+ {
10262
+ "epoch": 0.37,
10263
+ "grad_norm": 25.75,
10264
+ "learning_rate": 1.0420338983050848e-05,
10265
+ "loss": 1.341,
10266
+ "step": 14630
10267
+ },
10268
+ {
10269
+ "epoch": 0.37,
10270
+ "grad_norm": 35.25,
10271
+ "learning_rate": 1.04135593220339e-05,
10272
+ "loss": 1.458,
10273
+ "step": 14640
10274
+ },
10275
+ {
10276
+ "epoch": 0.37,
10277
+ "grad_norm": 9.5625,
10278
+ "learning_rate": 1.040677966101695e-05,
10279
+ "loss": 1.4959,
10280
+ "step": 14650
10281
+ },
10282
+ {
10283
+ "epoch": 0.37,
10284
+ "grad_norm": 30.0,
10285
+ "learning_rate": 1.04e-05,
10286
+ "loss": 1.3038,
10287
+ "step": 14660
10288
+ },
10289
+ {
10290
+ "epoch": 0.37,
10291
+ "grad_norm": 29.375,
10292
+ "learning_rate": 1.0393220338983052e-05,
10293
+ "loss": 1.1989,
10294
+ "step": 14670
10295
+ },
10296
+ {
10297
+ "epoch": 0.37,
10298
+ "grad_norm": 11.3125,
10299
+ "learning_rate": 1.0386440677966103e-05,
10300
+ "loss": 1.4619,
10301
+ "step": 14680
10302
+ },
10303
+ {
10304
+ "epoch": 0.37,
10305
+ "grad_norm": 29.75,
10306
+ "learning_rate": 1.0379661016949155e-05,
10307
+ "loss": 1.386,
10308
+ "step": 14690
10309
+ },
10310
+ {
10311
+ "epoch": 0.37,
10312
+ "grad_norm": 20.0,
10313
+ "learning_rate": 1.0372881355932204e-05,
10314
+ "loss": 1.339,
10315
+ "step": 14700
10316
+ },
10317
+ {
10318
+ "epoch": 0.37,
10319
+ "grad_norm": 25.0,
10320
+ "learning_rate": 1.0366101694915256e-05,
10321
+ "loss": 1.3886,
10322
+ "step": 14710
10323
+ },
10324
+ {
10325
+ "epoch": 0.37,
10326
+ "grad_norm": 24.875,
10327
+ "learning_rate": 1.0359322033898307e-05,
10328
+ "loss": 1.364,
10329
+ "step": 14720
10330
+ },
10331
+ {
10332
+ "epoch": 0.37,
10333
+ "grad_norm": 26.0,
10334
+ "learning_rate": 1.0352542372881358e-05,
10335
+ "loss": 1.416,
10336
+ "step": 14730
10337
+ },
10338
+ {
10339
+ "epoch": 0.37,
10340
+ "grad_norm": 118.0,
10341
+ "learning_rate": 1.0345762711864406e-05,
10342
+ "loss": 1.3624,
10343
+ "step": 14740
10344
+ },
10345
+ {
10346
+ "epoch": 0.37,
10347
+ "grad_norm": 14.375,
10348
+ "learning_rate": 1.0338983050847458e-05,
10349
+ "loss": 1.346,
10350
+ "step": 14750
10351
+ },
10352
+ {
10353
+ "epoch": 0.37,
10354
+ "grad_norm": 19.5,
10355
+ "learning_rate": 1.033220338983051e-05,
10356
+ "loss": 1.4017,
10357
+ "step": 14760
10358
+ },
10359
+ {
10360
+ "epoch": 0.37,
10361
+ "grad_norm": 14.625,
10362
+ "learning_rate": 1.0325423728813559e-05,
10363
+ "loss": 1.4782,
10364
+ "step": 14770
10365
+ },
10366
+ {
10367
+ "epoch": 0.37,
10368
+ "grad_norm": 18.125,
10369
+ "learning_rate": 1.031864406779661e-05,
10370
+ "loss": 1.3452,
10371
+ "step": 14780
10372
+ },
10373
+ {
10374
+ "epoch": 0.37,
10375
+ "grad_norm": 8.6875,
10376
+ "learning_rate": 1.0311864406779662e-05,
10377
+ "loss": 1.3968,
10378
+ "step": 14790
10379
+ },
10380
+ {
10381
+ "epoch": 0.37,
10382
+ "grad_norm": 29.25,
10383
+ "learning_rate": 1.0305084745762712e-05,
10384
+ "loss": 1.3137,
10385
+ "step": 14800
10386
+ },
10387
+ {
10388
+ "epoch": 0.37,
10389
+ "grad_norm": 20.75,
10390
+ "learning_rate": 1.0298305084745763e-05,
10391
+ "loss": 1.4651,
10392
+ "step": 14810
10393
+ },
10394
+ {
10395
+ "epoch": 0.37,
10396
+ "grad_norm": 12.4375,
10397
+ "learning_rate": 1.0291525423728814e-05,
10398
+ "loss": 1.468,
10399
+ "step": 14820
10400
+ },
10401
+ {
10402
+ "epoch": 0.37,
10403
+ "grad_norm": 19.375,
10404
+ "learning_rate": 1.0284745762711866e-05,
10405
+ "loss": 1.5157,
10406
+ "step": 14830
10407
+ },
10408
+ {
10409
+ "epoch": 0.37,
10410
+ "grad_norm": 39.25,
10411
+ "learning_rate": 1.0277966101694915e-05,
10412
+ "loss": 1.4106,
10413
+ "step": 14840
10414
+ },
10415
+ {
10416
+ "epoch": 0.37,
10417
+ "grad_norm": 24.25,
10418
+ "learning_rate": 1.0271186440677967e-05,
10419
+ "loss": 1.3044,
10420
+ "step": 14850
10421
+ },
10422
+ {
10423
+ "epoch": 0.37,
10424
+ "grad_norm": 7.25,
10425
+ "learning_rate": 1.0264406779661018e-05,
10426
+ "loss": 1.3231,
10427
+ "step": 14860
10428
+ },
10429
+ {
10430
+ "epoch": 0.37,
10431
+ "grad_norm": 26.0,
10432
+ "learning_rate": 1.025762711864407e-05,
10433
+ "loss": 1.2322,
10434
+ "step": 14870
10435
+ },
10436
+ {
10437
+ "epoch": 0.37,
10438
+ "grad_norm": 16.625,
10439
+ "learning_rate": 1.025084745762712e-05,
10440
+ "loss": 1.1767,
10441
+ "step": 14880
10442
+ },
10443
+ {
10444
+ "epoch": 0.37,
10445
+ "grad_norm": 11.625,
10446
+ "learning_rate": 1.024406779661017e-05,
10447
+ "loss": 1.3503,
10448
+ "step": 14890
10449
+ },
10450
+ {
10451
+ "epoch": 0.37,
10452
+ "grad_norm": 9.6875,
10453
+ "learning_rate": 1.0237288135593222e-05,
10454
+ "loss": 1.3777,
10455
+ "step": 14900
10456
+ },
10457
+ {
10458
+ "epoch": 0.37,
10459
+ "grad_norm": 46.75,
10460
+ "learning_rate": 1.0230508474576274e-05,
10461
+ "loss": 1.3334,
10462
+ "step": 14910
10463
+ },
10464
+ {
10465
+ "epoch": 0.37,
10466
+ "grad_norm": 23.25,
10467
+ "learning_rate": 1.0223728813559323e-05,
10468
+ "loss": 1.2998,
10469
+ "step": 14920
10470
+ },
10471
+ {
10472
+ "epoch": 0.37,
10473
+ "grad_norm": 45.75,
10474
+ "learning_rate": 1.0216949152542375e-05,
10475
+ "loss": 1.3287,
10476
+ "step": 14930
10477
+ },
10478
+ {
10479
+ "epoch": 0.37,
10480
+ "grad_norm": 20.0,
10481
+ "learning_rate": 1.0210169491525423e-05,
10482
+ "loss": 1.4629,
10483
+ "step": 14940
10484
+ },
10485
+ {
10486
+ "epoch": 0.37,
10487
+ "grad_norm": 9.375,
10488
+ "learning_rate": 1.0203389830508474e-05,
10489
+ "loss": 1.2114,
10490
+ "step": 14950
10491
+ },
10492
+ {
10493
+ "epoch": 0.37,
10494
+ "grad_norm": 35.75,
10495
+ "learning_rate": 1.0196610169491525e-05,
10496
+ "loss": 1.4157,
10497
+ "step": 14960
10498
+ },
10499
+ {
10500
+ "epoch": 0.37,
10501
+ "grad_norm": 19.375,
10502
+ "learning_rate": 1.0189830508474577e-05,
10503
+ "loss": 1.3757,
10504
+ "step": 14970
10505
+ },
10506
+ {
10507
+ "epoch": 0.37,
10508
+ "grad_norm": 16.75,
10509
+ "learning_rate": 1.0183050847457627e-05,
10510
+ "loss": 1.4091,
10511
+ "step": 14980
10512
+ },
10513
+ {
10514
+ "epoch": 0.37,
10515
+ "grad_norm": 12.0,
10516
+ "learning_rate": 1.0176271186440678e-05,
10517
+ "loss": 1.4074,
10518
+ "step": 14990
10519
+ },
10520
+ {
10521
+ "epoch": 0.38,
10522
+ "grad_norm": 20.25,
10523
+ "learning_rate": 1.016949152542373e-05,
10524
+ "loss": 1.2817,
10525
+ "step": 15000
10526
+ },
10527
+ {
10528
+ "epoch": 0.38,
10529
+ "eval_loss": 1.3379485607147217,
10530
+ "eval_runtime": 59.3062,
10531
+ "eval_samples_per_second": 16.862,
10532
+ "eval_steps_per_second": 16.862,
10533
+ "step": 15000
10534
  }
10535
  ],
10536
  "logging_steps": 10,
 
10538
  "num_input_tokens_seen": 0,
10539
  "num_train_epochs": 1,
10540
  "save_steps": 5000,
10541
+ "total_flos": 2.3600547495936e+17,
10542
  "train_batch_size": 1,
10543
  "trial_name": null,
10544
  "trial_params": null