luciagil committed on
Commit bd3b281
1 Parent(s): 1e43038

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
  license: apache-2.0
  base_model: google/vit-base-patch16-224-in21K
  tags:
+ - image-classification
  - generated_from_trainer
  metrics:
  - accuracy
@@ -17,8 +18,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [google/vit-base-patch16-224-in21K](https://huggingface.co/google/vit-base-patch16-224-in21K) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.5316
- - Accuracy: 0.8492
+ - Loss: 0.5303
+ - Accuracy: 0.8496
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.8496031746031746,
+ "eval_loss": 0.5302808284759521,
+ "eval_runtime": 93.45,
+ "eval_samples_per_second": 26.966,
+ "eval_steps_per_second": 3.371,
+ "total_flos": 5.468471871363809e+18,
+ "train_loss": 0.7062997149772384,
+ "train_runtime": 5016.6762,
+ "train_samples_per_second": 14.065,
+ "train_steps_per_second": 0.879
+ }
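
As a rough cross-check of the figures in all_results.json, the throughput fields can be multiplied back into sample counts. This is only a sketch: it assumes that each `*_samples_per_second` value is the number of samples divided by the matching `*_runtime`, and that `eval_accuracy` is a plain fraction of correct predictions. The helper name `implied_samples` is hypothetical and not part of the commit.

```python
def implied_samples(samples_per_second: float, runtime_seconds: float) -> int:
    """Estimate the number of samples processed from throughput and runtime."""
    return round(samples_per_second * runtime_seconds)


# Values copied from all_results.json in this commit.
results = {
    "epoch": 7.0,
    "eval_accuracy": 0.8496031746031746,
    "eval_runtime": 93.45,
    "eval_samples_per_second": 26.966,
    "train_runtime": 5016.6762,
    "train_samples_per_second": 14.065,
}

# Size of the evaluation set implied by the eval throughput.
eval_samples = implied_samples(
    results["eval_samples_per_second"], results["eval_runtime"]
)

# Total training samples seen over all epochs, then per epoch.
train_samples_total = implied_samples(
    results["train_samples_per_second"], results["train_runtime"]
)
train_samples_per_epoch = round(train_samples_total / results["epoch"])

# Number of correct evaluation predictions implied by the accuracy.
correct = round(results["eval_accuracy"] * eval_samples)

print(eval_samples, train_samples_total, train_samples_per_epoch, correct)
```

Because the throughput values are rounded to three decimals, the counts are estimates rather than exact dataset sizes; they suggest an evaluation set of about 2520 images and about 10080 training images per epoch.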
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 7.0,
+ "eval_accuracy": 0.8496031746031746,
+ "eval_loss": 0.5302808284759521,
+ "eval_runtime": 93.45,
+ "eval_samples_per_second": 26.966,
+ "eval_steps_per_second": 3.371
+ }
runs/Apr05_17-06-15_44569c9ac571/events.out.tfevents.1712341973.44569c9ac571.8281.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6968344f66e6c5529f1bdaec7dfca0125d548b0aaf5981bc6ba7f3886c0e54d
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 7.0,
+ "total_flos": 5.468471871363809e+18,
+ "train_loss": 0.7062997149772384,
+ "train_runtime": 5016.6762,
+ "train_samples_per_second": 14.065,
+ "train_steps_per_second": 0.879
+ }
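
The `learning_rate` values logged in trainer_state.json appear consistent with a linear decay to zero over the 4410 steps reported as `global_step`. The sketch below checks a few logged entries against that schedule. The base rate of 2e-4 and the absence of warmup are assumptions inferred from the logged numbers, not settings stated anywhere in the commit, and `linear_lr` is a hypothetical helper.

```python
import math

TOTAL_STEPS = 4410  # "global_step" in trainer_state.json
NUM_EPOCHS = 7      # "epoch": 7.0
BASE_LR = 2e-4      # assumption: inferred from the logs, not stated in the commit


def linear_lr(step: int, total_steps: int = TOTAL_STEPS, base_lr: float = BASE_LR) -> float:
    """Linear decay from base_lr to zero with no warmup (assumed schedule)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


# Optimizer steps per epoch implied by the totals above.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS

# (step -> learning_rate) pairs copied from "log_history".
logged = {
    10: 0.00019954648526077098,
    100: 0.00019546485260770976,
    400: 0.000181859410430839,
}

# True where the assumed schedule reproduces the logged value.
matches = {
    step: math.isclose(linear_lr(step), lr, rel_tol=1e-6)
    for step, lr in logged.items()
}

print(steps_per_epoch, matches)
```

Not every entry follows this line exactly: the value logged at step 420 (0.00018099773242630387) is slightly higher than the assumed schedule gives for that step, and the commit does not explain the difference.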
trainer_state.json ADDED
@@ -0,0 +1,3513 @@
1
+ {
2
+ "best_metric": 0.5302808284759521,
3
+ "best_model_checkpoint": "Human_action_classifier/checkpoint-4300",
4
+ "epoch": 7.0,
5
+ "eval_steps": 100,
6
+ "global_step": 4410,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.02,
13
+ "grad_norm": 1.9168592691421509,
14
+ "learning_rate": 0.00019954648526077098,
15
+ "loss": 2.6612,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.03,
20
+ "grad_norm": 2.498575210571289,
21
+ "learning_rate": 0.00019909297052154195,
22
+ "loss": 2.4529,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.05,
27
+ "grad_norm": 2.4278714656829834,
28
+ "learning_rate": 0.00019863945578231293,
29
+ "loss": 2.3062,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.06,
34
+ "grad_norm": 2.5514345169067383,
35
+ "learning_rate": 0.0001981859410430839,
36
+ "loss": 2.169,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.08,
41
+ "grad_norm": 2.1051814556121826,
42
+ "learning_rate": 0.0001977324263038549,
43
+ "loss": 1.9571,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.1,
48
+ "grad_norm": 2.5904359817504883,
49
+ "learning_rate": 0.00019727891156462587,
50
+ "loss": 1.7926,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.11,
55
+ "grad_norm": 2.9325740337371826,
56
+ "learning_rate": 0.00019682539682539682,
57
+ "loss": 1.6761,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.13,
62
+ "grad_norm": 3.636162519454956,
63
+ "learning_rate": 0.00019637188208616781,
64
+ "loss": 1.6388,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.14,
69
+ "grad_norm": 3.7232675552368164,
70
+ "learning_rate": 0.0001959183673469388,
71
+ "loss": 1.527,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.16,
76
+ "grad_norm": 3.467459201812744,
77
+ "learning_rate": 0.00019546485260770976,
78
+ "loss": 1.4545,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.16,
83
+ "eval_accuracy": 0.6706349206349206,
84
+ "eval_loss": 1.3145380020141602,
85
+ "eval_runtime": 93.6176,
86
+ "eval_samples_per_second": 26.918,
87
+ "eval_steps_per_second": 3.365,
88
+ "step": 100
89
+ },
90
+ {
91
+ "epoch": 0.17,
92
+ "grad_norm": 2.4409241676330566,
93
+ "learning_rate": 0.00019501133786848073,
94
+ "loss": 1.331,
95
+ "step": 110
96
+ },
97
+ {
98
+ "epoch": 0.19,
99
+ "grad_norm": 3.841000556945801,
100
+ "learning_rate": 0.0001945578231292517,
101
+ "loss": 1.4273,
102
+ "step": 120
103
+ },
104
+ {
105
+ "epoch": 0.21,
106
+ "grad_norm": 4.918443202972412,
107
+ "learning_rate": 0.0001941043083900227,
108
+ "loss": 1.4078,
109
+ "step": 130
110
+ },
111
+ {
112
+ "epoch": 0.22,
113
+ "grad_norm": 3.317399740219116,
114
+ "learning_rate": 0.00019365079365079365,
115
+ "loss": 1.3905,
116
+ "step": 140
117
+ },
118
+ {
119
+ "epoch": 0.24,
120
+ "grad_norm": 5.5322418212890625,
121
+ "learning_rate": 0.00019319727891156462,
122
+ "loss": 1.5453,
123
+ "step": 150
124
+ },
125
+ {
126
+ "epoch": 0.25,
127
+ "grad_norm": 4.116984844207764,
128
+ "learning_rate": 0.00019274376417233562,
129
+ "loss": 1.394,
130
+ "step": 160
131
+ },
132
+ {
133
+ "epoch": 0.27,
134
+ "grad_norm": 3.141216278076172,
135
+ "learning_rate": 0.0001922902494331066,
136
+ "loss": 1.2375,
137
+ "step": 170
138
+ },
139
+ {
140
+ "epoch": 0.29,
141
+ "grad_norm": 6.414323329925537,
142
+ "learning_rate": 0.00019183673469387756,
143
+ "loss": 1.3073,
144
+ "step": 180
145
+ },
146
+ {
147
+ "epoch": 0.3,
148
+ "grad_norm": 5.794140338897705,
149
+ "learning_rate": 0.00019138321995464854,
150
+ "loss": 1.3405,
151
+ "step": 190
152
+ },
153
+ {
154
+ "epoch": 0.32,
155
+ "grad_norm": 4.384269714355469,
156
+ "learning_rate": 0.0001909297052154195,
157
+ "loss": 1.2568,
158
+ "step": 200
159
+ },
160
+ {
161
+ "epoch": 0.32,
162
+ "eval_accuracy": 0.7178571428571429,
163
+ "eval_loss": 1.0386934280395508,
164
+ "eval_runtime": 94.8513,
165
+ "eval_samples_per_second": 26.568,
166
+ "eval_steps_per_second": 3.321,
167
+ "step": 200
168
+ },
169
+ {
170
+ "epoch": 0.33,
171
+ "grad_norm": 2.881287097930908,
172
+ "learning_rate": 0.00019047619047619048,
173
+ "loss": 1.2329,
174
+ "step": 210
175
+ },
176
+ {
177
+ "epoch": 0.35,
178
+ "grad_norm": 4.77938985824585,
179
+ "learning_rate": 0.00019002267573696145,
180
+ "loss": 1.1439,
181
+ "step": 220
182
+ },
183
+ {
184
+ "epoch": 0.37,
185
+ "grad_norm": 5.865462303161621,
186
+ "learning_rate": 0.00018956916099773243,
187
+ "loss": 1.2314,
188
+ "step": 230
189
+ },
190
+ {
191
+ "epoch": 0.38,
192
+ "grad_norm": 3.464250326156616,
193
+ "learning_rate": 0.00018911564625850343,
194
+ "loss": 1.329,
195
+ "step": 240
196
+ },
197
+ {
198
+ "epoch": 0.4,
199
+ "grad_norm": 3.2501161098480225,
200
+ "learning_rate": 0.0001886621315192744,
201
+ "loss": 1.3277,
202
+ "step": 250
203
+ },
204
+ {
205
+ "epoch": 0.41,
206
+ "grad_norm": 5.26331901550293,
207
+ "learning_rate": 0.00018820861678004534,
208
+ "loss": 1.2876,
209
+ "step": 260
210
+ },
211
+ {
212
+ "epoch": 0.43,
213
+ "grad_norm": 2.62630295753479,
214
+ "learning_rate": 0.00018775510204081634,
215
+ "loss": 1.1428,
216
+ "step": 270
217
+ },
218
+ {
219
+ "epoch": 0.44,
220
+ "grad_norm": 6.121326923370361,
221
+ "learning_rate": 0.00018730158730158731,
222
+ "loss": 1.2277,
223
+ "step": 280
224
+ },
225
+ {
226
+ "epoch": 0.46,
227
+ "grad_norm": 5.415524005889893,
228
+ "learning_rate": 0.0001868480725623583,
229
+ "loss": 1.1641,
230
+ "step": 290
231
+ },
232
+ {
233
+ "epoch": 0.48,
234
+ "grad_norm": 7.849498271942139,
235
+ "learning_rate": 0.00018639455782312926,
236
+ "loss": 1.3145,
237
+ "step": 300
238
+ },
239
+ {
240
+ "epoch": 0.48,
241
+ "eval_accuracy": 0.7134920634920635,
242
+ "eval_loss": 1.0026524066925049,
243
+ "eval_runtime": 89.5917,
244
+ "eval_samples_per_second": 28.128,
245
+ "eval_steps_per_second": 3.516,
246
+ "step": 300
247
+ },
248
+ {
249
+ "epoch": 0.49,
250
+ "grad_norm": 3.9149835109710693,
251
+ "learning_rate": 0.00018594104308390023,
252
+ "loss": 1.1359,
253
+ "step": 310
254
+ },
255
+ {
256
+ "epoch": 0.51,
257
+ "grad_norm": 4.712550163269043,
258
+ "learning_rate": 0.0001854875283446712,
259
+ "loss": 1.1196,
260
+ "step": 320
261
+ },
262
+ {
263
+ "epoch": 0.52,
264
+ "grad_norm": 4.665726184844971,
265
+ "learning_rate": 0.0001850340136054422,
266
+ "loss": 1.1178,
267
+ "step": 330
268
+ },
269
+ {
270
+ "epoch": 0.54,
271
+ "grad_norm": 4.421199798583984,
272
+ "learning_rate": 0.00018458049886621315,
273
+ "loss": 1.093,
274
+ "step": 340
275
+ },
276
+ {
277
+ "epoch": 0.56,
278
+ "grad_norm": 5.4558305740356445,
279
+ "learning_rate": 0.00018412698412698412,
280
+ "loss": 1.0638,
281
+ "step": 350
282
+ },
283
+ {
284
+ "epoch": 0.57,
285
+ "grad_norm": 6.7150559425354,
286
+ "learning_rate": 0.00018367346938775512,
287
+ "loss": 1.21,
288
+ "step": 360
289
+ },
290
+ {
291
+ "epoch": 0.59,
292
+ "grad_norm": 4.524730205535889,
293
+ "learning_rate": 0.0001832199546485261,
294
+ "loss": 1.199,
295
+ "step": 370
296
+ },
297
+ {
298
+ "epoch": 0.6,
299
+ "grad_norm": 5.457512855529785,
300
+ "learning_rate": 0.00018276643990929706,
301
+ "loss": 1.202,
302
+ "step": 380
303
+ },
304
+ {
305
+ "epoch": 0.62,
306
+ "grad_norm": 5.54859733581543,
307
+ "learning_rate": 0.00018231292517006804,
308
+ "loss": 1.1671,
309
+ "step": 390
310
+ },
311
+ {
312
+ "epoch": 0.63,
313
+ "grad_norm": 3.217099905014038,
314
+ "learning_rate": 0.000181859410430839,
315
+ "loss": 1.0866,
316
+ "step": 400
317
+ },
318
+ {
319
+ "epoch": 0.63,
320
+ "eval_accuracy": 0.7376984126984127,
321
+ "eval_loss": 0.8882665038108826,
322
+ "eval_runtime": 94.504,
323
+ "eval_samples_per_second": 26.666,
324
+ "eval_steps_per_second": 3.333,
325
+ "step": 400
326
+ },
327
+ {
328
+ "epoch": 0.65,
329
+ "grad_norm": 5.335367202758789,
330
+ "learning_rate": 0.00018140589569161,
331
+ "loss": 1.0176,
332
+ "step": 410
333
+ },
334
+ {
335
+ "epoch": 0.67,
336
+ "grad_norm": 7.274291515350342,
337
+ "learning_rate": 0.00018099773242630387,
338
+ "loss": 1.2467,
339
+ "step": 420
340
+ },
341
+ {
342
+ "epoch": 0.68,
343
+ "grad_norm": 5.159976959228516,
344
+ "learning_rate": 0.00018054421768707484,
345
+ "loss": 1.127,
346
+ "step": 430
347
+ },
348
+ {
349
+ "epoch": 0.7,
350
+ "grad_norm": 5.351009845733643,
351
+ "learning_rate": 0.0001800907029478458,
352
+ "loss": 1.2382,
353
+ "step": 440
354
+ },
355
+ {
356
+ "epoch": 0.71,
357
+ "grad_norm": 4.435301303863525,
358
+ "learning_rate": 0.00017963718820861678,
359
+ "loss": 1.061,
360
+ "step": 450
361
+ },
362
+ {
363
+ "epoch": 0.73,
364
+ "grad_norm": 4.027517795562744,
365
+ "learning_rate": 0.00017918367346938776,
366
+ "loss": 1.0432,
367
+ "step": 460
368
+ },
369
+ {
370
+ "epoch": 0.75,
371
+ "grad_norm": 4.309961795806885,
372
+ "learning_rate": 0.00017873015873015876,
373
+ "loss": 0.9949,
374
+ "step": 470
375
+ },
376
+ {
377
+ "epoch": 0.76,
378
+ "grad_norm": 4.835757732391357,
379
+ "learning_rate": 0.00017827664399092973,
380
+ "loss": 1.1899,
381
+ "step": 480
382
+ },
383
+ {
384
+ "epoch": 0.78,
385
+ "grad_norm": 5.26571798324585,
386
+ "learning_rate": 0.00017782312925170067,
387
+ "loss": 1.1235,
388
+ "step": 490
389
+ },
390
+ {
391
+ "epoch": 0.79,
392
+ "grad_norm": 3.8406684398651123,
393
+ "learning_rate": 0.00017736961451247167,
394
+ "loss": 1.0036,
395
+ "step": 500
396
+ },
397
+ {
398
+ "epoch": 0.79,
399
+ "eval_accuracy": 0.7321428571428571,
400
+ "eval_loss": 0.897292971611023,
401
+ "eval_runtime": 96.2225,
402
+ "eval_samples_per_second": 26.189,
403
+ "eval_steps_per_second": 3.274,
404
+ "step": 500
405
+ },
406
+ {
407
+ "epoch": 0.81,
408
+ "grad_norm": 4.993399143218994,
409
+ "learning_rate": 0.00017691609977324264,
410
+ "loss": 0.8682,
411
+ "step": 510
412
+ },
413
+ {
414
+ "epoch": 0.83,
415
+ "grad_norm": 4.4026875495910645,
416
+ "learning_rate": 0.00017646258503401362,
417
+ "loss": 1.0603,
418
+ "step": 520
419
+ },
420
+ {
421
+ "epoch": 0.84,
422
+ "grad_norm": 4.832537651062012,
423
+ "learning_rate": 0.0001760090702947846,
424
+ "loss": 1.034,
425
+ "step": 530
426
+ },
427
+ {
428
+ "epoch": 0.86,
429
+ "grad_norm": 5.253382682800293,
430
+ "learning_rate": 0.00017555555555555556,
431
+ "loss": 0.9938,
432
+ "step": 540
433
+ },
434
+ {
435
+ "epoch": 0.87,
436
+ "grad_norm": 3.681997060775757,
437
+ "learning_rate": 0.00017510204081632653,
438
+ "loss": 1.0082,
439
+ "step": 550
440
+ },
441
+ {
442
+ "epoch": 0.89,
443
+ "grad_norm": 6.324045181274414,
444
+ "learning_rate": 0.0001746485260770975,
445
+ "loss": 0.9629,
446
+ "step": 560
447
+ },
448
+ {
449
+ "epoch": 0.9,
450
+ "grad_norm": 5.0340352058410645,
451
+ "learning_rate": 0.00017419501133786848,
452
+ "loss": 1.1184,
453
+ "step": 570
454
+ },
455
+ {
456
+ "epoch": 0.92,
457
+ "grad_norm": 3.532378673553467,
458
+ "learning_rate": 0.00017374149659863948,
459
+ "loss": 0.9167,
460
+ "step": 580
461
+ },
462
+ {
463
+ "epoch": 0.94,
464
+ "grad_norm": 5.029895305633545,
465
+ "learning_rate": 0.00017328798185941045,
466
+ "loss": 1.1213,
467
+ "step": 590
468
+ },
469
+ {
470
+ "epoch": 0.95,
471
+ "grad_norm": 4.585740566253662,
472
+ "learning_rate": 0.00017283446712018142,
473
+ "loss": 1.1811,
474
+ "step": 600
475
+ },
476
+ {
477
+ "epoch": 0.95,
478
+ "eval_accuracy": 0.7571428571428571,
479
+ "eval_loss": 0.8048315644264221,
480
+ "eval_runtime": 95.9399,
481
+ "eval_samples_per_second": 26.266,
482
+ "eval_steps_per_second": 3.283,
483
+ "step": 600
484
+ },
485
+ {
486
+ "epoch": 0.97,
487
+ "grad_norm": 2.9866397380828857,
488
+ "learning_rate": 0.0001723809523809524,
489
+ "loss": 1.2417,
490
+ "step": 610
491
+ },
492
+ {
493
+ "epoch": 0.98,
494
+ "grad_norm": 4.54673433303833,
495
+ "learning_rate": 0.00017192743764172337,
496
+ "loss": 1.1588,
497
+ "step": 620
498
+ },
499
+ {
500
+ "epoch": 1.0,
501
+ "grad_norm": 4.411011695861816,
502
+ "learning_rate": 0.00017147392290249434,
503
+ "loss": 0.9512,
504
+ "step": 630
505
+ },
506
+ {
507
+ "epoch": 1.02,
508
+ "grad_norm": 4.393170356750488,
509
+ "learning_rate": 0.0001710204081632653,
510
+ "loss": 0.9679,
511
+ "step": 640
512
+ },
513
+ {
514
+ "epoch": 1.03,
515
+ "grad_norm": 5.261455535888672,
516
+ "learning_rate": 0.00017056689342403628,
517
+ "loss": 0.9089,
518
+ "step": 650
519
+ },
520
+ {
521
+ "epoch": 1.05,
522
+ "grad_norm": 4.93878173828125,
523
+ "learning_rate": 0.00017011337868480726,
524
+ "loss": 0.8341,
525
+ "step": 660
526
+ },
527
+ {
528
+ "epoch": 1.06,
529
+ "grad_norm": 4.5593156814575195,
530
+ "learning_rate": 0.00016965986394557825,
531
+ "loss": 0.9952,
532
+ "step": 670
533
+ },
534
+ {
535
+ "epoch": 1.08,
536
+ "grad_norm": 3.9355390071868896,
537
+ "learning_rate": 0.0001692063492063492,
538
+ "loss": 0.9956,
539
+ "step": 680
540
+ },
541
+ {
542
+ "epoch": 1.1,
543
+ "grad_norm": 4.096402645111084,
544
+ "learning_rate": 0.00016875283446712017,
545
+ "loss": 0.9566,
546
+ "step": 690
547
+ },
548
+ {
549
+ "epoch": 1.11,
550
+ "grad_norm": 5.853695869445801,
551
+ "learning_rate": 0.00016829931972789117,
552
+ "loss": 0.9242,
553
+ "step": 700
554
+ },
555
+ {
556
+ "epoch": 1.11,
557
+ "eval_accuracy": 0.7273809523809524,
558
+ "eval_loss": 0.9095195531845093,
559
+ "eval_runtime": 92.7769,
560
+ "eval_samples_per_second": 27.162,
561
+ "eval_steps_per_second": 3.395,
562
+ "step": 700
563
+ },
564
+ {
565
+ "epoch": 1.13,
566
+ "grad_norm": 5.18225622177124,
567
+ "learning_rate": 0.00016784580498866214,
568
+ "loss": 1.1815,
569
+ "step": 710
570
+ },
571
+ {
572
+ "epoch": 1.14,
573
+ "grad_norm": 6.094394683837891,
574
+ "learning_rate": 0.00016739229024943312,
575
+ "loss": 1.0088,
576
+ "step": 720
577
+ },
578
+ {
579
+ "epoch": 1.16,
580
+ "grad_norm": 3.388333320617676,
581
+ "learning_rate": 0.0001669387755102041,
582
+ "loss": 1.0219,
583
+ "step": 730
584
+ },
585
+ {
586
+ "epoch": 1.17,
587
+ "grad_norm": 2.702335834503174,
588
+ "learning_rate": 0.00016648526077097506,
589
+ "loss": 0.995,
590
+ "step": 740
591
+ },
592
+ {
593
+ "epoch": 1.19,
594
+ "grad_norm": 6.5921735763549805,
595
+ "learning_rate": 0.00016603174603174606,
596
+ "loss": 1.0336,
597
+ "step": 750
598
+ },
599
+ {
600
+ "epoch": 1.21,
601
+ "grad_norm": 7.6583781242370605,
602
+ "learning_rate": 0.000165578231292517,
603
+ "loss": 0.8889,
604
+ "step": 760
605
+ },
606
+ {
607
+ "epoch": 1.22,
608
+ "grad_norm": 5.093008518218994,
609
+ "learning_rate": 0.00016512471655328798,
610
+ "loss": 0.8492,
611
+ "step": 770
612
+ },
613
+ {
614
+ "epoch": 1.24,
615
+ "grad_norm": 4.372059345245361,
616
+ "learning_rate": 0.00016467120181405898,
617
+ "loss": 0.8115,
618
+ "step": 780
619
+ },
620
+ {
621
+ "epoch": 1.25,
622
+ "grad_norm": 4.1783857345581055,
623
+ "learning_rate": 0.00016421768707482995,
624
+ "loss": 0.834,
625
+ "step": 790
626
+ },
627
+ {
628
+ "epoch": 1.27,
629
+ "grad_norm": 4.295966148376465,
630
+ "learning_rate": 0.0001638095238095238,
631
+ "loss": 0.9477,
632
+ "step": 800
633
+ },
634
+ {
635
+ "epoch": 1.27,
636
+ "eval_accuracy": 0.7619047619047619,
637
+ "eval_loss": 0.8036767244338989,
638
+ "eval_runtime": 89.9677,
639
+ "eval_samples_per_second": 28.01,
640
+ "eval_steps_per_second": 3.501,
641
+ "step": 800
642
+ },
643
+ {
644
+ "epoch": 1.29,
645
+ "grad_norm": 4.976074695587158,
646
+ "learning_rate": 0.0001633560090702948,
647
+ "loss": 0.8778,
648
+ "step": 810
649
+ },
650
+ {
651
+ "epoch": 1.3,
652
+ "grad_norm": 3.34804105758667,
653
+ "learning_rate": 0.00016290249433106578,
654
+ "loss": 0.9856,
655
+ "step": 820
656
+ },
657
+ {
658
+ "epoch": 1.32,
659
+ "grad_norm": 6.416792869567871,
660
+ "learning_rate": 0.00016244897959183672,
661
+ "loss": 0.9829,
662
+ "step": 830
663
+ },
664
+ {
665
+ "epoch": 1.33,
666
+ "grad_norm": 5.707892894744873,
667
+ "learning_rate": 0.00016199546485260772,
668
+ "loss": 0.9397,
669
+ "step": 840
670
+ },
671
+ {
672
+ "epoch": 1.35,
673
+ "grad_norm": 4.620942115783691,
674
+ "learning_rate": 0.0001615419501133787,
675
+ "loss": 1.078,
676
+ "step": 850
677
+ },
678
+ {
679
+ "epoch": 1.37,
680
+ "grad_norm": 4.452626705169678,
681
+ "learning_rate": 0.00016108843537414967,
682
+ "loss": 0.9531,
683
+ "step": 860
684
+ },
685
+ {
686
+ "epoch": 1.38,
687
+ "grad_norm": 5.697545528411865,
688
+ "learning_rate": 0.00016063492063492064,
689
+ "loss": 1.0438,
690
+ "step": 870
691
+ },
692
+ {
693
+ "epoch": 1.4,
694
+ "grad_norm": 4.365375518798828,
695
+ "learning_rate": 0.0001601814058956916,
696
+ "loss": 0.9024,
697
+ "step": 880
698
+ },
699
+ {
700
+ "epoch": 1.41,
701
+ "grad_norm": 5.501079082489014,
702
+ "learning_rate": 0.00015972789115646259,
703
+ "loss": 0.9091,
704
+ "step": 890
705
+ },
706
+ {
707
+ "epoch": 1.43,
708
+ "grad_norm": 4.409488201141357,
709
+ "learning_rate": 0.00015927437641723358,
710
+ "loss": 0.8634,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 1.43,
715
+ "eval_accuracy": 0.7642857142857142,
716
+ "eval_loss": 0.7937940359115601,
717
+ "eval_runtime": 92.0164,
718
+ "eval_samples_per_second": 27.386,
719
+ "eval_steps_per_second": 3.423,
720
+ "step": 900
721
+ },
722
+ {
723
+ "epoch": 1.44,
724
+ "grad_norm": 4.543234825134277,
725
+ "learning_rate": 0.00015882086167800453,
726
+ "loss": 0.7896,
727
+ "step": 910
728
+ },
729
+ {
730
+ "epoch": 1.46,
731
+ "grad_norm": 3.8954508304595947,
732
+ "learning_rate": 0.0001583673469387755,
733
+ "loss": 0.9597,
734
+ "step": 920
735
+ },
736
+ {
737
+ "epoch": 1.48,
738
+ "grad_norm": 7.744637489318848,
739
+ "learning_rate": 0.0001579138321995465,
740
+ "loss": 0.9585,
741
+ "step": 930
742
+ },
743
+ {
744
+ "epoch": 1.49,
745
+ "grad_norm": 3.619973659515381,
746
+ "learning_rate": 0.00015746031746031747,
747
+ "loss": 0.826,
748
+ "step": 940
749
+ },
750
+ {
751
+ "epoch": 1.51,
752
+ "grad_norm": 10.431225776672363,
753
+ "learning_rate": 0.00015700680272108845,
754
+ "loss": 0.9651,
755
+ "step": 950
756
+ },
757
+ {
758
+ "epoch": 1.52,
759
+ "grad_norm": 4.897637367248535,
760
+ "learning_rate": 0.00015655328798185942,
761
+ "loss": 0.9136,
762
+ "step": 960
763
+ },
764
+ {
765
+ "epoch": 1.54,
766
+ "grad_norm": 2.429625988006592,
767
+ "learning_rate": 0.0001560997732426304,
768
+ "loss": 0.9676,
769
+ "step": 970
770
+ },
771
+ {
772
+ "epoch": 1.56,
773
+ "grad_norm": 2.8244125843048096,
774
+ "learning_rate": 0.00015564625850340136,
775
+ "loss": 0.8796,
776
+ "step": 980
777
+ },
778
+ {
779
+ "epoch": 1.57,
780
+ "grad_norm": 3.9079954624176025,
781
+ "learning_rate": 0.00015519274376417234,
782
+ "loss": 0.7595,
783
+ "step": 990
784
+ },
785
+ {
786
+ "epoch": 1.59,
787
+ "grad_norm": 8.262408256530762,
788
+ "learning_rate": 0.0001547392290249433,
789
+ "loss": 1.0098,
790
+ "step": 1000
791
+ },
792
+ {
793
+ "epoch": 1.59,
794
+ "eval_accuracy": 0.7765873015873016,
795
+ "eval_loss": 0.7327961325645447,
796
+ "eval_runtime": 90.6084,
797
+ "eval_samples_per_second": 27.812,
798
+ "eval_steps_per_second": 3.477,
799
+ "step": 1000
800
+ },
801
+ {
802
+ "epoch": 1.6,
803
+ "grad_norm": 5.290775299072266,
804
+ "learning_rate": 0.0001542857142857143,
805
+ "loss": 0.8282,
806
+ "step": 1010
807
+ },
808
+ {
809
+ "epoch": 1.62,
810
+ "grad_norm": 5.40395450592041,
811
+ "learning_rate": 0.00015383219954648528,
812
+ "loss": 1.0232,
813
+ "step": 1020
814
+ },
815
+ {
816
+ "epoch": 1.63,
817
+ "grad_norm": 6.36408805847168,
818
+ "learning_rate": 0.00015337868480725622,
819
+ "loss": 0.9572,
820
+ "step": 1030
821
+ },
822
+ {
823
+ "epoch": 1.65,
824
+ "grad_norm": 3.8068315982818604,
825
+ "learning_rate": 0.00015292517006802722,
826
+ "loss": 0.8457,
827
+ "step": 1040
828
+ },
829
+ {
830
+ "epoch": 1.67,
831
+ "grad_norm": 5.893327236175537,
832
+ "learning_rate": 0.0001524716553287982,
833
+ "loss": 0.915,
834
+ "step": 1050
835
+ },
836
+ {
837
+ "epoch": 1.68,
838
+ "grad_norm": 4.604687213897705,
839
+ "learning_rate": 0.00015201814058956917,
840
+ "loss": 0.9143,
841
+ "step": 1060
842
+ },
843
+ {
844
+ "epoch": 1.7,
845
+ "grad_norm": 4.973442077636719,
846
+ "learning_rate": 0.00015156462585034014,
847
+ "loss": 0.8211,
848
+ "step": 1070
849
+ },
850
+ {
851
+ "epoch": 1.71,
852
+ "grad_norm": 4.463946342468262,
853
+ "learning_rate": 0.0001511111111111111,
854
+ "loss": 0.8649,
855
+ "step": 1080
856
+ },
857
+ {
858
+ "epoch": 1.73,
859
+ "grad_norm": 4.718800067901611,
860
+ "learning_rate": 0.0001506575963718821,
861
+ "loss": 0.7986,
862
+ "step": 1090
863
+ },
864
+ {
865
+ "epoch": 1.75,
866
+ "grad_norm": 4.523365020751953,
867
+ "learning_rate": 0.00015020408163265306,
868
+ "loss": 0.8176,
869
+ "step": 1100
870
+ },
871
+ {
872
+ "epoch": 1.75,
873
+ "eval_accuracy": 0.7515873015873016,
874
+ "eval_loss": 0.8064602017402649,
875
+ "eval_runtime": 93.1844,
876
+ "eval_samples_per_second": 27.043,
877
+ "eval_steps_per_second": 3.38,
878
+ "step": 1100
879
+ },
880
+ {
881
+ "epoch": 1.76,
882
+ "grad_norm": 5.417792797088623,
883
+ "learning_rate": 0.00014975056689342403,
884
+ "loss": 1.0616,
885
+ "step": 1110
886
+ },
887
+ {
888
+ "epoch": 1.78,
889
+ "grad_norm": 6.2752275466918945,
890
+ "learning_rate": 0.00014929705215419503,
891
+ "loss": 0.9484,
892
+ "step": 1120
893
+ },
894
+ {
895
+ "epoch": 1.79,
896
+ "grad_norm": 6.395713806152344,
897
+ "learning_rate": 0.000148843537414966,
898
+ "loss": 1.0357,
899
+ "step": 1130
900
+ },
901
+ {
902
+ "epoch": 1.81,
903
+ "grad_norm": 4.707973003387451,
904
+ "learning_rate": 0.00014839002267573697,
905
+ "loss": 0.922,
906
+ "step": 1140
907
+ },
908
+ {
909
+ "epoch": 1.83,
910
+ "grad_norm": 6.663784980773926,
911
+ "learning_rate": 0.00014793650793650795,
912
+ "loss": 1.141,
913
+ "step": 1150
914
+ },
915
+ {
916
+ "epoch": 1.84,
917
+ "grad_norm": 4.041499137878418,
918
+ "learning_rate": 0.00014748299319727892,
919
+ "loss": 0.9529,
920
+ "step": 1160
921
+ },
922
+ {
923
+ "epoch": 1.86,
924
+ "grad_norm": 5.212307453155518,
925
+ "learning_rate": 0.0001470294784580499,
926
+ "loss": 0.6781,
927
+ "step": 1170
928
+ },
929
+ {
930
+ "epoch": 1.87,
931
+ "grad_norm": 5.991590976715088,
932
+ "learning_rate": 0.00014657596371882086,
933
+ "loss": 0.8701,
934
+ "step": 1180
935
+ },
936
+ {
937
+ "epoch": 1.89,
938
+ "grad_norm": 8.68984317779541,
939
+ "learning_rate": 0.00014612244897959183,
940
+ "loss": 1.0802,
941
+ "step": 1190
942
+ },
943
+ {
944
+ "epoch": 1.9,
945
+ "grad_norm": 6.660585403442383,
946
+ "learning_rate": 0.0001456689342403628,
947
+ "loss": 0.8072,
948
+ "step": 1200
949
+ },
950
+ {
951
+ "epoch": 1.9,
952
+ "eval_accuracy": 0.7694444444444445,
953
+ "eval_loss": 0.77680903673172,
954
+ "eval_runtime": 91.7003,
955
+ "eval_samples_per_second": 27.481,
956
+ "eval_steps_per_second": 3.435,
957
+ "step": 1200
958
+ },
959
+ {
960
+ "epoch": 1.92,
961
+ "grad_norm": 3.435088872909546,
962
+ "learning_rate": 0.0001452154195011338,
963
+ "loss": 1.0272,
964
+ "step": 1210
965
+ },
966
+ {
967
+ "epoch": 1.94,
968
+ "grad_norm": 3.200514078140259,
969
+ "learning_rate": 0.00014476190476190475,
970
+ "loss": 1.0246,
971
+ "step": 1220
972
+ },
973
+ {
974
+ "epoch": 1.95,
975
+ "grad_norm": 3.882340669631958,
976
+ "learning_rate": 0.00014430839002267575,
977
+ "loss": 0.7745,
978
+ "step": 1230
979
+ },
980
+ {
981
+ "epoch": 1.97,
982
+ "grad_norm": 3.1602838039398193,
983
+ "learning_rate": 0.00014385487528344672,
984
+ "loss": 0.9499,
985
+ "step": 1240
986
+ },
987
+ {
988
+ "epoch": 1.98,
989
+ "grad_norm": 2.896543502807617,
990
+ "learning_rate": 0.0001434013605442177,
991
+ "loss": 0.9155,
992
+ "step": 1250
993
+ },
994
+ {
995
+ "epoch": 2.0,
996
+ "grad_norm": 4.671875476837158,
997
+ "learning_rate": 0.00014294784580498867,
998
+ "loss": 0.8183,
999
+ "step": 1260
1000
+ },
1001
+ {
1002
+ "epoch": 2.02,
1003
+ "grad_norm": 6.149994373321533,
1004
+ "learning_rate": 0.00014249433106575964,
1005
+ "loss": 0.8043,
1006
+ "step": 1270
1007
+ },
1008
+ {
1009
+ "epoch": 2.03,
1010
+ "grad_norm": 4.373509407043457,
1011
+ "learning_rate": 0.0001420408163265306,
1012
+ "loss": 0.7205,
1013
+ "step": 1280
1014
+ },
1015
+ {
1016
+ "epoch": 2.05,
1017
+ "grad_norm": 5.70250940322876,
1018
+ "learning_rate": 0.0001415873015873016,
1019
+ "loss": 0.8109,
1020
+ "step": 1290
1021
+ },
1022
+ {
1023
+ "epoch": 2.06,
1024
+ "grad_norm": 4.3769683837890625,
1025
+ "learning_rate": 0.00014113378684807256,
1026
+ "loss": 0.7739,
1027
+ "step": 1300
1028
+ },
1029
+ {
1030
+ "epoch": 2.06,
1031
+ "eval_accuracy": 0.7726190476190476,
1032
+ "eval_loss": 0.7623938322067261,
1033
+ "eval_runtime": 108.4142,
1034
+ "eval_samples_per_second": 23.244,
1035
+ "eval_steps_per_second": 2.906,
1036
+ "step": 1300
1037
+ },
1038
+ {
1039
+ "epoch": 2.08,
1040
+ "grad_norm": 8.936156272888184,
1041
+ "learning_rate": 0.00014068027210884353,
1042
+ "loss": 0.7361,
1043
+ "step": 1310
1044
+ },
1045
+ {
1046
+ "epoch": 2.1,
1047
+ "grad_norm": 6.855538845062256,
1048
+ "learning_rate": 0.00014022675736961453,
1049
+ "loss": 0.7778,
1050
+ "step": 1320
1051
+ },
1052
+ {
1053
+ "epoch": 2.11,
1054
+ "grad_norm": 2.5195322036743164,
1055
+ "learning_rate": 0.0001397732426303855,
1056
+ "loss": 0.7778,
1057
+ "step": 1330
1058
+ },
1059
+ {
1060
+ "epoch": 2.13,
1061
+ "grad_norm": 4.916295051574707,
1062
+ "learning_rate": 0.00013931972789115645,
1063
+ "loss": 0.8522,
1064
+ "step": 1340
1065
+ },
1066
+ {
1067
+ "epoch": 2.14,
1068
+ "grad_norm": 5.055403232574463,
1069
+ "learning_rate": 0.00013886621315192745,
1070
+ "loss": 0.6851,
1071
+ "step": 1350
1072
+ },
1073
+ {
1074
+ "epoch": 2.16,
1075
+ "grad_norm": 5.334274768829346,
1076
+ "learning_rate": 0.00013841269841269842,
1077
+ "loss": 0.9657,
1078
+ "step": 1360
1079
+ },
1080
+ {
1081
+ "epoch": 2.17,
1082
+ "grad_norm": 6.083943843841553,
1083
+ "learning_rate": 0.00013795918367346942,
1084
+ "loss": 0.6776,
1085
+ "step": 1370
1086
+ },
1087
+ {
1088
+ "epoch": 2.19,
1089
+ "grad_norm": 4.362452983856201,
1090
+ "learning_rate": 0.00013750566893424036,
1091
+ "loss": 0.7703,
1092
+ "step": 1380
1093
+ },
1094
+ {
1095
+ "epoch": 2.21,
1096
+ "grad_norm": 6.978400707244873,
1097
+ "learning_rate": 0.00013705215419501133,
1098
+ "loss": 0.7911,
1099
+ "step": 1390
1100
+ },
1101
+ {
1102
+ "epoch": 2.22,
1103
+ "grad_norm": 4.561004638671875,
1104
+ "learning_rate": 0.00013659863945578233,
1105
+ "loss": 0.6851,
1106
+ "step": 1400
1107
+ },
1108
+ {
1109
+ "epoch": 2.22,
1110
+ "eval_accuracy": 0.794047619047619,
1111
+ "eval_loss": 0.668690025806427,
1112
+ "eval_runtime": 91.4622,
1113
+ "eval_samples_per_second": 27.552,
1114
+ "eval_steps_per_second": 3.444,
1115
+ "step": 1400
1116
+ },
1117
+ {
1118
+ "epoch": 2.24,
1119
+ "grad_norm": 4.068347930908203,
1120
+ "learning_rate": 0.0001361451247165533,
1121
+ "loss": 0.6705,
1122
+ "step": 1410
1123
+ },
1124
+ {
1125
+ "epoch": 2.25,
1126
+ "grad_norm": 4.628905773162842,
1127
+ "learning_rate": 0.00013569160997732425,
1128
+ "loss": 0.8955,
1129
+ "step": 1420
1130
+ },
1131
+ {
1132
+ "epoch": 2.27,
1133
+ "grad_norm": 6.214323997497559,
1134
+ "learning_rate": 0.00013523809523809525,
1135
+ "loss": 0.785,
1136
+ "step": 1430
1137
+ },
1138
+ {
1139
+ "epoch": 2.29,
1140
+ "grad_norm": 3.4356954097747803,
1141
+ "learning_rate": 0.00013478458049886622,
1142
+ "loss": 0.6415,
1143
+ "step": 1440
1144
+ },
1145
+ {
1146
+ "epoch": 2.3,
1147
+ "grad_norm": 6.76524019241333,
1148
+ "learning_rate": 0.0001343310657596372,
1149
+ "loss": 0.8914,
1150
+ "step": 1450
1151
+ },
1152
+ {
1153
+ "epoch": 2.32,
1154
+ "grad_norm": 4.742693901062012,
1155
+ "learning_rate": 0.00013387755102040817,
1156
+ "loss": 0.8758,
1157
+ "step": 1460
1158
+ },
1159
+ {
1160
+ "epoch": 2.33,
1161
+ "grad_norm": 3.940936326980591,
1162
+ "learning_rate": 0.00013342403628117914,
1163
+ "loss": 0.6984,
1164
+ "step": 1470
1165
+ },
1166
+ {
1167
+ "epoch": 2.35,
1168
+ "grad_norm": 7.005272388458252,
1169
+ "learning_rate": 0.0001329705215419501,
1170
+ "loss": 0.833,
1171
+ "step": 1480
1172
+ },
1173
+ {
1174
+ "epoch": 2.37,
1175
+ "grad_norm": 5.6659722328186035,
1176
+ "learning_rate": 0.0001325170068027211,
1177
+ "loss": 0.7352,
1178
+ "step": 1490
1179
+ },
1180
+ {
1181
+ "epoch": 2.38,
1182
+ "grad_norm": 4.9742045402526855,
1183
+ "learning_rate": 0.00013206349206349206,
1184
+ "loss": 0.7496,
1185
+ "step": 1500
1186
+ },
1187
+ {
1188
+ "epoch": 2.38,
1189
+ "eval_accuracy": 0.7948412698412698,
1190
+ "eval_loss": 0.6806091070175171,
1191
+ "eval_runtime": 91.1192,
1192
+ "eval_samples_per_second": 27.656,
1193
+ "eval_steps_per_second": 3.457,
1194
+ "step": 1500
1195
+ },
1196
+ {
1197
+ "epoch": 2.4,
1198
+ "grad_norm": 5.499560832977295,
1199
+ "learning_rate": 0.00013160997732426303,
1200
+ "loss": 0.8274,
1201
+ "step": 1510
1202
+ },
1203
+ {
1204
+ "epoch": 2.41,
1205
+ "grad_norm": 4.391964435577393,
1206
+ "learning_rate": 0.00013115646258503403,
1207
+ "loss": 0.7892,
1208
+ "step": 1520
1209
+ },
1210
+ {
1211
+ "epoch": 2.43,
1212
+ "grad_norm": 4.317266464233398,
1213
+ "learning_rate": 0.000130702947845805,
1214
+ "loss": 0.7462,
1215
+ "step": 1530
1216
+ },
1217
+ {
1218
+ "epoch": 2.44,
1219
+ "grad_norm": 3.7989251613616943,
1220
+ "learning_rate": 0.00013024943310657597,
1221
+ "loss": 0.8322,
1222
+ "step": 1540
1223
+ },
1224
+ {
1225
+ "epoch": 2.46,
1226
+ "grad_norm": 4.737931251525879,
1227
+ "learning_rate": 0.00012979591836734695,
1228
+ "loss": 0.9,
1229
+ "step": 1550
1230
+ },
1231
+ {
1232
+ "epoch": 2.48,
1233
+ "grad_norm": 2.4748082160949707,
1234
+ "learning_rate": 0.00012934240362811792,
1235
+ "loss": 0.7613,
1236
+ "step": 1560
1237
+ },
1238
+ {
1239
+ "epoch": 2.49,
1240
+ "grad_norm": 5.137518882751465,
1241
+ "learning_rate": 0.00012888888888888892,
1242
+ "loss": 0.8596,
1243
+ "step": 1570
1244
+ },
1245
+ {
1246
+ "epoch": 2.51,
1247
+ "grad_norm": 3.9287266731262207,
1248
+ "learning_rate": 0.00012843537414965986,
1249
+ "loss": 0.9221,
1250
+ "step": 1580
1251
+ },
1252
+ {
1253
+ "epoch": 2.52,
1254
+ "grad_norm": 3.864816665649414,
1255
+ "learning_rate": 0.00012798185941043083,
1256
+ "loss": 0.6858,
1257
+ "step": 1590
1258
+ },
1259
+ {
1260
+ "epoch": 2.54,
1261
+ "grad_norm": 3.6453895568847656,
1262
+ "learning_rate": 0.00012752834467120183,
1263
+ "loss": 0.7352,
1264
+ "step": 1600
1265
+ },
1266
+ {
1267
+ "epoch": 2.54,
1268
+ "eval_accuracy": 0.7896825396825397,
1269
+ "eval_loss": 0.6942620277404785,
1270
+ "eval_runtime": 90.1988,
1271
+ "eval_samples_per_second": 27.938,
1272
+ "eval_steps_per_second": 3.492,
1273
+ "step": 1600
1274
+ },
1275
+ {
1276
+ "epoch": 2.56,
1277
+ "grad_norm": 6.48276424407959,
1278
+ "learning_rate": 0.0001270748299319728,
1279
+ "loss": 0.7989,
1280
+ "step": 1610
1281
+ },
1282
+ {
1283
+ "epoch": 2.57,
1284
+ "grad_norm": 3.8373072147369385,
1285
+ "learning_rate": 0.00012662131519274375,
1286
+ "loss": 0.7562,
1287
+ "step": 1620
1288
+ },
1289
+ {
1290
+ "epoch": 2.59,
1291
+ "grad_norm": 5.370635986328125,
1292
+ "learning_rate": 0.00012616780045351475,
1293
+ "loss": 0.8958,
1294
+ "step": 1630
1295
+ },
1296
+ {
1297
+ "epoch": 2.6,
1298
+ "grad_norm": 2.2488410472869873,
1299
+ "learning_rate": 0.00012571428571428572,
1300
+ "loss": 0.8911,
1301
+ "step": 1640
1302
+ },
1303
+ {
1304
+ "epoch": 2.62,
1305
+ "grad_norm": 4.261588096618652,
1306
+ "learning_rate": 0.0001252607709750567,
1307
+ "loss": 0.7508,
1308
+ "step": 1650
1309
+ },
1310
+ {
1311
+ "epoch": 2.63,
1312
+ "grad_norm": 4.3199286460876465,
1313
+ "learning_rate": 0.00012480725623582767,
1314
+ "loss": 0.6107,
1315
+ "step": 1660
1316
+ },
1317
+ {
1318
+ "epoch": 2.65,
1319
+ "grad_norm": 5.610477447509766,
1320
+ "learning_rate": 0.00012435374149659864,
1321
+ "loss": 0.5786,
1322
+ "step": 1670
1323
+ },
1324
+ {
1325
+ "epoch": 2.67,
1326
+ "grad_norm": 6.289084434509277,
1327
+ "learning_rate": 0.00012390022675736964,
1328
+ "loss": 0.5845,
1329
+ "step": 1680
1330
+ },
1331
+ {
1332
+ "epoch": 2.68,
1333
+ "grad_norm": 3.4954001903533936,
1334
+ "learning_rate": 0.0001234467120181406,
1335
+ "loss": 0.8207,
1336
+ "step": 1690
1337
+ },
1338
+ {
1339
+ "epoch": 2.7,
1340
+ "grad_norm": 3.2665460109710693,
1341
+ "learning_rate": 0.00012299319727891156,
1342
+ "loss": 0.7311,
1343
+ "step": 1700
1344
+ },
1345
+ {
1346
+ "epoch": 2.7,
1347
+ "eval_accuracy": 0.7714285714285715,
1348
+ "eval_loss": 0.7353097200393677,
1349
+ "eval_runtime": 91.0228,
1350
+ "eval_samples_per_second": 27.685,
1351
+ "eval_steps_per_second": 3.461,
1352
+ "step": 1700
1353
+ },
1354
+ {
1355
+ "epoch": 2.71,
1356
+ "grad_norm": 4.803609848022461,
1357
+ "learning_rate": 0.00012253968253968256,
1358
+ "loss": 0.7369,
1359
+ "step": 1710
1360
+ },
1361
+ {
1362
+ "epoch": 2.73,
1363
+ "grad_norm": 4.8724565505981445,
1364
+ "learning_rate": 0.00012208616780045353,
1365
+ "loss": 0.623,
1366
+ "step": 1720
1367
+ },
1368
+ {
1369
+ "epoch": 2.75,
1370
+ "grad_norm": 5.754215240478516,
1371
+ "learning_rate": 0.00012163265306122449,
1372
+ "loss": 0.6377,
1373
+ "step": 1730
1374
+ },
1375
+ {
1376
+ "epoch": 2.76,
1377
+ "grad_norm": 7.614988803863525,
1378
+ "learning_rate": 0.00012117913832199547,
1379
+ "loss": 0.8485,
1380
+ "step": 1740
1381
+ },
1382
+ {
1383
+ "epoch": 2.78,
1384
+ "grad_norm": 2.6383018493652344,
1385
+ "learning_rate": 0.00012072562358276644,
1386
+ "loss": 0.6942,
1387
+ "step": 1750
1388
+ },
1389
+ {
1390
+ "epoch": 2.79,
1391
+ "grad_norm": 5.747374057769775,
1392
+ "learning_rate": 0.00012027210884353742,
1393
+ "loss": 0.7562,
1394
+ "step": 1760
1395
+ },
1396
+ {
1397
+ "epoch": 2.81,
1398
+ "grad_norm": 5.084746360778809,
1399
+ "learning_rate": 0.0001198185941043084,
1400
+ "loss": 0.69,
1401
+ "step": 1770
1402
+ },
1403
+ {
1404
+ "epoch": 2.83,
1405
+ "grad_norm": 6.089473247528076,
1406
+ "learning_rate": 0.00011936507936507938,
1407
+ "loss": 0.9495,
1408
+ "step": 1780
1409
+ },
1410
+ {
1411
+ "epoch": 2.84,
1412
+ "grad_norm": 5.547908306121826,
1413
+ "learning_rate": 0.00011891156462585033,
1414
+ "loss": 0.6505,
1415
+ "step": 1790
1416
+ },
1417
+ {
1418
+ "epoch": 2.86,
1419
+ "grad_norm": 3.5899226665496826,
1420
+ "learning_rate": 0.00011845804988662132,
1421
+ "loss": 0.7181,
1422
+ "step": 1800
1423
+ },
1424
+ {
1425
+ "epoch": 2.86,
1426
+ "eval_accuracy": 0.792063492063492,
1427
+ "eval_loss": 0.6831231713294983,
1428
+ "eval_runtime": 91.1983,
1429
+ "eval_samples_per_second": 27.632,
1430
+ "eval_steps_per_second": 3.454,
1431
+ "step": 1800
1432
+ },
1433
+ {
1434
+ "epoch": 2.87,
1435
+ "grad_norm": 4.794887542724609,
1436
+ "learning_rate": 0.00011800453514739229,
1437
+ "loss": 0.6245,
1438
+ "step": 1810
1439
+ },
1440
+ {
1441
+ "epoch": 2.89,
1442
+ "grad_norm": 7.641270160675049,
1443
+ "learning_rate": 0.00011755102040816328,
1444
+ "loss": 0.8966,
1445
+ "step": 1820
1446
+ },
1447
+ {
1448
+ "epoch": 2.9,
1449
+ "grad_norm": 4.0006608963012695,
1450
+ "learning_rate": 0.00011709750566893425,
1451
+ "loss": 0.6106,
1452
+ "step": 1830
1453
+ },
1454
+ {
1455
+ "epoch": 2.92,
1456
+ "grad_norm": 5.983084201812744,
1457
+ "learning_rate": 0.00011664399092970522,
1458
+ "loss": 0.887,
1459
+ "step": 1840
1460
+ },
1461
+ {
1462
+ "epoch": 2.94,
1463
+ "grad_norm": 4.492617130279541,
1464
+ "learning_rate": 0.00011619047619047621,
1465
+ "loss": 0.7974,
1466
+ "step": 1850
1467
+ },
1468
+ {
1469
+ "epoch": 2.95,
1470
+ "grad_norm": 4.350939750671387,
1471
+ "learning_rate": 0.00011573696145124717,
1472
+ "loss": 0.8817,
1473
+ "step": 1860
1474
+ },
1475
+ {
1476
+ "epoch": 2.97,
1477
+ "grad_norm": 4.855531692504883,
1478
+ "learning_rate": 0.00011528344671201814,
1479
+ "loss": 0.9235,
1480
+ "step": 1870
1481
+ },
1482
+ {
1483
+ "epoch": 2.98,
1484
+ "grad_norm": 5.735949993133545,
1485
+ "learning_rate": 0.00011482993197278912,
1486
+ "loss": 0.6679,
1487
+ "step": 1880
1488
+ },
1489
+ {
1490
+ "epoch": 3.0,
1491
+ "grad_norm": 3.4668216705322266,
1492
+ "learning_rate": 0.0001143764172335601,
1493
+ "loss": 0.6783,
1494
+ "step": 1890
1495
+ },
1496
+ {
1497
+ "epoch": 3.02,
1498
+ "grad_norm": 4.096149921417236,
1499
+ "learning_rate": 0.00011392290249433107,
1500
+ "loss": 0.5986,
1501
+ "step": 1900
1502
+ },
1503
+ {
1504
+ "epoch": 3.02,
1505
+ "eval_accuracy": 0.7896825396825397,
1506
+ "eval_loss": 0.6930129528045654,
1507
+ "eval_runtime": 91.4004,
1508
+ "eval_samples_per_second": 27.571,
1509
+ "eval_steps_per_second": 3.446,
1510
+ "step": 1900
1511
+ },
1512
+ {
1513
+ "epoch": 3.03,
1514
+ "grad_norm": 3.9420833587646484,
1515
+ "learning_rate": 0.00011346938775510206,
1516
+ "loss": 0.5611,
1517
+ "step": 1910
1518
+ },
1519
+ {
1520
+ "epoch": 3.05,
1521
+ "grad_norm": 2.5048274993896484,
1522
+ "learning_rate": 0.00011301587301587301,
1523
+ "loss": 0.6619,
1524
+ "step": 1920
1525
+ },
1526
+ {
1527
+ "epoch": 3.06,
1528
+ "grad_norm": 4.899613380432129,
1529
+ "learning_rate": 0.00011256235827664399,
1530
+ "loss": 0.6889,
1531
+ "step": 1930
1532
+ },
1533
+ {
1534
+ "epoch": 3.08,
1535
+ "grad_norm": 4.282281398773193,
1536
+ "learning_rate": 0.00011210884353741497,
1537
+ "loss": 0.6182,
1538
+ "step": 1940
1539
+ },
1540
+ {
1541
+ "epoch": 3.1,
1542
+ "grad_norm": 3.1164541244506836,
1543
+ "learning_rate": 0.00011165532879818594,
1544
+ "loss": 0.6822,
1545
+ "step": 1950
1546
+ },
1547
+ {
1548
+ "epoch": 3.11,
1549
+ "grad_norm": 4.760287761688232,
1550
+ "learning_rate": 0.00011120181405895693,
1551
+ "loss": 0.6557,
1552
+ "step": 1960
1553
+ },
1554
+ {
1555
+ "epoch": 3.13,
1556
+ "grad_norm": 3.0668411254882812,
1557
+ "learning_rate": 0.0001107482993197279,
1558
+ "loss": 0.6537,
1559
+ "step": 1970
1560
+ },
1561
+ {
1562
+ "epoch": 3.14,
1563
+ "grad_norm": 4.527184009552002,
1564
+ "learning_rate": 0.00011029478458049886,
1565
+ "loss": 0.5513,
1566
+ "step": 1980
1567
+ },
1568
+ {
1569
+ "epoch": 3.16,
1570
+ "grad_norm": 2.029935598373413,
1571
+ "learning_rate": 0.00010984126984126986,
1572
+ "loss": 0.5102,
1573
+ "step": 1990
1574
+ },
1575
+ {
1576
+ "epoch": 3.17,
1577
+ "grad_norm": 4.294469833374023,
1578
+ "learning_rate": 0.00010938775510204082,
1579
+ "loss": 0.5716,
1580
+ "step": 2000
1581
+ },
1582
+ {
1583
+ "epoch": 3.17,
1584
+ "eval_accuracy": 0.8047619047619048,
1585
+ "eval_loss": 0.6684977412223816,
1586
+ "eval_runtime": 90.6284,
1587
+ "eval_samples_per_second": 27.806,
1588
+ "eval_steps_per_second": 3.476,
1589
+ "step": 2000
1590
+ },
1591
+ {
1592
+ "epoch": 3.19,
1593
+ "grad_norm": 6.498531818389893,
1594
+ "learning_rate": 0.00010893424036281179,
1595
+ "loss": 0.6862,
1596
+ "step": 2010
1597
+ },
1598
+ {
1599
+ "epoch": 3.21,
1600
+ "grad_norm": 5.384933948516846,
1601
+ "learning_rate": 0.00010848072562358278,
1602
+ "loss": 0.4793,
1603
+ "step": 2020
1604
+ },
1605
+ {
1606
+ "epoch": 3.22,
1607
+ "grad_norm": 4.124419212341309,
1608
+ "learning_rate": 0.00010802721088435375,
1609
+ "loss": 0.7055,
1610
+ "step": 2030
1611
+ },
1612
+ {
1613
+ "epoch": 3.24,
1614
+ "grad_norm": 4.198364734649658,
1615
+ "learning_rate": 0.00010757369614512471,
1616
+ "loss": 0.7231,
1617
+ "step": 2040
1618
+ },
1619
+ {
1620
+ "epoch": 3.25,
1621
+ "grad_norm": 4.410750389099121,
1622
+ "learning_rate": 0.00010712018140589571,
1623
+ "loss": 0.751,
1624
+ "step": 2050
1625
+ },
1626
+ {
1627
+ "epoch": 3.27,
1628
+ "grad_norm": 6.240994930267334,
1629
+ "learning_rate": 0.00010666666666666667,
1630
+ "loss": 0.6299,
1631
+ "step": 2060
1632
+ },
1633
+ {
1634
+ "epoch": 3.29,
1635
+ "grad_norm": 4.845404148101807,
1636
+ "learning_rate": 0.00010621315192743764,
1637
+ "loss": 0.6496,
1638
+ "step": 2070
1639
+ },
1640
+ {
1641
+ "epoch": 3.3,
1642
+ "grad_norm": 4.447664260864258,
1643
+ "learning_rate": 0.00010575963718820862,
1644
+ "loss": 0.5909,
1645
+ "step": 2080
1646
+ },
1647
+ {
1648
+ "epoch": 3.32,
1649
+ "grad_norm": 4.779788494110107,
1650
+ "learning_rate": 0.0001053061224489796,
1651
+ "loss": 0.6505,
1652
+ "step": 2090
1653
+ },
1654
+ {
1655
+ "epoch": 3.33,
1656
+ "grad_norm": 3.946617841720581,
1657
+ "learning_rate": 0.00010485260770975056,
1658
+ "loss": 0.5218,
1659
+ "step": 2100
1660
+ },
1661
+ {
1662
+ "epoch": 3.33,
1663
+ "eval_accuracy": 0.7916666666666666,
1664
+ "eval_loss": 0.7152296900749207,
1665
+ "eval_runtime": 89.3624,
1666
+ "eval_samples_per_second": 28.2,
1667
+ "eval_steps_per_second": 3.525,
1668
+ "step": 2100
1669
+ },
1670
+ {
1671
+ "epoch": 3.35,
1672
+ "grad_norm": 4.13429069519043,
1673
+ "learning_rate": 0.00010439909297052155,
1674
+ "loss": 0.6193,
1675
+ "step": 2110
1676
+ },
1677
+ {
1678
+ "epoch": 3.37,
1679
+ "grad_norm": 5.503479480743408,
1680
+ "learning_rate": 0.00010394557823129251,
1681
+ "loss": 0.6206,
1682
+ "step": 2120
1683
+ },
1684
+ {
1685
+ "epoch": 3.38,
1686
+ "grad_norm": 6.610657215118408,
1687
+ "learning_rate": 0.00010349206349206351,
1688
+ "loss": 0.7244,
1689
+ "step": 2130
1690
+ },
1691
+ {
1692
+ "epoch": 3.4,
1693
+ "grad_norm": 3.592276096343994,
1694
+ "learning_rate": 0.00010303854875283447,
1695
+ "loss": 0.502,
1696
+ "step": 2140
1697
+ },
1698
+ {
1699
+ "epoch": 3.41,
1700
+ "grad_norm": 7.070047378540039,
1701
+ "learning_rate": 0.00010258503401360544,
1702
+ "loss": 0.8749,
1703
+ "step": 2150
1704
+ },
1705
+ {
1706
+ "epoch": 3.43,
1707
+ "grad_norm": 8.58852481842041,
1708
+ "learning_rate": 0.00010213151927437643,
1709
+ "loss": 0.5832,
1710
+ "step": 2160
1711
+ },
1712
+ {
1713
+ "epoch": 3.44,
1714
+ "grad_norm": 5.416906356811523,
1715
+ "learning_rate": 0.0001016780045351474,
1716
+ "loss": 0.5901,
1717
+ "step": 2170
1718
+ },
1719
+ {
1720
+ "epoch": 3.46,
1721
+ "grad_norm": 6.415400981903076,
1722
+ "learning_rate": 0.00010122448979591836,
1723
+ "loss": 0.4842,
1724
+ "step": 2180
1725
+ },
1726
+ {
1727
+ "epoch": 3.48,
1728
+ "grad_norm": 1.809209942817688,
1729
+ "learning_rate": 0.00010077097505668936,
1730
+ "loss": 0.6591,
1731
+ "step": 2190
1732
+ },
1733
+ {
1734
+ "epoch": 3.49,
1735
+ "grad_norm": 4.30819034576416,
1736
+ "learning_rate": 0.00010031746031746032,
1737
+ "loss": 0.8469,
1738
+ "step": 2200
1739
+ },
1740
+ {
1741
+ "epoch": 3.49,
1742
+ "eval_accuracy": 0.801984126984127,
1743
+ "eval_loss": 0.6404625177383423,
1744
+ "eval_runtime": 91.7544,
1745
+ "eval_samples_per_second": 27.465,
1746
+ "eval_steps_per_second": 3.433,
1747
+ "step": 2200
1748
+ },
1749
+ {
1750
+ "epoch": 3.51,
1751
+ "grad_norm": 1.2012885808944702,
1752
+ "learning_rate": 9.98639455782313e-05,
1753
+ "loss": 0.5602,
1754
+ "step": 2210
1755
+ },
1756
+ {
1757
+ "epoch": 3.52,
1758
+ "grad_norm": 6.166879653930664,
1759
+ "learning_rate": 9.941043083900228e-05,
1760
+ "loss": 0.693,
1761
+ "step": 2220
1762
+ },
1763
+ {
1764
+ "epoch": 3.54,
1765
+ "grad_norm": 2.257598876953125,
1766
+ "learning_rate": 9.895691609977325e-05,
1767
+ "loss": 0.6969,
1768
+ "step": 2230
1769
+ },
1770
+ {
1771
+ "epoch": 3.56,
1772
+ "grad_norm": 3.866694450378418,
1773
+ "learning_rate": 9.850340136054422e-05,
1774
+ "loss": 0.7577,
1775
+ "step": 2240
1776
+ },
1777
+ {
1778
+ "epoch": 3.57,
1779
+ "grad_norm": 4.493105888366699,
1780
+ "learning_rate": 9.804988662131521e-05,
1781
+ "loss": 0.7494,
1782
+ "step": 2250
1783
+ },
1784
+ {
1785
+ "epoch": 3.59,
1786
+ "grad_norm": 6.990530490875244,
1787
+ "learning_rate": 9.759637188208617e-05,
1788
+ "loss": 0.6278,
1789
+ "step": 2260
1790
+ },
1791
+ {
1792
+ "epoch": 3.6,
1793
+ "grad_norm": 3.6842026710510254,
1794
+ "learning_rate": 9.714285714285715e-05,
1795
+ "loss": 0.6444,
1796
+ "step": 2270
1797
+ },
1798
+ {
1799
+ "epoch": 3.62,
1800
+ "grad_norm": 3.766533851623535,
1801
+ "learning_rate": 9.668934240362812e-05,
1802
+ "loss": 0.6254,
1803
+ "step": 2280
1804
+ },
1805
+ {
1806
+ "epoch": 3.63,
1807
+ "grad_norm": 3.9097561836242676,
1808
+ "learning_rate": 9.62358276643991e-05,
1809
+ "loss": 0.6871,
1810
+ "step": 2290
1811
+ },
1812
+ {
1813
+ "epoch": 3.65,
1814
+ "grad_norm": 5.504273414611816,
1815
+ "learning_rate": 9.578231292517007e-05,
1816
+ "loss": 0.5783,
1817
+ "step": 2300
1818
+ },
1819
+ {
1820
+ "epoch": 3.65,
1821
+ "eval_accuracy": 0.7956349206349206,
1822
+ "eval_loss": 0.6727890968322754,
1823
+ "eval_runtime": 91.691,
1824
+ "eval_samples_per_second": 27.484,
1825
+ "eval_steps_per_second": 3.435,
1826
+ "step": 2300
1827
+ },
1828
+ {
1829
+ "epoch": 3.67,
1830
+ "grad_norm": 6.009402751922607,
1831
+ "learning_rate": 9.532879818594105e-05,
1832
+ "loss": 0.6296,
1833
+ "step": 2310
1834
+ },
1835
+ {
1836
+ "epoch": 3.68,
1837
+ "grad_norm": 3.723788022994995,
1838
+ "learning_rate": 9.487528344671203e-05,
1839
+ "loss": 0.61,
1840
+ "step": 2320
1841
+ },
1842
+ {
1843
+ "epoch": 3.7,
1844
+ "grad_norm": 1.6886146068572998,
1845
+ "learning_rate": 9.4421768707483e-05,
1846
+ "loss": 0.6042,
1847
+ "step": 2330
1848
+ },
1849
+ {
1850
+ "epoch": 3.71,
1851
+ "grad_norm": 3.2953665256500244,
1852
+ "learning_rate": 9.396825396825397e-05,
1853
+ "loss": 0.5417,
1854
+ "step": 2340
1855
+ },
1856
+ {
1857
+ "epoch": 3.73,
1858
+ "grad_norm": 5.710081100463867,
1859
+ "learning_rate": 9.351473922902494e-05,
1860
+ "loss": 0.4744,
1861
+ "step": 2350
1862
+ },
1863
+ {
1864
+ "epoch": 3.75,
1865
+ "grad_norm": 8.341416358947754,
1866
+ "learning_rate": 9.306122448979592e-05,
1867
+ "loss": 0.6468,
1868
+ "step": 2360
1869
+ },
1870
+ {
1871
+ "epoch": 3.76,
1872
+ "grad_norm": 5.668067455291748,
1873
+ "learning_rate": 9.26077097505669e-05,
1874
+ "loss": 0.7058,
1875
+ "step": 2370
1876
+ },
1877
+ {
1878
+ "epoch": 3.78,
1879
+ "grad_norm": 7.821765422821045,
1880
+ "learning_rate": 9.215419501133787e-05,
1881
+ "loss": 0.6014,
1882
+ "step": 2380
1883
+ },
1884
+ {
1885
+ "epoch": 3.79,
1886
+ "grad_norm": 3.075150728225708,
1887
+ "learning_rate": 9.170068027210885e-05,
1888
+ "loss": 0.5857,
1889
+ "step": 2390
1890
+ },
1891
+ {
1892
+ "epoch": 3.81,
1893
+ "grad_norm": 5.349681377410889,
1894
+ "learning_rate": 9.124716553287982e-05,
1895
+ "loss": 0.7202,
1896
+ "step": 2400
1897
+ },
1898
+ {
1899
+ "epoch": 3.81,
1900
+ "eval_accuracy": 0.8154761904761905,
1901
+ "eval_loss": 0.6007378697395325,
1902
+ "eval_runtime": 91.6882,
1903
+ "eval_samples_per_second": 27.484,
1904
+ "eval_steps_per_second": 3.436,
1905
+ "step": 2400
1906
+ },
1907
+ {
1908
+ "epoch": 3.83,
1909
+ "grad_norm": 6.9100117683410645,
1910
+ "learning_rate": 9.079365079365079e-05,
1911
+ "loss": 0.5335,
1912
+ "step": 2410
1913
+ },
1914
+ {
1915
+ "epoch": 3.84,
1916
+ "grad_norm": NaN,
1917
+ "learning_rate": 9.038548752834468e-05,
1918
+ "loss": 0.7311,
1919
+ "step": 2420
1920
+ },
1921
+ {
1922
+ "epoch": 3.86,
1923
+ "grad_norm": 3.347836494445801,
1924
+ "learning_rate": 8.993197278911565e-05,
1925
+ "loss": 0.5405,
1926
+ "step": 2430
1927
+ },
1928
+ {
1929
+ "epoch": 3.87,
1930
+ "grad_norm": 10.232234954833984,
1931
+ "learning_rate": 8.947845804988662e-05,
1932
+ "loss": 0.7061,
1933
+ "step": 2440
1934
+ },
1935
+ {
1936
+ "epoch": 3.89,
1937
+ "grad_norm": 5.486011981964111,
1938
+ "learning_rate": 8.902494331065761e-05,
1939
+ "loss": 0.6905,
1940
+ "step": 2450
1941
+ },
1942
+ {
1943
+ "epoch": 3.9,
1944
+ "grad_norm": 4.699936389923096,
1945
+ "learning_rate": 8.857142857142857e-05,
1946
+ "loss": 0.5979,
1947
+ "step": 2460
1948
+ },
1949
+ {
1950
+ "epoch": 3.92,
1951
+ "grad_norm": 7.7097296714782715,
1952
+ "learning_rate": 8.811791383219955e-05,
1953
+ "loss": 0.5904,
1954
+ "step": 2470
1955
+ },
1956
+ {
1957
+ "epoch": 3.94,
1958
+ "grad_norm": 3.3451244831085205,
1959
+ "learning_rate": 8.766439909297052e-05,
1960
+ "loss": 0.5359,
1961
+ "step": 2480
1962
+ },
1963
+ {
1964
+ "epoch": 3.95,
1965
+ "grad_norm": 7.01662540435791,
1966
+ "learning_rate": 8.72108843537415e-05,
1967
+ "loss": 0.728,
1968
+ "step": 2490
1969
+ },
1970
+ {
1971
+ "epoch": 3.97,
1972
+ "grad_norm": 4.137801170349121,
1973
+ "learning_rate": 8.675736961451247e-05,
1974
+ "loss": 0.5525,
1975
+ "step": 2500
1976
+ },
1977
+ {
1978
+ "epoch": 3.97,
1979
+ "eval_accuracy": 0.8055555555555556,
1980
+ "eval_loss": 0.6558998823165894,
1981
+ "eval_runtime": 94.8951,
1982
+ "eval_samples_per_second": 26.556,
1983
+ "eval_steps_per_second": 3.319,
1984
+ "step": 2500
1985
+ },
1986
+ {
1987
+ "epoch": 3.98,
1988
+ "grad_norm": 6.564231872558594,
1989
+ "learning_rate": 8.630385487528345e-05,
1990
+ "loss": 0.6735,
1991
+ "step": 2510
1992
+ },
1993
+ {
1994
+ "epoch": 4.0,
1995
+ "grad_norm": 5.252976894378662,
1996
+ "learning_rate": 8.585034013605443e-05,
1997
+ "loss": 0.6903,
1998
+ "step": 2520
1999
+ },
2000
+ {
2001
+ "epoch": 4.02,
2002
+ "grad_norm": 4.289461612701416,
2003
+ "learning_rate": 8.53968253968254e-05,
2004
+ "loss": 0.5104,
2005
+ "step": 2530
2006
+ },
2007
+ {
2008
+ "epoch": 4.03,
2009
+ "grad_norm": 4.131557941436768,
2010
+ "learning_rate": 8.494331065759637e-05,
2011
+ "loss": 0.5064,
2012
+ "step": 2540
2013
+ },
2014
+ {
2015
+ "epoch": 4.05,
2016
+ "grad_norm": 3.5607643127441406,
2017
+ "learning_rate": 8.448979591836736e-05,
2018
+ "loss": 0.4825,
2019
+ "step": 2550
2020
+ },
2021
+ {
2022
+ "epoch": 4.06,
2023
+ "grad_norm": 2.5051770210266113,
2024
+ "learning_rate": 8.403628117913832e-05,
2025
+ "loss": 0.5668,
2026
+ "step": 2560
2027
+ },
2028
+ {
2029
+ "epoch": 4.08,
2030
+ "grad_norm": 1.6936321258544922,
2031
+ "learning_rate": 8.35827664399093e-05,
2032
+ "loss": 0.4412,
2033
+ "step": 2570
2034
+ },
2035
+ {
2036
+ "epoch": 4.1,
2037
+ "grad_norm": 3.99070143699646,
2038
+ "learning_rate": 8.312925170068027e-05,
2039
+ "loss": 0.5046,
2040
+ "step": 2580
2041
+ },
2042
+ {
2043
+ "epoch": 4.11,
2044
+ "grad_norm": 4.099004745483398,
2045
+ "learning_rate": 8.267573696145126e-05,
2046
+ "loss": 0.5753,
2047
+ "step": 2590
2048
+ },
2049
+ {
2050
+ "epoch": 4.13,
2051
+ "grad_norm": 3.5789458751678467,
2052
+ "learning_rate": 8.222222222222222e-05,
2053
+ "loss": 0.519,
2054
+ "step": 2600
2055
+ },
2056
+ {
2057
+ "epoch": 4.13,
2058
+ "eval_accuracy": 0.8222222222222222,
2059
+ "eval_loss": 0.5868101716041565,
2060
+ "eval_runtime": 93.686,
2061
+ "eval_samples_per_second": 26.898,
2062
+ "eval_steps_per_second": 3.362,
2063
+ "step": 2600
2064
+ },
2065
+ {
2066
+ "epoch": 4.14,
2067
+ "grad_norm": 4.785660266876221,
2068
+ "learning_rate": 8.17687074829932e-05,
2069
+ "loss": 0.4747,
2070
+ "step": 2610
2071
+ },
2072
+ {
2073
+ "epoch": 4.16,
2074
+ "grad_norm": 3.662282705307007,
2075
+ "learning_rate": 8.131519274376418e-05,
2076
+ "loss": 0.4822,
2077
+ "step": 2620
2078
+ },
2079
+ {
2080
+ "epoch": 4.17,
2081
+ "grad_norm": 3.897036075592041,
2082
+ "learning_rate": 8.086167800453515e-05,
2083
+ "loss": 0.4848,
2084
+ "step": 2630
2085
+ },
2086
+ {
2087
+ "epoch": 4.19,
2088
+ "grad_norm": 5.990245342254639,
2089
+ "learning_rate": 8.040816326530612e-05,
2090
+ "loss": 0.5569,
2091
+ "step": 2640
2092
+ },
2093
+ {
2094
+ "epoch": 4.21,
2095
+ "grad_norm": 4.004576206207275,
2096
+ "learning_rate": 7.99546485260771e-05,
2097
+ "loss": 0.4929,
2098
+ "step": 2650
2099
+ },
2100
+ {
2101
+ "epoch": 4.22,
2102
+ "grad_norm": 5.0893754959106445,
2103
+ "learning_rate": 7.950113378684808e-05,
2104
+ "loss": 0.5564,
2105
+ "step": 2660
2106
+ },
2107
+ {
2108
+ "epoch": 4.24,
2109
+ "grad_norm": 3.9271037578582764,
2110
+ "learning_rate": 7.904761904761905e-05,
2111
+ "loss": 0.382,
2112
+ "step": 2670
2113
+ },
2114
+ {
2115
+ "epoch": 4.25,
2116
+ "grad_norm": 4.929361820220947,
2117
+ "learning_rate": 7.859410430839002e-05,
2118
+ "loss": 0.5502,
2119
+ "step": 2680
2120
+ },
2121
+ {
2122
+ "epoch": 4.27,
2123
+ "grad_norm": 4.333121299743652,
2124
+ "learning_rate": 7.814058956916101e-05,
2125
+ "loss": 0.5877,
2126
+ "step": 2690
2127
+ },
2128
+ {
2129
+ "epoch": 4.29,
2130
+ "grad_norm": 3.787369728088379,
2131
+ "learning_rate": 7.768707482993197e-05,
2132
+ "loss": 0.6171,
2133
+ "step": 2700
2134
+ },
2135
+ {
2136
+ "epoch": 4.29,
2137
+ "eval_accuracy": 0.8103174603174603,
2138
+ "eval_loss": 0.6157482266426086,
2139
+ "eval_runtime": 91.6071,
2140
+ "eval_samples_per_second": 27.509,
2141
+ "eval_steps_per_second": 3.439,
2142
+ "step": 2700
2143
+ },
2144
+ {
2145
+ "epoch": 4.3,
2146
+ "grad_norm": 4.089489936828613,
2147
+ "learning_rate": 7.723356009070295e-05,
2148
+ "loss": 0.5279,
2149
+ "step": 2710
2150
+ },
2151
+ {
2152
+ "epoch": 4.32,
2153
+ "grad_norm": 5.687109470367432,
2154
+ "learning_rate": 7.678004535147393e-05,
2155
+ "loss": 0.4443,
2156
+ "step": 2720
2157
+ },
2158
+ {
2159
+ "epoch": 4.33,
2160
+ "grad_norm": 3.6936216354370117,
2161
+ "learning_rate": 7.632653061224491e-05,
2162
+ "loss": 0.533,
2163
+ "step": 2730
2164
+ },
2165
+ {
2166
+ "epoch": 4.35,
2167
+ "grad_norm": 2.403999090194702,
2168
+ "learning_rate": 7.587301587301587e-05,
2169
+ "loss": 0.5214,
2170
+ "step": 2740
2171
+ },
2172
+ {
2173
+ "epoch": 4.37,
2174
+ "grad_norm": 5.553544044494629,
2175
+ "learning_rate": 7.541950113378686e-05,
2176
+ "loss": 0.5621,
2177
+ "step": 2750
2178
+ },
2179
+ {
2180
+ "epoch": 4.38,
2181
+ "grad_norm": 2.7590301036834717,
2182
+ "learning_rate": 7.496598639455783e-05,
2183
+ "loss": 0.4752,
2184
+ "step": 2760
2185
+ },
2186
+ {
2187
+ "epoch": 4.4,
2188
+ "grad_norm": 6.257105350494385,
2189
+ "learning_rate": 7.45124716553288e-05,
2190
+ "loss": 0.5196,
2191
+ "step": 2770
2192
+ },
2193
+ {
2194
+ "epoch": 4.41,
2195
+ "grad_norm": 6.366973400115967,
2196
+ "learning_rate": 7.405895691609977e-05,
2197
+ "loss": 0.4779,
2198
+ "step": 2780
2199
+ },
2200
+ {
2201
+ "epoch": 4.43,
2202
+ "grad_norm": 4.746866703033447,
2203
+ "learning_rate": 7.360544217687076e-05,
2204
+ "loss": 0.4031,
2205
+ "step": 2790
2206
+ },
2207
+ {
2208
+ "epoch": 4.44,
2209
+ "grad_norm": 3.278367519378662,
2210
+ "learning_rate": 7.315192743764173e-05,
2211
+ "loss": 0.5401,
2212
+ "step": 2800
2213
+ },
2214
+ {
2215
+ "epoch": 4.44,
2216
+ "eval_accuracy": 0.8083333333333333,
2217
+ "eval_loss": 0.6119987368583679,
2218
+ "eval_runtime": 91.505,
2219
+ "eval_samples_per_second": 27.539,
2220
+ "eval_steps_per_second": 3.442,
2221
+ "step": 2800
2222
+ },
2223
+ {
2224
+ "epoch": 4.46,
2225
+ "grad_norm": 5.5712504386901855,
2226
+ "learning_rate": 7.26984126984127e-05,
2227
+ "loss": 0.5565,
2228
+ "step": 2810
2229
+ },
2230
+ {
2231
+ "epoch": 4.48,
2232
+ "grad_norm": 4.8718767166137695,
2233
+ "learning_rate": 7.224489795918368e-05,
2234
+ "loss": 0.5219,
2235
+ "step": 2820
2236
+ },
2237
+ {
2238
+ "epoch": 4.49,
2239
+ "grad_norm": 5.685264587402344,
2240
+ "learning_rate": 7.179138321995466e-05,
2241
+ "loss": 0.5021,
2242
+ "step": 2830
2243
+ },
2244
+ {
2245
+ "epoch": 4.51,
2246
+ "grad_norm": 6.191750526428223,
2247
+ "learning_rate": 7.133786848072562e-05,
2248
+ "loss": 0.5115,
2249
+ "step": 2840
2250
+ },
2251
+ {
2252
+ "epoch": 4.52,
2253
+ "grad_norm": 5.250429630279541,
2254
+ "learning_rate": 7.08843537414966e-05,
2255
+ "loss": 0.5805,
2256
+ "step": 2850
2257
+ },
2258
+ {
2259
+ "epoch": 4.54,
2260
+ "grad_norm": 5.834912300109863,
2261
+ "learning_rate": 7.043083900226758e-05,
2262
+ "loss": 0.584,
2263
+ "step": 2860
2264
+ },
2265
+ {
2266
+ "epoch": 4.56,
2267
+ "grad_norm": 2.52066707611084,
2268
+ "learning_rate": 6.997732426303855e-05,
2269
+ "loss": 0.3752,
2270
+ "step": 2870
2271
+ },
2272
+ {
2273
+ "epoch": 4.57,
2274
+ "grad_norm": 2.4411842823028564,
2275
+ "learning_rate": 6.952380952380952e-05,
2276
+ "loss": 0.5748,
2277
+ "step": 2880
2278
+ },
2279
+ {
2280
+ "epoch": 4.59,
2281
+ "grad_norm": 4.865506649017334,
2282
+ "learning_rate": 6.907029478458051e-05,
2283
+ "loss": 0.6119,
2284
+ "step": 2890
2285
+ },
2286
+ {
2287
+ "epoch": 4.6,
2288
+ "grad_norm": 2.109516143798828,
2289
+ "learning_rate": 6.861678004535148e-05,
2290
+ "loss": 0.6105,
2291
+ "step": 2900
2292
+ },
2293
+ {
2294
+ "epoch": 4.6,
2295
+ "eval_accuracy": 0.8325396825396826,
2296
+ "eval_loss": 0.5618996024131775,
2297
+ "eval_runtime": 90.5734,
2298
+ "eval_samples_per_second": 27.823,
2299
+ "eval_steps_per_second": 3.478,
2300
+ "step": 2900
2301
+ },
2302
+ {
2303
+ "epoch": 4.62,
2304
+ "grad_norm": 6.931108474731445,
2305
+ "learning_rate": 6.816326530612245e-05,
2306
+ "loss": 0.6452,
2307
+ "step": 2910
2308
+ },
2309
+ {
2310
+ "epoch": 4.63,
2311
+ "grad_norm": 5.011793613433838,
2312
+ "learning_rate": 6.770975056689343e-05,
2313
+ "loss": 0.3178,
2314
+ "step": 2920
2315
+ },
2316
+ {
2317
+ "epoch": 4.65,
2318
+ "grad_norm": 2.996354818344116,
2319
+ "learning_rate": 6.72562358276644e-05,
2320
+ "loss": 0.3436,
2321
+ "step": 2930
2322
+ },
2323
+ {
2324
+ "epoch": 4.67,
2325
+ "grad_norm": 4.3052754402160645,
2326
+ "learning_rate": 6.680272108843538e-05,
2327
+ "loss": 0.4743,
2328
+ "step": 2940
2329
+ },
2330
+ {
2331
+ "epoch": 4.68,
2332
+ "grad_norm": 5.777819633483887,
2333
+ "learning_rate": 6.634920634920636e-05,
2334
+ "loss": 0.6294,
2335
+ "step": 2950
2336
+ },
2337
+ {
2338
+ "epoch": 4.7,
2339
+ "grad_norm": 4.6029157638549805,
2340
+ "learning_rate": 6.589569160997733e-05,
2341
+ "loss": 0.495,
2342
+ "step": 2960
2343
+ },
2344
+ {
2345
+ "epoch": 4.71,
2346
+ "grad_norm": 3.34942889213562,
2347
+ "learning_rate": 6.54421768707483e-05,
2348
+ "loss": 0.5216,
2349
+ "step": 2970
2350
+ },
2351
+ {
2352
+ "epoch": 4.73,
2353
+ "grad_norm": 3.747777223587036,
2354
+ "learning_rate": 6.498866213151927e-05,
2355
+ "loss": 0.4908,
2356
+ "step": 2980
2357
+ },
2358
+ {
2359
+ "epoch": 4.75,
2360
+ "grad_norm": 3.217052459716797,
2361
+ "learning_rate": 6.453514739229024e-05,
2362
+ "loss": 0.5847,
2363
+ "step": 2990
2364
+ },
2365
+ {
2366
+ "epoch": 4.76,
2367
+ "grad_norm": 2.1214661598205566,
2368
+ "learning_rate": 6.408163265306123e-05,
2369
+ "loss": 0.7497,
2370
+ "step": 3000
2371
+ },
2372
+ {
2373
+ "epoch": 4.76,
2374
+ "eval_accuracy": 0.8301587301587302,
2375
+ "eval_loss": 0.5859270095825195,
2376
+ "eval_runtime": 89.9674,
2377
+ "eval_samples_per_second": 28.01,
2378
+ "eval_steps_per_second": 3.501,
2379
+ "step": 3000
2380
+ },
2381
+ {
2382
+ "epoch": 4.78,
2383
+ "grad_norm": 3.495818614959717,
2384
+ "learning_rate": 6.36281179138322e-05,
2385
+ "loss": 0.4944,
2386
+ "step": 3010
2387
+ },
2388
+ {
2389
+ "epoch": 4.79,
2390
+ "grad_norm": 6.128993034362793,
2391
+ "learning_rate": 6.317460317460318e-05,
2392
+ "loss": 0.5474,
2393
+ "step": 3020
2394
+ },
2395
+ {
2396
+ "epoch": 4.81,
2397
+ "grad_norm": 2.1895864009857178,
2398
+ "learning_rate": 6.272108843537415e-05,
2399
+ "loss": 0.5447,
2400
+ "step": 3030
2401
+ },
2402
+ {
2403
+ "epoch": 4.83,
2404
+ "grad_norm": 4.312340259552002,
2405
+ "learning_rate": 6.226757369614513e-05,
2406
+ "loss": 0.5194,
2407
+ "step": 3040
2408
+ },
2409
+ {
2410
+ "epoch": 4.84,
2411
+ "grad_norm": 5.473392009735107,
2412
+ "learning_rate": 6.181405895691609e-05,
2413
+ "loss": 0.5538,
2414
+ "step": 3050
2415
+ },
2416
+ {
2417
+ "epoch": 4.86,
2418
+ "grad_norm": 2.302345037460327,
2419
+ "learning_rate": 6.136054421768708e-05,
2420
+ "loss": 0.4733,
2421
+ "step": 3060
2422
+ },
2423
+ {
2424
+ "epoch": 4.87,
2425
+ "grad_norm": 3.3499696254730225,
2426
+ "learning_rate": 6.090702947845806e-05,
2427
+ "loss": 0.755,
2428
+ "step": 3070
2429
+ },
2430
+ {
2431
+ "epoch": 4.89,
2432
+ "grad_norm": 3.382891893386841,
2433
+ "learning_rate": 6.045351473922902e-05,
2434
+ "loss": 0.4829,
2435
+ "step": 3080
2436
+ },
2437
+ {
2438
+ "epoch": 4.9,
2439
+ "grad_norm": 3.4998888969421387,
2440
+ "learning_rate": 6e-05,
2441
+ "loss": 0.4543,
2442
+ "step": 3090
2443
+ },
2444
+ {
2445
+ "epoch": 4.92,
2446
+ "grad_norm": 3.2224600315093994,
2447
+ "learning_rate": 5.954648526077098e-05,
2448
+ "loss": 0.4856,
2449
+ "step": 3100
2450
+ },
2451
+ {
2452
+ "epoch": 4.92,
2453
+ "eval_accuracy": 0.8261904761904761,
2454
+ "eval_loss": 0.5833402872085571,
2455
+ "eval_runtime": 90.9659,
2456
+ "eval_samples_per_second": 27.703,
2457
+ "eval_steps_per_second": 3.463,
2458
+ "step": 3100
2459
+ },
2460
+ {
2461
+ "epoch": 4.94,
2462
+ "grad_norm": 2.1335177421569824,
2463
+ "learning_rate": 5.909297052154196e-05,
2464
+ "loss": 0.3995,
2465
+ "step": 3110
2466
+ },
2467
+ {
2468
+ "epoch": 4.95,
2469
+ "grad_norm": 4.156594753265381,
2470
+ "learning_rate": 5.8639455782312925e-05,
2471
+ "loss": 0.5539,
2472
+ "step": 3120
2473
+ },
2474
+ {
2475
+ "epoch": 4.97,
2476
+ "grad_norm": 5.811089515686035,
2477
+ "learning_rate": 5.8185941043083904e-05,
2478
+ "loss": 0.579,
2479
+ "step": 3130
2480
+ },
2481
+ {
2482
+ "epoch": 4.98,
2483
+ "grad_norm": 1.908595323562622,
2484
+ "learning_rate": 5.773242630385488e-05,
2485
+ "loss": 0.5315,
2486
+ "step": 3140
2487
+ },
2488
+ {
2489
+ "epoch": 5.0,
2490
+ "grad_norm": 2.8674533367156982,
2491
+ "learning_rate": 5.727891156462585e-05,
2492
+ "loss": 0.5577,
2493
+ "step": 3150
2494
+ },
2495
+ {
2496
+ "epoch": 5.02,
2497
+ "grad_norm": 1.5174939632415771,
2498
+ "learning_rate": 5.682539682539683e-05,
2499
+ "loss": 0.5052,
2500
+ "step": 3160
2501
+ },
2502
+ {
2503
+ "epoch": 5.03,
2504
+ "grad_norm": 3.2664871215820312,
2505
+ "learning_rate": 5.637188208616781e-05,
2506
+ "loss": 0.3135,
2507
+ "step": 3170
2508
+ },
2509
+ {
2510
+ "epoch": 5.05,
2511
+ "grad_norm": 5.35206937789917,
2512
+ "learning_rate": 5.5918367346938786e-05,
2513
+ "loss": 0.5859,
2514
+ "step": 3180
2515
+ },
2516
+ {
2517
+ "epoch": 5.06,
2518
+ "grad_norm": 2.6170833110809326,
2519
+ "learning_rate": 5.546485260770975e-05,
2520
+ "loss": 0.4511,
2521
+ "step": 3190
2522
+ },
2523
+ {
2524
+ "epoch": 5.08,
2525
+ "grad_norm": 3.2739861011505127,
2526
+ "learning_rate": 5.501133786848073e-05,
2527
+ "loss": 0.4959,
2528
+ "step": 3200
2529
+ },
2530
+ {
2531
+ "epoch": 5.08,
2532
+ "eval_accuracy": 0.832936507936508,
2533
+ "eval_loss": 0.5703846216201782,
2534
+ "eval_runtime": 92.0072,
2535
+ "eval_samples_per_second": 27.389,
2536
+ "eval_steps_per_second": 3.424,
2537
+ "step": 3200
2538
+ },
2539
+ {
2540
+ "epoch": 5.1,
2541
+ "grad_norm": 0.7258143424987793,
2542
+ "learning_rate": 5.455782312925171e-05,
2543
+ "loss": 0.4922,
2544
+ "step": 3210
2545
+ },
2546
+ {
2547
+ "epoch": 5.11,
2548
+ "grad_norm": 1.8463611602783203,
2549
+ "learning_rate": 5.4104308390022675e-05,
2550
+ "loss": 0.3888,
2551
+ "step": 3220
2552
+ },
2553
+ {
2554
+ "epoch": 5.13,
2555
+ "grad_norm": 4.082030296325684,
2556
+ "learning_rate": 5.3650793650793654e-05,
2557
+ "loss": 0.4486,
2558
+ "step": 3230
2559
+ },
2560
+ {
2561
+ "epoch": 5.14,
2562
+ "grad_norm": 1.123199462890625,
2563
+ "learning_rate": 5.319727891156463e-05,
2564
+ "loss": 0.4229,
2565
+ "step": 3240
2566
+ },
2567
+ {
2568
+ "epoch": 5.16,
2569
+ "grad_norm": 2.546860694885254,
2570
+ "learning_rate": 5.2743764172335605e-05,
2571
+ "loss": 0.5514,
2572
+ "step": 3250
2573
+ },
2574
+ {
2575
+ "epoch": 5.17,
2576
+ "grad_norm": 6.097323417663574,
2577
+ "learning_rate": 5.229024943310658e-05,
2578
+ "loss": 0.4391,
2579
+ "step": 3260
2580
+ },
2581
+ {
2582
+ "epoch": 5.19,
2583
+ "grad_norm": 3.1347315311431885,
2584
+ "learning_rate": 5.1836734693877557e-05,
2585
+ "loss": 0.4455,
2586
+ "step": 3270
2587
+ },
2588
+ {
2589
+ "epoch": 5.21,
2590
+ "grad_norm": 2.0431602001190186,
2591
+ "learning_rate": 5.138321995464853e-05,
2592
+ "loss": 0.3076,
2593
+ "step": 3280
2594
+ },
2595
+ {
2596
+ "epoch": 5.22,
2597
+ "grad_norm": 4.452468395233154,
2598
+ "learning_rate": 5.09297052154195e-05,
2599
+ "loss": 0.4442,
2600
+ "step": 3290
2601
+ },
2602
+ {
2603
+ "epoch": 5.24,
2604
+ "grad_norm": 5.282511234283447,
2605
+ "learning_rate": 5.047619047619048e-05,
2606
+ "loss": 0.4413,
2607
+ "step": 3300
2608
+ },
2609
+ {
2610
+ "epoch": 5.24,
2611
+ "eval_accuracy": 0.819047619047619,
2612
+ "eval_loss": 0.6217456459999084,
2613
+ "eval_runtime": 91.9773,
2614
+ "eval_samples_per_second": 27.398,
2615
+ "eval_steps_per_second": 3.425,
2616
+ "step": 3300
2617
+ },
2618
+ {
2619
+ "epoch": 5.25,
2620
+ "grad_norm": 8.806018829345703,
2621
+ "learning_rate": 5.002267573696145e-05,
2622
+ "loss": 0.4263,
2623
+ "step": 3310
2624
+ },
2625
+ {
2626
+ "epoch": 5.27,
2627
+ "grad_norm": 4.782093048095703,
2628
+ "learning_rate": 4.9569160997732425e-05,
2629
+ "loss": 0.6431,
2630
+ "step": 3320
2631
+ },
2632
+ {
2633
+ "epoch": 5.29,
2634
+ "grad_norm": 5.115506172180176,
2635
+ "learning_rate": 4.9115646258503404e-05,
2636
+ "loss": 0.4876,
2637
+ "step": 3330
2638
+ },
2639
+ {
2640
+ "epoch": 5.3,
2641
+ "grad_norm": 4.416604995727539,
2642
+ "learning_rate": 4.8662131519274376e-05,
2643
+ "loss": 0.371,
2644
+ "step": 3340
2645
+ },
2646
+ {
2647
+ "epoch": 5.32,
2648
+ "grad_norm": 3.7609243392944336,
2649
+ "learning_rate": 4.820861678004535e-05,
2650
+ "loss": 0.3873,
2651
+ "step": 3350
2652
+ },
2653
+ {
2654
+ "epoch": 5.33,
2655
+ "grad_norm": 4.652336120605469,
2656
+ "learning_rate": 4.775510204081633e-05,
2657
+ "loss": 0.4134,
2658
+ "step": 3360
2659
+ },
2660
+ {
2661
+ "epoch": 5.35,
2662
+ "grad_norm": 5.700862407684326,
2663
+ "learning_rate": 4.73015873015873e-05,
2664
+ "loss": 0.4463,
2665
+ "step": 3370
2666
+ },
2667
+ {
2668
+ "epoch": 5.37,
2669
+ "grad_norm": 4.792759895324707,
2670
+ "learning_rate": 4.684807256235828e-05,
2671
+ "loss": 0.4243,
2672
+ "step": 3380
2673
+ },
2674
+ {
2675
+ "epoch": 5.38,
2676
+ "grad_norm": 4.60031795501709,
2677
+ "learning_rate": 4.639455782312925e-05,
2678
+ "loss": 0.3602,
2679
+ "step": 3390
2680
+ },
2681
+ {
2682
+ "epoch": 5.4,
2683
+ "grad_norm": 6.657435417175293,
2684
+ "learning_rate": 4.594104308390023e-05,
2685
+ "loss": 0.4513,
2686
+ "step": 3400
2687
+ },
2688
+ {
2689
+ "epoch": 5.4,
2690
+ "eval_accuracy": 0.8293650793650794,
2691
+ "eval_loss": 0.5750200748443604,
2692
+ "eval_runtime": 96.412,
2693
+ "eval_samples_per_second": 26.138,
2694
+ "eval_steps_per_second": 3.267,
2695
+ "step": 3400
2696
+ },
2697
+ {
2698
+ "epoch": 5.41,
2699
+ "grad_norm": 3.526585578918457,
2700
+ "learning_rate": 4.54875283446712e-05,
2701
+ "loss": 0.374,
2702
+ "step": 3410
2703
+ },
2704
+ {
2705
+ "epoch": 5.43,
2706
+ "grad_norm": 2.742906332015991,
2707
+ "learning_rate": 4.5034013605442174e-05,
2708
+ "loss": 0.3822,
2709
+ "step": 3420
2710
+ },
2711
+ {
2712
+ "epoch": 5.44,
2713
+ "grad_norm": 3.6437864303588867,
2714
+ "learning_rate": 4.4580498866213154e-05,
2715
+ "loss": 0.4369,
2716
+ "step": 3430
2717
+ },
2718
+ {
2719
+ "epoch": 5.46,
2720
+ "grad_norm": 2.621497869491577,
2721
+ "learning_rate": 4.4126984126984126e-05,
2722
+ "loss": 0.3399,
2723
+ "step": 3440
2724
+ },
2725
+ {
2726
+ "epoch": 5.48,
2727
+ "grad_norm": 1.8672789335250854,
2728
+ "learning_rate": 4.3673469387755105e-05,
2729
+ "loss": 0.584,
2730
+ "step": 3450
2731
+ },
2732
+ {
2733
+ "epoch": 5.49,
2734
+ "grad_norm": 1.4387513399124146,
2735
+ "learning_rate": 4.321995464852608e-05,
2736
+ "loss": 0.4242,
2737
+ "step": 3460
2738
+ },
2739
+ {
2740
+ "epoch": 5.51,
2741
+ "grad_norm": 0.7556703090667725,
2742
+ "learning_rate": 4.2766439909297056e-05,
2743
+ "loss": 0.462,
2744
+ "step": 3470
2745
+ },
2746
+ {
2747
+ "epoch": 5.52,
2748
+ "grad_norm": 2.0518500804901123,
2749
+ "learning_rate": 4.231292517006803e-05,
2750
+ "loss": 0.4239,
2751
+ "step": 3480
2752
+ },
2753
+ {
2754
+ "epoch": 5.54,
2755
+ "grad_norm": 4.876095771789551,
2756
+ "learning_rate": 4.1859410430839e-05,
2757
+ "loss": 0.5488,
2758
+ "step": 3490
2759
+ },
2760
+ {
2761
+ "epoch": 5.56,
2762
+ "grad_norm": 1.6563076972961426,
2763
+ "learning_rate": 4.140589569160998e-05,
2764
+ "loss": 0.3987,
2765
+ "step": 3500
2766
+ },
2767
+ {
2768
+ "epoch": 5.56,
2769
+ "eval_accuracy": 0.8341269841269842,
2770
+ "eval_loss": 0.5825861096382141,
2771
+ "eval_runtime": 98.7535,
2772
+ "eval_samples_per_second": 25.518,
2773
+ "eval_steps_per_second": 3.19,
2774
+ "step": 3500
2775
+ },
2776
+ {
2777
+ "epoch": 5.57,
2778
+ "grad_norm": 6.538485050201416,
2779
+ "learning_rate": 4.095238095238095e-05,
2780
+ "loss": 0.5583,
2781
+ "step": 3510
2782
+ },
2783
+ {
2784
+ "epoch": 5.59,
2785
+ "grad_norm": 4.381786823272705,
2786
+ "learning_rate": 4.049886621315193e-05,
2787
+ "loss": 0.408,
2788
+ "step": 3520
2789
+ },
2790
+ {
2791
+ "epoch": 5.6,
2792
+ "grad_norm": 3.696018695831299,
2793
+ "learning_rate": 4.00453514739229e-05,
2794
+ "loss": 0.4421,
2795
+ "step": 3530
2796
+ },
2797
+ {
2798
+ "epoch": 5.62,
2799
+ "grad_norm": 3.3752057552337646,
2800
+ "learning_rate": 3.9591836734693876e-05,
2801
+ "loss": 0.4057,
2802
+ "step": 3540
2803
+ },
2804
+ {
2805
+ "epoch": 5.63,
2806
+ "grad_norm": 6.889618396759033,
2807
+ "learning_rate": 3.9138321995464855e-05,
2808
+ "loss": 0.4479,
2809
+ "step": 3550
2810
+ },
2811
+ {
2812
+ "epoch": 5.65,
2813
+ "grad_norm": 5.5090131759643555,
2814
+ "learning_rate": 3.868480725623583e-05,
2815
+ "loss": 0.6009,
2816
+ "step": 3560
2817
+ },
2818
+ {
2819
+ "epoch": 5.67,
2820
+ "grad_norm": 2.368633985519409,
2821
+ "learning_rate": 3.8231292517006806e-05,
2822
+ "loss": 0.4276,
2823
+ "step": 3570
2824
+ },
2825
+ {
2826
+ "epoch": 5.68,
2827
+ "grad_norm": 4.297762393951416,
2828
+ "learning_rate": 3.777777777777778e-05,
2829
+ "loss": 0.4368,
2830
+ "step": 3580
2831
+ },
2832
+ {
2833
+ "epoch": 5.7,
2834
+ "grad_norm": 3.7877378463745117,
2835
+ "learning_rate": 3.732426303854876e-05,
2836
+ "loss": 0.3481,
2837
+ "step": 3590
2838
+ },
2839
+ {
2840
+ "epoch": 5.71,
2841
+ "grad_norm": 2.9667625427246094,
2842
+ "learning_rate": 3.687074829931973e-05,
2843
+ "loss": 0.4395,
2844
+ "step": 3600
2845
+ },
2846
+ {
2847
+ "epoch": 5.71,
2848
+ "eval_accuracy": 0.8384920634920635,
2849
+ "eval_loss": 0.5753714442253113,
2850
+ "eval_runtime": 97.0735,
2851
+ "eval_samples_per_second": 25.96,
2852
+ "eval_steps_per_second": 3.245,
2853
+ "step": 3600
2854
+ },
2855
+ {
2856
+ "epoch": 5.73,
2857
+ "grad_norm": 5.830836296081543,
2858
+ "learning_rate": 3.64172335600907e-05,
2859
+ "loss": 0.4859,
2860
+ "step": 3610
2861
+ },
2862
+ {
2863
+ "epoch": 5.75,
2864
+ "grad_norm": 5.262144088745117,
2865
+ "learning_rate": 3.596371882086168e-05,
2866
+ "loss": 0.4409,
2867
+ "step": 3620
2868
+ },
2869
+ {
2870
+ "epoch": 5.76,
2871
+ "grad_norm": 0.9806166291236877,
2872
+ "learning_rate": 3.551020408163265e-05,
2873
+ "loss": 0.3884,
2874
+ "step": 3630
2875
+ },
2876
+ {
2877
+ "epoch": 5.78,
2878
+ "grad_norm": 3.1864867210388184,
2879
+ "learning_rate": 3.505668934240363e-05,
2880
+ "loss": 0.3753,
2881
+ "step": 3640
2882
+ },
2883
+ {
2884
+ "epoch": 5.79,
2885
+ "grad_norm": 2.679213762283325,
2886
+ "learning_rate": 3.4603174603174604e-05,
2887
+ "loss": 0.3764,
2888
+ "step": 3650
2889
+ },
2890
+ {
2891
+ "epoch": 5.81,
2892
+ "grad_norm": 1.3040461540222168,
2893
+ "learning_rate": 3.4149659863945583e-05,
2894
+ "loss": 0.3632,
2895
+ "step": 3660
2896
+ },
2897
+ {
2898
+ "epoch": 5.83,
2899
+ "grad_norm": 6.803734302520752,
2900
+ "learning_rate": 3.3696145124716556e-05,
2901
+ "loss": 0.6695,
2902
+ "step": 3670
2903
+ },
2904
+ {
2905
+ "epoch": 5.84,
2906
+ "grad_norm": 3.717195510864258,
2907
+ "learning_rate": 3.324263038548753e-05,
2908
+ "loss": 0.4226,
2909
+ "step": 3680
2910
+ },
2911
+ {
2912
+ "epoch": 5.86,
2913
+ "grad_norm": 0.3749280273914337,
2914
+ "learning_rate": 3.278911564625851e-05,
2915
+ "loss": 0.4404,
2916
+ "step": 3690
2917
+ },
2918
+ {
2919
+ "epoch": 5.87,
2920
+ "grad_norm": 3.4548332691192627,
2921
+ "learning_rate": 3.233560090702948e-05,
2922
+ "loss": 0.4669,
2923
+ "step": 3700
2924
+ },
2925
+ {
2926
+ "epoch": 5.87,
2927
+ "eval_accuracy": 0.8357142857142857,
2928
+ "eval_loss": 0.5652737021446228,
2929
+ "eval_runtime": 98.0382,
2930
+ "eval_samples_per_second": 25.704,
2931
+ "eval_steps_per_second": 3.213,
2932
+ "step": 3700
2933
+ },
2934
+ {
2935
+ "epoch": 5.89,
2936
+ "grad_norm": 2.5312252044677734,
2937
+ "learning_rate": 3.188208616780046e-05,
2938
+ "loss": 0.4786,
2939
+ "step": 3710
2940
+ },
2941
+ {
2942
+ "epoch": 5.9,
2943
+ "grad_norm": 4.841963291168213,
2944
+ "learning_rate": 3.142857142857143e-05,
2945
+ "loss": 0.5179,
2946
+ "step": 3720
2947
+ },
2948
+ {
2949
+ "epoch": 5.92,
2950
+ "grad_norm": 4.394898414611816,
2951
+ "learning_rate": 3.097505668934241e-05,
2952
+ "loss": 0.5062,
2953
+ "step": 3730
2954
+ },
2955
+ {
2956
+ "epoch": 5.94,
2957
+ "grad_norm": 4.590970516204834,
2958
+ "learning_rate": 3.052154195011338e-05,
2959
+ "loss": 0.3639,
2960
+ "step": 3740
2961
+ },
2962
+ {
2963
+ "epoch": 5.95,
2964
+ "grad_norm": 4.6838459968566895,
2965
+ "learning_rate": 3.0068027210884354e-05,
2966
+ "loss": 0.3552,
2967
+ "step": 3750
2968
+ },
2969
+ {
2970
+ "epoch": 5.97,
2971
+ "grad_norm": 4.303285121917725,
2972
+ "learning_rate": 2.961451247165533e-05,
2973
+ "loss": 0.3526,
2974
+ "step": 3760
2975
+ },
2976
+ {
2977
+ "epoch": 5.98,
2978
+ "grad_norm": 6.06368350982666,
2979
+ "learning_rate": 2.9160997732426306e-05,
2980
+ "loss": 0.3266,
2981
+ "step": 3770
2982
+ },
2983
+ {
2984
+ "epoch": 6.0,
2985
+ "grad_norm": 4.715054988861084,
2986
+ "learning_rate": 2.870748299319728e-05,
2987
+ "loss": 0.4612,
2988
+ "step": 3780
2989
+ },
2990
+ {
2991
+ "epoch": 6.02,
2992
+ "grad_norm": 3.029750108718872,
2993
+ "learning_rate": 2.8253968253968253e-05,
2994
+ "loss": 0.3938,
2995
+ "step": 3790
2996
+ },
2997
+ {
2998
+ "epoch": 6.03,
2999
+ "grad_norm": 6.53373384475708,
3000
+ "learning_rate": 2.7800453514739233e-05,
3001
+ "loss": 0.4005,
3002
+ "step": 3800
3003
+ },
3004
+ {
3005
+ "epoch": 6.03,
3006
+ "eval_accuracy": 0.8376984126984127,
3007
+ "eval_loss": 0.542424201965332,
3008
+ "eval_runtime": 92.2864,
3009
+ "eval_samples_per_second": 27.306,
3010
+ "eval_steps_per_second": 3.413,
3011
+ "step": 3800
3012
+ },
3013
+ {
3014
+ "epoch": 6.05,
3015
+ "grad_norm": 2.6752915382385254,
3016
+ "learning_rate": 2.7346938775510205e-05,
3017
+ "loss": 0.311,
3018
+ "step": 3810
3019
+ },
3020
+ {
3021
+ "epoch": 6.06,
3022
+ "grad_norm": 5.5302863121032715,
3023
+ "learning_rate": 2.6893424036281177e-05,
3024
+ "loss": 0.4392,
3025
+ "step": 3820
3026
+ },
3027
+ {
3028
+ "epoch": 6.08,
3029
+ "grad_norm": 3.140334367752075,
3030
+ "learning_rate": 2.6439909297052156e-05,
3031
+ "loss": 0.3301,
3032
+ "step": 3830
3033
+ },
3034
+ {
3035
+ "epoch": 6.1,
3036
+ "grad_norm": 7.531820774078369,
3037
+ "learning_rate": 2.598639455782313e-05,
3038
+ "loss": 0.4208,
3039
+ "step": 3840
3040
+ },
3041
+ {
3042
+ "epoch": 6.11,
3043
+ "grad_norm": 4.852546215057373,
3044
+ "learning_rate": 2.5532879818594107e-05,
3045
+ "loss": 0.3858,
3046
+ "step": 3850
3047
+ },
3048
+ {
3049
+ "epoch": 6.13,
3050
+ "grad_norm": 6.866227149963379,
3051
+ "learning_rate": 2.507936507936508e-05,
3052
+ "loss": 0.4882,
3053
+ "step": 3860
3054
+ },
3055
+ {
3056
+ "epoch": 6.14,
3057
+ "grad_norm": 2.053617477416992,
3058
+ "learning_rate": 2.4625850340136055e-05,
3059
+ "loss": 0.3131,
3060
+ "step": 3870
3061
+ },
3062
+ {
3063
+ "epoch": 6.16,
3064
+ "grad_norm": 3.9806272983551025,
3065
+ "learning_rate": 2.417233560090703e-05,
3066
+ "loss": 0.4078,
3067
+ "step": 3880
3068
+ },
3069
+ {
3070
+ "epoch": 6.17,
3071
+ "grad_norm": 3.6454029083251953,
3072
+ "learning_rate": 2.3718820861678007e-05,
3073
+ "loss": 0.4362,
3074
+ "step": 3890
3075
+ },
3076
+ {
3077
+ "epoch": 6.19,
3078
+ "grad_norm": 7.963522434234619,
3079
+ "learning_rate": 2.326530612244898e-05,
3080
+ "loss": 0.4457,
3081
+ "step": 3900
3082
+ },
3083
+ {
3084
+ "epoch": 6.19,
3085
+ "eval_accuracy": 0.8392857142857143,
3086
+ "eval_loss": 0.5619771480560303,
3087
+ "eval_runtime": 91.8917,
3088
+ "eval_samples_per_second": 27.424,
3089
+ "eval_steps_per_second": 3.428,
3090
+ "step": 3900
3091
+ },
3092
+ {
3093
+ "epoch": 6.21,
3094
+ "grad_norm": 0.9147405624389648,
3095
+ "learning_rate": 2.2811791383219955e-05,
3096
+ "loss": 0.278,
3097
+ "step": 3910
3098
+ },
3099
+ {
3100
+ "epoch": 6.22,
3101
+ "grad_norm": 7.86151647567749,
3102
+ "learning_rate": 2.235827664399093e-05,
3103
+ "loss": 0.3866,
3104
+ "step": 3920
3105
+ },
3106
+ {
3107
+ "epoch": 6.24,
3108
+ "grad_norm": 5.7085371017456055,
3109
+ "learning_rate": 2.1904761904761906e-05,
3110
+ "loss": 0.3633,
3111
+ "step": 3930
3112
+ },
3113
+ {
3114
+ "epoch": 6.25,
3115
+ "grad_norm": 4.3009772300720215,
3116
+ "learning_rate": 2.145124716553288e-05,
3117
+ "loss": 0.3932,
3118
+ "step": 3940
3119
+ },
3120
+ {
3121
+ "epoch": 6.27,
3122
+ "grad_norm": 0.3253590166568756,
3123
+ "learning_rate": 2.0997732426303857e-05,
3124
+ "loss": 0.3084,
3125
+ "step": 3950
3126
+ },
3127
+ {
3128
+ "epoch": 6.29,
3129
+ "grad_norm": 4.880115509033203,
3130
+ "learning_rate": 2.0544217687074833e-05,
3131
+ "loss": 0.3723,
3132
+ "step": 3960
3133
+ },
3134
+ {
3135
+ "epoch": 6.3,
3136
+ "grad_norm": 5.515774250030518,
3137
+ "learning_rate": 2.0090702947845805e-05,
3138
+ "loss": 0.4312,
3139
+ "step": 3970
3140
+ },
3141
+ {
3142
+ "epoch": 6.32,
3143
+ "grad_norm": 4.72555685043335,
3144
+ "learning_rate": 1.963718820861678e-05,
3145
+ "loss": 0.2305,
3146
+ "step": 3980
3147
+ },
3148
+ {
3149
+ "epoch": 6.33,
3150
+ "grad_norm": 0.13027626276016235,
3151
+ "learning_rate": 1.9183673469387756e-05,
3152
+ "loss": 0.3037,
3153
+ "step": 3990
3154
+ },
3155
+ {
3156
+ "epoch": 6.35,
3157
+ "grad_norm": 4.166107654571533,
3158
+ "learning_rate": 1.8730158730158732e-05,
3159
+ "loss": 0.3693,
3160
+ "step": 4000
3161
+ },
3162
+ {
3163
+ "epoch": 6.35,
3164
+ "eval_accuracy": 0.8412698412698413,
3165
+ "eval_loss": 0.5411426424980164,
3166
+ "eval_runtime": 91.5694,
3167
+ "eval_samples_per_second": 27.52,
3168
+ "eval_steps_per_second": 3.44,
3169
+ "step": 4000
3170
+ },
3171
+ {
3172
+ "epoch": 6.37,
3173
+ "grad_norm": 3.1201322078704834,
3174
+ "learning_rate": 1.8276643990929708e-05,
3175
+ "loss": 0.3472,
3176
+ "step": 4010
3177
+ },
3178
+ {
3179
+ "epoch": 6.38,
3180
+ "grad_norm": 5.773028373718262,
3181
+ "learning_rate": 1.7823129251700683e-05,
3182
+ "loss": 0.376,
3183
+ "step": 4020
3184
+ },
3185
+ {
3186
+ "epoch": 6.4,
3187
+ "grad_norm": 4.617589473724365,
3188
+ "learning_rate": 1.736961451247166e-05,
3189
+ "loss": 0.4044,
3190
+ "step": 4030
3191
+ },
3192
+ {
3193
+ "epoch": 6.41,
3194
+ "grad_norm": 2.2319741249084473,
3195
+ "learning_rate": 1.691609977324263e-05,
3196
+ "loss": 0.2603,
3197
+ "step": 4040
3198
+ },
3199
+ {
3200
+ "epoch": 6.43,
3201
+ "grad_norm": 4.363328456878662,
3202
+ "learning_rate": 1.6462585034013607e-05,
3203
+ "loss": 0.3736,
3204
+ "step": 4050
3205
+ },
3206
+ {
3207
+ "epoch": 6.44,
3208
+ "grad_norm": 1.4275892972946167,
3209
+ "learning_rate": 1.6009070294784583e-05,
3210
+ "loss": 0.2221,
3211
+ "step": 4060
3212
+ },
3213
+ {
3214
+ "epoch": 6.46,
3215
+ "grad_norm": 1.1315315961837769,
3216
+ "learning_rate": 1.5555555555555555e-05,
3217
+ "loss": 0.2594,
3218
+ "step": 4070
3219
+ },
3220
+ {
3221
+ "epoch": 6.48,
3222
+ "grad_norm": 2.161555051803589,
3223
+ "learning_rate": 1.5102040816326532e-05,
3224
+ "loss": 0.3224,
3225
+ "step": 4080
3226
+ },
3227
+ {
3228
+ "epoch": 6.49,
3229
+ "grad_norm": 5.551158428192139,
3230
+ "learning_rate": 1.4648526077097508e-05,
3231
+ "loss": 0.2861,
3232
+ "step": 4090
3233
+ },
3234
+ {
3235
+ "epoch": 6.51,
3236
+ "grad_norm": 4.5997419357299805,
3237
+ "learning_rate": 1.419501133786848e-05,
3238
+ "loss": 0.2933,
3239
+ "step": 4100
3240
+ },
3241
+ {
3242
+ "epoch": 6.51,
3243
+ "eval_accuracy": 0.8484126984126984,
3244
+ "eval_loss": 0.5324992537498474,
3245
+ "eval_runtime": 92.004,
3246
+ "eval_samples_per_second": 27.39,
3247
+ "eval_steps_per_second": 3.424,
3248
+ "step": 4100
3249
+ },
3250
+ {
3251
+ "epoch": 6.52,
3252
+ "grad_norm": 4.983388900756836,
3253
+ "learning_rate": 1.3741496598639456e-05,
3254
+ "loss": 0.3076,
3255
+ "step": 4110
3256
+ },
3257
+ {
3258
+ "epoch": 6.54,
3259
+ "grad_norm": 7.082801818847656,
3260
+ "learning_rate": 1.3287981859410432e-05,
3261
+ "loss": 0.4137,
3262
+ "step": 4120
3263
+ },
3264
+ {
3265
+ "epoch": 6.56,
3266
+ "grad_norm": 5.25059700012207,
3267
+ "learning_rate": 1.2834467120181407e-05,
3268
+ "loss": 0.3036,
3269
+ "step": 4130
3270
+ },
3271
+ {
3272
+ "epoch": 6.57,
3273
+ "grad_norm": 4.526573181152344,
3274
+ "learning_rate": 1.2380952380952381e-05,
3275
+ "loss": 0.3122,
3276
+ "step": 4140
3277
+ },
3278
+ {
3279
+ "epoch": 6.59,
3280
+ "grad_norm": 3.637713670730591,
3281
+ "learning_rate": 1.1927437641723357e-05,
3282
+ "loss": 0.4162,
3283
+ "step": 4150
3284
+ },
3285
+ {
3286
+ "epoch": 6.6,
3287
+ "grad_norm": 7.545217514038086,
3288
+ "learning_rate": 1.147392290249433e-05,
3289
+ "loss": 0.5283,
3290
+ "step": 4160
3291
+ },
3292
+ {
3293
+ "epoch": 6.62,
3294
+ "grad_norm": 3.526754856109619,
3295
+ "learning_rate": 1.1020408163265306e-05,
3296
+ "loss": 0.4345,
3297
+ "step": 4170
3298
+ },
3299
+ {
3300
+ "epoch": 6.63,
3301
+ "grad_norm": 5.48629093170166,
3302
+ "learning_rate": 1.056689342403628e-05,
3303
+ "loss": 0.3755,
3304
+ "step": 4180
3305
+ },
3306
+ {
3307
+ "epoch": 6.65,
3308
+ "grad_norm": 3.289555788040161,
3309
+ "learning_rate": 1.0113378684807256e-05,
3310
+ "loss": 0.3603,
3311
+ "step": 4190
3312
+ },
3313
+ {
3314
+ "epoch": 6.67,
3315
+ "grad_norm": 3.701138496398926,
3316
+ "learning_rate": 9.659863945578232e-06,
3317
+ "loss": 0.2603,
3318
+ "step": 4200
3319
+ },
3320
+ {
3321
+ "epoch": 6.67,
3322
+ "eval_accuracy": 0.8476190476190476,
3323
+ "eval_loss": 0.5360472202301025,
3324
+ "eval_runtime": 94.6291,
3325
+ "eval_samples_per_second": 26.63,
3326
+ "eval_steps_per_second": 3.329,
3327
+ "step": 4200
3328
+ },
3329
+ {
3330
+ "epoch": 6.68,
3331
+ "grad_norm": 4.533103942871094,
3332
+ "learning_rate": 9.206349206349207e-06,
3333
+ "loss": 0.3483,
3334
+ "step": 4210
3335
+ },
3336
+ {
3337
+ "epoch": 6.7,
3338
+ "grad_norm": 6.598170280456543,
3339
+ "learning_rate": 8.752834467120181e-06,
3340
+ "loss": 0.4429,
3341
+ "step": 4220
3342
+ },
3343
+ {
3344
+ "epoch": 6.71,
3345
+ "grad_norm": 3.7935194969177246,
3346
+ "learning_rate": 8.299319727891157e-06,
3347
+ "loss": 0.2892,
3348
+ "step": 4230
3349
+ },
3350
+ {
3351
+ "epoch": 6.73,
3352
+ "grad_norm": 4.123006343841553,
3353
+ "learning_rate": 7.845804988662133e-06,
3354
+ "loss": 0.3482,
3355
+ "step": 4240
3356
+ },
3357
+ {
3358
+ "epoch": 6.75,
3359
+ "grad_norm": 0.6726520657539368,
3360
+ "learning_rate": 7.392290249433107e-06,
3361
+ "loss": 0.3813,
3362
+ "step": 4250
3363
+ },
3364
+ {
3365
+ "epoch": 6.76,
3366
+ "grad_norm": 7.85457181930542,
3367
+ "learning_rate": 6.938775510204082e-06,
3368
+ "loss": 0.3737,
3369
+ "step": 4260
3370
+ },
3371
+ {
3372
+ "epoch": 6.78,
3373
+ "grad_norm": 1.485037922859192,
3374
+ "learning_rate": 6.485260770975057e-06,
3375
+ "loss": 0.2864,
3376
+ "step": 4270
3377
+ },
3378
+ {
3379
+ "epoch": 6.79,
3380
+ "grad_norm": 2.535890817642212,
3381
+ "learning_rate": 6.031746031746032e-06,
3382
+ "loss": 0.2424,
3383
+ "step": 4280
3384
+ },
3385
+ {
3386
+ "epoch": 6.81,
3387
+ "grad_norm": 1.3501815795898438,
3388
+ "learning_rate": 5.578231292517007e-06,
3389
+ "loss": 0.3013,
3390
+ "step": 4290
3391
+ },
3392
+ {
3393
+ "epoch": 6.83,
3394
+ "grad_norm": 2.961240291595459,
3395
+ "learning_rate": 5.124716553287982e-06,
3396
+ "loss": 0.3364,
3397
+ "step": 4300
3398
+ },
3399
+ {
3400
+ "epoch": 6.83,
3401
+ "eval_accuracy": 0.8496031746031746,
3402
+ "eval_loss": 0.5302808284759521,
3403
+ "eval_runtime": 99.1164,
3404
+ "eval_samples_per_second": 25.425,
3405
+ "eval_steps_per_second": 3.178,
3406
+ "step": 4300
3407
+ },
3408
+ {
3409
+ "epoch": 6.84,
3410
+ "grad_norm": 3.593552589416504,
3411
+ "learning_rate": 4.671201814058957e-06,
3412
+ "loss": 0.3273,
3413
+ "step": 4310
3414
+ },
3415
+ {
3416
+ "epoch": 6.86,
3417
+ "grad_norm": 3.5346760749816895,
3418
+ "learning_rate": 4.217687074829932e-06,
3419
+ "loss": 0.3345,
3420
+ "step": 4320
3421
+ },
3422
+ {
3423
+ "epoch": 6.87,
3424
+ "grad_norm": 3.0319371223449707,
3425
+ "learning_rate": 3.764172335600907e-06,
3426
+ "loss": 0.4083,
3427
+ "step": 4330
3428
+ },
3429
+ {
3430
+ "epoch": 6.89,
3431
+ "grad_norm": 2.639801263809204,
3432
+ "learning_rate": 3.310657596371882e-06,
3433
+ "loss": 0.2986,
3434
+ "step": 4340
3435
+ },
3436
+ {
3437
+ "epoch": 6.9,
3438
+ "grad_norm": 1.1220003366470337,
3439
+ "learning_rate": 2.8571428571428573e-06,
3440
+ "loss": 0.3088,
3441
+ "step": 4350
3442
+ },
3443
+ {
3444
+ "epoch": 6.92,
3445
+ "grad_norm": 5.006831645965576,
3446
+ "learning_rate": 2.4036281179138325e-06,
3447
+ "loss": 0.4156,
3448
+ "step": 4360
3449
+ },
3450
+ {
3451
+ "epoch": 6.94,
3452
+ "grad_norm": 4.62202262878418,
3453
+ "learning_rate": 1.9501133786848073e-06,
3454
+ "loss": 0.3888,
3455
+ "step": 4370
3456
+ },
3457
+ {
3458
+ "epoch": 6.95,
3459
+ "grad_norm": 6.1172590255737305,
3460
+ "learning_rate": 1.4965986394557823e-06,
3461
+ "loss": 0.3449,
3462
+ "step": 4380
3463
+ },
3464
+ {
3465
+ "epoch": 6.97,
3466
+ "grad_norm": 6.193305969238281,
3467
+ "learning_rate": 1.0430839002267576e-06,
3468
+ "loss": 0.3253,
3469
+ "step": 4390
3470
+ },
3471
+ {
3472
+ "epoch": 6.98,
3473
+ "grad_norm": 1.3442612886428833,
3474
+ "learning_rate": 5.895691609977325e-07,
3475
+ "loss": 0.3639,
3476
+ "step": 4400
3477
+ },
3478
+ {
3479
+ "epoch": 6.98,
3480
+ "eval_accuracy": 0.8492063492063492,
3481
+ "eval_loss": 0.5315643548965454,
3482
+ "eval_runtime": 94.3134,
3483
+ "eval_samples_per_second": 26.719,
3484
+ "eval_steps_per_second": 3.34,
3485
+ "step": 4400
3486
+ },
3487
+ {
3488
+ "epoch": 7.0,
3489
+ "grad_norm": 4.398177146911621,
3490
+ "learning_rate": 1.3605442176870747e-07,
3491
+ "loss": 0.2582,
3492
+ "step": 4410
3493
+ },
3494
+ {
3495
+ "epoch": 7.0,
3496
+ "step": 4410,
3497
+ "total_flos": 5.468471871363809e+18,
3498
+ "train_loss": 0.7062997149772384,
3499
+ "train_runtime": 5016.6762,
3500
+ "train_samples_per_second": 14.065,
3501
+ "train_steps_per_second": 0.879
3502
+ }
3503
+ ],
3504
+ "logging_steps": 10,
3505
+ "max_steps": 4410,
3506
+ "num_input_tokens_seen": 0,
3507
+ "num_train_epochs": 7,
3508
+ "save_steps": 100,
3509
+ "total_flos": 5.468471871363809e+18,
3510
+ "train_batch_size": 16,
3511
+ "trial_name": null,
3512
+ "trial_params": null
3513
+ }