rshrott committed on
Commit
d2e785d
1 Parent(s): 2382207

🍻 cheers

README.md CHANGED
@@ -2,6 +2,7 @@
2
  license: apache-2.0
3
  base_model: google/vit-base-patch16-224-in21k
4
  tags:
 
5
  - generated_from_trainer
6
  model-index:
7
  - name: ryan03282024
@@ -13,12 +14,12 @@ should probably proofread and complete it, then remove this comment. -->
13
 
14
  # ryan03282024
15
 
16
- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the None dataset.
17
  It achieves the following results on the evaluation set:
18
- - Loss: 0.2949
19
- - Ordinal Mae: 0.3194
20
- - Ordinal Accuracy: 0.7145
21
- - Na Accuracy: 0.7876
22
 
23
  ## Model description
24
 
 
2
  license: apache-2.0
3
  base_model: google/vit-base-patch16-224-in21k
4
  tags:
5
+ - image-classification
6
  - generated_from_trainer
7
  model-index:
8
  - name: ryan03282024
 
14
 
15
  # ryan03282024
16
 
17
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the properties dataset.
18
  It achieves the following results on the evaluation set:
19
+ - Loss: 0.2238
20
+ - Ordinal Mae: 0.4441
21
+ - Ordinal Accuracy: 0.6446
22
+ - Na Accuracy: 0.7992
23
 
24
  ## Model description
25
 
all_results.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "epoch": 4.0,
3
+ "eval_loss": 0.22382062673568726,
4
+ "eval_na_accuracy": 0.799227774143219,
5
+ "eval_ordinal_accuracy": 0.6446113586425781,
6
+ "eval_ordinal_mae": 0.4441048502922058,
7
+ "eval_runtime": 157.4798,
8
+ "eval_samples_per_second": 25.267,
9
+ "eval_steps_per_second": 3.162,
10
+ "train_loss": 0.11642740544208273,
11
+ "train_runtime": 27830.4973,
12
+ "train_samples_per_second": 5.264,
13
+ "train_steps_per_second": 0.329
14
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "epoch": 4.0,
3
+ "eval_loss": 0.22382062673568726,
4
+ "eval_na_accuracy": 0.799227774143219,
5
+ "eval_ordinal_accuracy": 0.6446113586425781,
6
+ "eval_ordinal_mae": 0.4441048502922058,
7
+ "eval_runtime": 157.4798,
8
+ "eval_samples_per_second": 25.267,
9
+ "eval_steps_per_second": 3.162
10
+ }
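The throughput fields in eval_results.json can be cross-checked against each other. The sketch below is a rough sanity check, not part of the commit: it assumes `eval_samples_per_second` and `eval_steps_per_second` are averages over `eval_runtime`, so the derived sample count and batch size are approximations based on rounded values. The helper name `estimate_eval_shape` is hypothetical.

```python
import json

# Values copied from the eval_results.json diff in this commit.
eval_results = json.loads("""
{
  "epoch": 4.0,
  "eval_loss": 0.22382062673568726,
  "eval_na_accuracy": 0.799227774143219,
  "eval_ordinal_accuracy": 0.6446113586425781,
  "eval_ordinal_mae": 0.4441048502922058,
  "eval_runtime": 157.4798,
  "eval_samples_per_second": 25.267,
  "eval_steps_per_second": 3.162
}
""")


def estimate_eval_shape(results):
    """Estimate (num_samples, batch_size) from the throughput fields.

    Hypothetical helper: both numbers are approximations because the
    reported rates are rounded to three decimals.
    """
    # samples ~= seconds * samples per second
    samples = results["eval_runtime"] * results["eval_samples_per_second"]
    # batch size ~= samples per second / steps per second
    batch = results["eval_samples_per_second"] / results["eval_steps_per_second"]
    return round(samples), round(batch)


approx_samples, approx_batch = estimate_eval_shape(eval_results)
print(f"approx. eval samples: {approx_samples}, approx. eval batch size: {approx_batch}")
```

If the two derived numbers look implausible for the dataset in use, that usually points at a mismatched results file rather than at the model.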
runs/Mar28_11-58-46_ryanserver/events.out.tfevents.1711669541.ryanserver.10171.1 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:beb3106d42fdca38428b89da366f29ddc0f087107017a62b41ab93b211ca43b3
3
+ size 529
train_results.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "epoch": 4.0,
3
+ "train_loss": 0.11642740544208273,
4
+ "train_runtime": 27830.4973,
5
+ "train_samples_per_second": 5.264,
6
+ "train_steps_per_second": 0.329
7
+ }
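The training throughput figures above allow a similar back-of-the-envelope check. This is a sketch under stated assumptions: `GLOBAL_STEP` is taken from the `global_step` field of trainer_state.json in this same commit, and the "effective batch size" is inferred from the ratio of the two rates, so it may fold in gradient accumulation or multiple devices. The variable names are illustrative only.

```python
# Values copied from the train_results.json diff in this commit.
train_results = {
    "epoch": 4.0,
    "train_loss": 0.11642740544208273,
    "train_runtime": 27830.4973,
    "train_samples_per_second": 5.264,
    "train_steps_per_second": 0.329,
}

# Assumption: taken from "global_step" in trainer_state.json (same commit).
GLOBAL_STEP = 9160

# Samples consumed per optimizer step, inferred from the two rates.
effective_batch = (
    train_results["train_samples_per_second"]
    / train_results["train_steps_per_second"]
)

# Optimizer steps per epoch, from the total step count and epoch count.
steps_per_epoch = GLOBAL_STEP / train_results["epoch"]

# Approximate training samples seen per epoch (rates are rounded).
samples_per_epoch = (
    train_results["train_runtime"]
    * train_results["train_samples_per_second"]
    / train_results["epoch"]
)

print(f"effective batch size ~ {effective_batch:.2f}")
print(f"steps per epoch      = {steps_per_epoch:.0f}")
print(f"samples per epoch    ~ {samples_per_epoch:.0f}")
```

The steps-per-epoch value multiplied by the inferred batch size should land slightly above the samples-per-epoch estimate, since the last batch of an epoch is typically partial.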
trainer_state.json ADDED
@@ -0,0 +1,3593 @@
1
+ {
2
+ "best_metric": 0.22382062673568726,
3
+ "best_model_checkpoint": "./ryan03282024/checkpoint-1600",
4
+ "epoch": 4.0,
5
+ "eval_steps": 100,
6
+ "global_step": 9160,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.01,
13
+ "grad_norm": 0.5509258508682251,
14
+ "learning_rate": 9.972707423580787e-05,
15
+ "loss": 0.4095,
16
+ "step": 25
17
+ },
18
+ {
19
+ "epoch": 0.02,
20
+ "grad_norm": 0.37446415424346924,
21
+ "learning_rate": 9.945414847161573e-05,
22
+ "loss": 0.3743,
23
+ "step": 50
24
+ },
25
+ {
26
+ "epoch": 0.03,
27
+ "grad_norm": 0.36988818645477295,
28
+ "learning_rate": 9.918122270742359e-05,
29
+ "loss": 0.3766,
30
+ "step": 75
31
+ },
32
+ {
33
+ "epoch": 0.04,
34
+ "grad_norm": 2.9999992847442627,
35
+ "learning_rate": 9.890829694323145e-05,
36
+ "loss": 0.3421,
37
+ "step": 100
38
+ },
39
+ {
40
+ "epoch": 0.04,
41
+ "eval_loss": 0.33314022421836853,
42
+ "eval_na_accuracy": 0.6911196708679199,
43
+ "eval_ordinal_accuracy": 0.3816815912723541,
44
+ "eval_ordinal_mae": 0.8749264478683472,
45
+ "eval_runtime": 311.6641,
46
+ "eval_samples_per_second": 12.767,
47
+ "eval_steps_per_second": 1.598,
48
+ "step": 100
49
+ },
50
+ {
51
+ "epoch": 0.05,
52
+ "grad_norm": 0.6784111261367798,
53
+ "learning_rate": 9.863537117903931e-05,
54
+ "loss": 0.3422,
55
+ "step": 125
56
+ },
57
+ {
58
+ "epoch": 0.07,
59
+ "grad_norm": 1.7690809965133667,
60
+ "learning_rate": 9.836244541484717e-05,
61
+ "loss": 0.3059,
62
+ "step": 150
63
+ },
64
+ {
65
+ "epoch": 0.08,
66
+ "grad_norm": 0.5732172131538391,
67
+ "learning_rate": 9.808951965065503e-05,
68
+ "loss": 0.3654,
69
+ "step": 175
70
+ },
71
+ {
72
+ "epoch": 0.09,
73
+ "grad_norm": 0.3226703405380249,
74
+ "learning_rate": 9.781659388646288e-05,
75
+ "loss": 0.2813,
76
+ "step": 200
77
+ },
78
+ {
79
+ "epoch": 0.09,
80
+ "eval_loss": 0.29995909333229065,
81
+ "eval_na_accuracy": 0.7953668236732483,
82
+ "eval_ordinal_accuracy": 0.5117018222808838,
83
+ "eval_ordinal_mae": 0.7492409348487854,
84
+ "eval_runtime": 157.8026,
85
+ "eval_samples_per_second": 25.215,
86
+ "eval_steps_per_second": 3.156,
87
+ "step": 200
88
+ },
89
+ {
90
+ "epoch": 0.1,
91
+ "grad_norm": 0.44079920649528503,
92
+ "learning_rate": 9.754366812227075e-05,
93
+ "loss": 0.3359,
94
+ "step": 225
95
+ },
96
+ {
97
+ "epoch": 0.11,
98
+ "grad_norm": 1.5866132974624634,
99
+ "learning_rate": 9.727074235807861e-05,
100
+ "loss": 0.3131,
101
+ "step": 250
102
+ },
103
+ {
104
+ "epoch": 0.12,
105
+ "grad_norm": 1.5666215419769287,
106
+ "learning_rate": 9.699781659388647e-05,
107
+ "loss": 0.2723,
108
+ "step": 275
109
+ },
110
+ {
111
+ "epoch": 0.13,
112
+ "grad_norm": 0.2892788350582123,
113
+ "learning_rate": 9.672489082969433e-05,
114
+ "loss": 0.2619,
115
+ "step": 300
116
+ },
117
+ {
118
+ "epoch": 0.13,
119
+ "eval_loss": 0.3019290566444397,
120
+ "eval_na_accuracy": 0.7046331763267517,
121
+ "eval_ordinal_accuracy": 0.5273042321205139,
122
+ "eval_ordinal_mae": 0.6840940117835999,
123
+ "eval_runtime": 158.8663,
124
+ "eval_samples_per_second": 25.046,
125
+ "eval_steps_per_second": 3.135,
126
+ "step": 300
127
+ },
128
+ {
129
+ "epoch": 0.14,
130
+ "grad_norm": 0.24918608367443085,
131
+ "learning_rate": 9.645196506550219e-05,
132
+ "loss": 0.303,
133
+ "step": 325
134
+ },
135
+ {
136
+ "epoch": 0.15,
137
+ "grad_norm": 1.626022219657898,
138
+ "learning_rate": 9.617903930131005e-05,
139
+ "loss": 0.2867,
140
+ "step": 350
141
+ },
142
+ {
143
+ "epoch": 0.16,
144
+ "grad_norm": 2.1694397926330566,
145
+ "learning_rate": 9.59061135371179e-05,
146
+ "loss": 0.249,
147
+ "step": 375
148
+ },
149
+ {
150
+ "epoch": 0.17,
151
+ "grad_norm": 0.3259480893611908,
152
+ "learning_rate": 9.563318777292577e-05,
153
+ "loss": 0.2863,
154
+ "step": 400
155
+ },
156
+ {
157
+ "epoch": 0.17,
158
+ "eval_loss": 0.2960417866706848,
159
+ "eval_na_accuracy": 0.7335907220840454,
160
+ "eval_ordinal_accuracy": 0.5096792578697205,
161
+ "eval_ordinal_mae": 0.6538078188896179,
162
+ "eval_runtime": 159.697,
163
+ "eval_samples_per_second": 24.916,
164
+ "eval_steps_per_second": 3.118,
165
+ "step": 400
166
+ },
167
+ {
168
+ "epoch": 0.19,
169
+ "grad_norm": 1.1021641492843628,
170
+ "learning_rate": 9.536026200873362e-05,
171
+ "loss": 0.2729,
172
+ "step": 425
173
+ },
174
+ {
175
+ "epoch": 0.2,
176
+ "grad_norm": 1.3987232446670532,
177
+ "learning_rate": 9.50873362445415e-05,
178
+ "loss": 0.2544,
179
+ "step": 450
180
+ },
181
+ {
182
+ "epoch": 0.21,
183
+ "grad_norm": 1.7090039253234863,
184
+ "learning_rate": 9.481441048034934e-05,
185
+ "loss": 0.2776,
186
+ "step": 475
187
+ },
188
+ {
189
+ "epoch": 0.22,
190
+ "grad_norm": 1.737847924232483,
191
+ "learning_rate": 9.454148471615721e-05,
192
+ "loss": 0.2159,
193
+ "step": 500
194
+ },
195
+ {
196
+ "epoch": 0.22,
197
+ "eval_loss": 0.26019465923309326,
198
+ "eval_na_accuracy": 0.8243243098258972,
199
+ "eval_ordinal_accuracy": 0.5660213828086853,
200
+ "eval_ordinal_mae": 0.540388822555542,
201
+ "eval_runtime": 157.5356,
202
+ "eval_samples_per_second": 25.258,
203
+ "eval_steps_per_second": 3.161,
204
+ "step": 500
205
+ },
206
+ {
207
+ "epoch": 0.23,
208
+ "grad_norm": 3.2989888191223145,
209
+ "learning_rate": 9.426855895196508e-05,
210
+ "loss": 0.2554,
211
+ "step": 525
212
+ },
213
+ {
214
+ "epoch": 0.24,
215
+ "grad_norm": 0.7149298191070557,
216
+ "learning_rate": 9.399563318777294e-05,
217
+ "loss": 0.272,
218
+ "step": 550
219
+ },
220
+ {
221
+ "epoch": 0.25,
222
+ "grad_norm": 0.90256667137146,
223
+ "learning_rate": 9.37227074235808e-05,
224
+ "loss": 0.2595,
225
+ "step": 575
226
+ },
227
+ {
228
+ "epoch": 0.26,
229
+ "grad_norm": 2.2932217121124268,
230
+ "learning_rate": 9.344978165938864e-05,
231
+ "loss": 0.2235,
232
+ "step": 600
233
+ },
234
+ {
235
+ "epoch": 0.26,
236
+ "eval_loss": 0.2556995749473572,
237
+ "eval_na_accuracy": 0.7779922485351562,
238
+ "eval_ordinal_accuracy": 0.5874024629592896,
239
+ "eval_ordinal_mae": 0.5014671683311462,
240
+ "eval_runtime": 159.648,
241
+ "eval_samples_per_second": 24.924,
242
+ "eval_steps_per_second": 3.119,
243
+ "step": 600
244
+ },
245
+ {
246
+ "epoch": 0.27,
247
+ "grad_norm": 0.8714308142662048,
248
+ "learning_rate": 9.317685589519652e-05,
249
+ "loss": 0.2871,
250
+ "step": 625
251
+ },
252
+ {
253
+ "epoch": 0.28,
254
+ "grad_norm": 0.699299156665802,
255
+ "learning_rate": 9.290393013100436e-05,
256
+ "loss": 0.2489,
257
+ "step": 650
258
+ },
259
+ {
260
+ "epoch": 0.29,
261
+ "grad_norm": 3.3761134147644043,
262
+ "learning_rate": 9.263100436681224e-05,
263
+ "loss": 0.2696,
264
+ "step": 675
265
+ },
266
+ {
267
+ "epoch": 0.31,
268
+ "grad_norm": 0.5747934579849243,
269
+ "learning_rate": 9.235807860262009e-05,
270
+ "loss": 0.285,
271
+ "step": 700
272
+ },
273
+ {
274
+ "epoch": 0.31,
275
+ "eval_loss": 0.2563941180706024,
276
+ "eval_na_accuracy": 0.6853281855583191,
277
+ "eval_ordinal_accuracy": 0.6180294752120972,
278
+ "eval_ordinal_mae": 0.49999386072158813,
279
+ "eval_runtime": 160.5887,
280
+ "eval_samples_per_second": 24.778,
281
+ "eval_steps_per_second": 3.101,
282
+ "step": 700
283
+ },
284
+ {
285
+ "epoch": 0.32,
286
+ "grad_norm": 0.6936819553375244,
287
+ "learning_rate": 9.208515283842796e-05,
288
+ "loss": 0.2379,
289
+ "step": 725
290
+ },
291
+ {
292
+ "epoch": 0.33,
293
+ "grad_norm": 1.3918662071228027,
294
+ "learning_rate": 9.18122270742358e-05,
295
+ "loss": 0.2509,
296
+ "step": 750
297
+ },
298
+ {
299
+ "epoch": 0.34,
300
+ "grad_norm": 0.46456876397132874,
301
+ "learning_rate": 9.153930131004367e-05,
302
+ "loss": 0.2531,
303
+ "step": 775
304
+ },
305
+ {
306
+ "epoch": 0.35,
307
+ "grad_norm": 0.516076385974884,
308
+ "learning_rate": 9.126637554585154e-05,
309
+ "loss": 0.2028,
310
+ "step": 800
311
+ },
312
+ {
313
+ "epoch": 0.35,
314
+ "eval_loss": 0.2861529588699341,
315
+ "eval_na_accuracy": 0.7220077514648438,
316
+ "eval_ordinal_accuracy": 0.5067899227142334,
317
+ "eval_ordinal_mae": 0.633837103843689,
318
+ "eval_runtime": 154.4747,
319
+ "eval_samples_per_second": 25.758,
320
+ "eval_steps_per_second": 3.224,
321
+ "step": 800
322
+ },
323
+ {
324
+ "epoch": 0.36,
325
+ "grad_norm": 1.5322011709213257,
326
+ "learning_rate": 9.100436681222709e-05,
327
+ "loss": 0.2936,
328
+ "step": 825
329
+ },
330
+ {
331
+ "epoch": 0.37,
332
+ "grad_norm": 0.49020248651504517,
333
+ "learning_rate": 9.073144104803494e-05,
334
+ "loss": 0.2206,
335
+ "step": 850
336
+ },
337
+ {
338
+ "epoch": 0.38,
339
+ "grad_norm": 0.8753771781921387,
340
+ "learning_rate": 9.045851528384281e-05,
341
+ "loss": 0.2031,
342
+ "step": 875
343
+ },
344
+ {
345
+ "epoch": 0.39,
346
+ "grad_norm": 1.1926873922348022,
347
+ "learning_rate": 9.018558951965066e-05,
348
+ "loss": 0.2006,
349
+ "step": 900
350
+ },
351
+ {
352
+ "epoch": 0.39,
353
+ "eval_loss": 0.24949249625205994,
354
+ "eval_na_accuracy": 0.7586872577667236,
355
+ "eval_ordinal_accuracy": 0.6298757791519165,
356
+ "eval_ordinal_mae": 0.4830287992954254,
357
+ "eval_runtime": 156.0615,
358
+ "eval_samples_per_second": 25.496,
359
+ "eval_steps_per_second": 3.191,
360
+ "step": 900
361
+ },
362
+ {
363
+ "epoch": 0.4,
364
+ "grad_norm": 1.0290440320968628,
365
+ "learning_rate": 8.991266375545852e-05,
366
+ "loss": 0.2648,
367
+ "step": 925
368
+ },
369
+ {
370
+ "epoch": 0.41,
371
+ "grad_norm": 0.615397036075592,
372
+ "learning_rate": 8.963973799126638e-05,
373
+ "loss": 0.1866,
374
+ "step": 950
375
+ },
376
+ {
377
+ "epoch": 0.43,
378
+ "grad_norm": 0.4451560974121094,
379
+ "learning_rate": 8.936681222707424e-05,
380
+ "loss": 0.2641,
381
+ "step": 975
382
+ },
383
+ {
384
+ "epoch": 0.44,
385
+ "grad_norm": 0.9014913439750671,
386
+ "learning_rate": 8.90938864628821e-05,
387
+ "loss": 0.2663,
388
+ "step": 1000
389
+ },
390
+ {
391
+ "epoch": 0.44,
392
+ "eval_loss": 0.26604241132736206,
393
+ "eval_na_accuracy": 0.8610038757324219,
394
+ "eval_ordinal_accuracy": 0.602138102054596,
395
+ "eval_ordinal_mae": 0.4893138110637665,
396
+ "eval_runtime": 155.5971,
397
+ "eval_samples_per_second": 25.572,
398
+ "eval_steps_per_second": 3.201,
399
+ "step": 1000
400
+ },
401
+ {
402
+ "epoch": 0.45,
403
+ "grad_norm": 2.4818997383117676,
404
+ "learning_rate": 8.882096069868996e-05,
405
+ "loss": 0.2454,
406
+ "step": 1025
407
+ },
408
+ {
409
+ "epoch": 0.46,
410
+ "grad_norm": 0.7425001859664917,
411
+ "learning_rate": 8.854803493449782e-05,
412
+ "loss": 0.215,
413
+ "step": 1050
414
+ },
415
+ {
416
+ "epoch": 0.47,
417
+ "grad_norm": 2.0386316776275635,
418
+ "learning_rate": 8.827510917030568e-05,
419
+ "loss": 0.2368,
420
+ "step": 1075
421
+ },
422
+ {
423
+ "epoch": 0.48,
424
+ "grad_norm": 0.6596102714538574,
425
+ "learning_rate": 8.800218340611354e-05,
426
+ "loss": 0.2062,
427
+ "step": 1100
428
+ },
429
+ {
430
+ "epoch": 0.48,
431
+ "eval_loss": 0.24812151491641998,
432
+ "eval_na_accuracy": 0.8436293601989746,
433
+ "eval_ordinal_accuracy": 0.6266974806785583,
434
+ "eval_ordinal_mae": 0.47125470638275146,
435
+ "eval_runtime": 157.0377,
436
+ "eval_samples_per_second": 25.338,
437
+ "eval_steps_per_second": 3.171,
438
+ "step": 1100
439
+ },
440
+ {
441
+ "epoch": 0.49,
442
+ "grad_norm": 0.4987865686416626,
443
+ "learning_rate": 8.77292576419214e-05,
444
+ "loss": 0.2151,
445
+ "step": 1125
446
+ },
447
+ {
448
+ "epoch": 0.5,
449
+ "grad_norm": 1.0523704290390015,
450
+ "learning_rate": 8.745633187772926e-05,
451
+ "loss": 0.2092,
452
+ "step": 1150
453
+ },
454
+ {
455
+ "epoch": 0.51,
456
+ "grad_norm": 0.6211034059524536,
457
+ "learning_rate": 8.718340611353712e-05,
458
+ "loss": 0.2452,
459
+ "step": 1175
460
+ },
461
+ {
462
+ "epoch": 0.52,
463
+ "grad_norm": 0.49944034218788147,
464
+ "learning_rate": 8.691048034934498e-05,
465
+ "loss": 0.1749,
466
+ "step": 1200
467
+ },
468
+ {
469
+ "epoch": 0.52,
470
+ "eval_loss": 0.25862136483192444,
471
+ "eval_na_accuracy": 0.6737451553344727,
472
+ "eval_ordinal_accuracy": 0.6422998905181885,
473
+ "eval_ordinal_mae": 0.4958673417568207,
474
+ "eval_runtime": 161.9513,
475
+ "eval_samples_per_second": 24.569,
476
+ "eval_steps_per_second": 3.075,
477
+ "step": 1200
478
+ },
479
+ {
480
+ "epoch": 0.53,
481
+ "grad_norm": 0.30346667766571045,
482
+ "learning_rate": 8.663755458515284e-05,
483
+ "loss": 0.202,
484
+ "step": 1225
485
+ },
486
+ {
487
+ "epoch": 0.55,
488
+ "grad_norm": 0.8414286971092224,
489
+ "learning_rate": 8.63646288209607e-05,
490
+ "loss": 0.2019,
491
+ "step": 1250
492
+ },
493
+ {
494
+ "epoch": 0.56,
495
+ "grad_norm": 1.1115086078643799,
496
+ "learning_rate": 8.609170305676856e-05,
497
+ "loss": 0.2037,
498
+ "step": 1275
499
+ },
500
+ {
501
+ "epoch": 0.57,
502
+ "grad_norm": 0.7678231000900269,
503
+ "learning_rate": 8.581877729257642e-05,
504
+ "loss": 0.2197,
505
+ "step": 1300
506
+ },
507
+ {
508
+ "epoch": 0.57,
509
+ "eval_loss": 0.23488646745681763,
510
+ "eval_na_accuracy": 0.8030887842178345,
511
+ "eval_ordinal_accuracy": 0.5980930328369141,
512
+ "eval_ordinal_mae": 0.48411670327186584,
513
+ "eval_runtime": 161.4724,
514
+ "eval_samples_per_second": 24.642,
515
+ "eval_steps_per_second": 3.084,
516
+ "step": 1300
517
+ },
518
+ {
519
+ "epoch": 0.58,
520
+ "grad_norm": 0.33742988109588623,
521
+ "learning_rate": 8.554585152838429e-05,
522
+ "loss": 0.226,
523
+ "step": 1325
524
+ },
525
+ {
526
+ "epoch": 0.59,
527
+ "grad_norm": 1.6625324487686157,
528
+ "learning_rate": 8.527292576419215e-05,
529
+ "loss": 0.2612,
530
+ "step": 1350
531
+ },
532
+ {
533
+ "epoch": 0.6,
534
+ "grad_norm": 1.5984447002410889,
535
+ "learning_rate": 8.5e-05,
536
+ "loss": 0.1812,
537
+ "step": 1375
538
+ },
539
+ {
540
+ "epoch": 0.61,
541
+ "grad_norm": 1.39547860622406,
542
+ "learning_rate": 8.472707423580787e-05,
543
+ "loss": 0.2073,
544
+ "step": 1400
545
+ },
546
+ {
547
+ "epoch": 0.61,
548
+ "eval_loss": 0.25865110754966736,
549
+ "eval_na_accuracy": 0.6949806809425354,
550
+ "eval_ordinal_accuracy": 0.6012713313102722,
551
+ "eval_ordinal_mae": 0.4877532422542572,
552
+ "eval_runtime": 161.5877,
553
+ "eval_samples_per_second": 24.624,
554
+ "eval_steps_per_second": 3.082,
555
+ "step": 1400
556
+ },
557
+ {
558
+ "epoch": 0.62,
559
+ "grad_norm": 0.6241037845611572,
560
+ "learning_rate": 8.445414847161573e-05,
561
+ "loss": 0.2783,
562
+ "step": 1425
563
+ },
564
+ {
565
+ "epoch": 0.63,
566
+ "grad_norm": 1.9269381761550903,
567
+ "learning_rate": 8.418122270742357e-05,
568
+ "loss": 0.1813,
569
+ "step": 1450
570
+ },
571
+ {
572
+ "epoch": 0.64,
573
+ "grad_norm": 0.5204592943191528,
574
+ "learning_rate": 8.390829694323145e-05,
575
+ "loss": 0.1753,
576
+ "step": 1475
577
+ },
578
+ {
579
+ "epoch": 0.66,
580
+ "grad_norm": 1.156730055809021,
581
+ "learning_rate": 8.36353711790393e-05,
582
+ "loss": 0.1915,
583
+ "step": 1500
584
+ },
585
+ {
586
+ "epoch": 0.66,
587
+ "eval_loss": 0.23929761350154877,
588
+ "eval_na_accuracy": 0.7683397531509399,
589
+ "eval_ordinal_accuracy": 0.6321872472763062,
590
+ "eval_ordinal_mae": 0.4770694673061371,
591
+ "eval_runtime": 160.096,
592
+ "eval_samples_per_second": 24.854,
593
+ "eval_steps_per_second": 3.111,
594
+ "step": 1500
595
+ },
596
+ {
597
+ "epoch": 0.67,
598
+ "grad_norm": 0.31671473383903503,
599
+ "learning_rate": 8.336244541484717e-05,
600
+ "loss": 0.2555,
601
+ "step": 1525
602
+ },
603
+ {
604
+ "epoch": 0.68,
605
+ "grad_norm": 1.452971339225769,
606
+ "learning_rate": 8.308951965065503e-05,
607
+ "loss": 0.2321,
608
+ "step": 1550
609
+ },
610
+ {
611
+ "epoch": 0.69,
612
+ "grad_norm": 0.7802607417106628,
613
+ "learning_rate": 8.281659388646289e-05,
614
+ "loss": 0.2013,
615
+ "step": 1575
616
+ },
617
+ {
618
+ "epoch": 0.7,
619
+ "grad_norm": 1.238275408744812,
620
+ "learning_rate": 8.254366812227075e-05,
621
+ "loss": 0.2374,
622
+ "step": 1600
623
+ },
624
+ {
625
+ "epoch": 0.7,
626
+ "eval_loss": 0.22382062673568726,
627
+ "eval_na_accuracy": 0.799227774143219,
628
+ "eval_ordinal_accuracy": 0.6446113586425781,
629
+ "eval_ordinal_mae": 0.4441048502922058,
630
+ "eval_runtime": 161.7443,
631
+ "eval_samples_per_second": 24.601,
632
+ "eval_steps_per_second": 3.079,
633
+ "step": 1600
634
+ },
635
+ {
636
+ "epoch": 0.71,
637
+ "grad_norm": 0.8560432195663452,
638
+ "learning_rate": 8.227074235807861e-05,
639
+ "loss": 0.2055,
640
+ "step": 1625
641
+ },
642
+ {
643
+ "epoch": 0.72,
644
+ "grad_norm": 2.4070003032684326,
645
+ "learning_rate": 8.199781659388647e-05,
646
+ "loss": 0.2357,
647
+ "step": 1650
648
+ },
649
+ {
650
+ "epoch": 0.73,
651
+ "grad_norm": 2.119150161743164,
652
+ "learning_rate": 8.172489082969432e-05,
653
+ "loss": 0.2145,
654
+ "step": 1675
655
+ },
656
+ {
657
+ "epoch": 0.74,
658
+ "grad_norm": 1.8999676704406738,
659
+ "learning_rate": 8.145196506550219e-05,
660
+ "loss": 0.2278,
661
+ "step": 1700
662
+ },
663
+ {
664
+ "epoch": 0.74,
665
+ "eval_loss": 0.24530412256717682,
666
+ "eval_na_accuracy": 0.7277992367744446,
667
+ "eval_ordinal_accuracy": 0.6538572907447815,
668
+ "eval_ordinal_mae": 0.4410313367843628,
669
+ "eval_runtime": 160.7764,
670
+ "eval_samples_per_second": 24.749,
671
+ "eval_steps_per_second": 3.097,
672
+ "step": 1700
673
+ },
674
+ {
675
+ "epoch": 0.75,
676
+ "grad_norm": 1.699065923690796,
677
+ "learning_rate": 8.117903930131004e-05,
678
+ "loss": 0.2416,
679
+ "step": 1725
680
+ },
681
+ {
682
+ "epoch": 0.76,
683
+ "grad_norm": 1.4952126741409302,
684
+ "learning_rate": 8.090611353711791e-05,
685
+ "loss": 0.188,
686
+ "step": 1750
687
+ },
688
+ {
689
+ "epoch": 0.78,
690
+ "grad_norm": 0.4823841452598572,
691
+ "learning_rate": 8.063318777292576e-05,
692
+ "loss": 0.2148,
693
+ "step": 1775
694
+ },
695
+ {
696
+ "epoch": 0.79,
697
+ "grad_norm": 0.9280353784561157,
698
+ "learning_rate": 8.036026200873363e-05,
699
+ "loss": 0.2033,
700
+ "step": 1800
701
+ },
702
+ {
703
+ "epoch": 0.79,
704
+ "eval_loss": 0.22507312893867493,
705
+ "eval_na_accuracy": 0.8185328245162964,
706
+ "eval_ordinal_accuracy": 0.6298757791519165,
707
+ "eval_ordinal_mae": 0.4584101140499115,
708
+ "eval_runtime": 160.7672,
709
+ "eval_samples_per_second": 24.75,
710
+ "eval_steps_per_second": 3.098,
711
+ "step": 1800
712
+ },
713
+ {
714
+ "epoch": 0.8,
715
+ "grad_norm": 0.9829936027526855,
716
+ "learning_rate": 8.00873362445415e-05,
717
+ "loss": 0.2015,
718
+ "step": 1825
719
+ },
720
+ {
721
+ "epoch": 0.81,
722
+ "grad_norm": 0.5692235231399536,
723
+ "learning_rate": 7.981441048034934e-05,
724
+ "loss": 0.2213,
725
+ "step": 1850
726
+ },
727
+ {
728
+ "epoch": 0.82,
729
+ "grad_norm": 0.4303966164588928,
730
+ "learning_rate": 7.954148471615722e-05,
731
+ "loss": 0.2156,
732
+ "step": 1875
733
+ },
734
+ {
735
+ "epoch": 0.83,
736
+ "grad_norm": 1.4745689630508423,
737
+ "learning_rate": 7.926855895196506e-05,
738
+ "loss": 0.1843,
739
+ "step": 1900
740
+ },
741
+ {
742
+ "epoch": 0.83,
743
+ "eval_loss": 0.2280111461877823,
744
+ "eval_na_accuracy": 0.8127413392066956,
745
+ "eval_ordinal_accuracy": 0.6512568593025208,
746
+ "eval_ordinal_mae": 0.4446066617965698,
747
+ "eval_runtime": 160.0671,
748
+ "eval_samples_per_second": 24.858,
749
+ "eval_steps_per_second": 3.111,
750
+ "step": 1900
751
+ },
752
+ {
753
+ "epoch": 0.84,
754
+ "grad_norm": 2.2468037605285645,
755
+ "learning_rate": 7.899563318777294e-05,
756
+ "loss": 0.2182,
757
+ "step": 1925
758
+ },
759
+ {
760
+ "epoch": 0.85,
761
+ "grad_norm": 0.6818110346794128,
762
+ "learning_rate": 7.872270742358078e-05,
763
+ "loss": 0.1822,
764
+ "step": 1950
765
+ },
766
+ {
767
+ "epoch": 0.86,
768
+ "grad_norm": 0.802448034286499,
769
+ "learning_rate": 7.844978165938866e-05,
770
+ "loss": 0.2289,
771
+ "step": 1975
772
+ },
773
+ {
774
+ "epoch": 0.87,
775
+ "grad_norm": 0.6907253861427307,
776
+ "learning_rate": 7.81768558951965e-05,
777
+ "loss": 0.1878,
778
+ "step": 2000
779
+ },
780
+ {
781
+ "epoch": 0.87,
782
+ "eval_loss": 0.22766314446926117,
783
+ "eval_na_accuracy": 0.8127413392066956,
784
+ "eval_ordinal_accuracy": 0.6492343544960022,
785
+ "eval_ordinal_mae": 0.4454284906387329,
786
+ "eval_runtime": 159.2821,
787
+ "eval_samples_per_second": 24.981,
788
+ "eval_steps_per_second": 3.127,
789
+ "step": 2000
790
+ },
791
+ {
792
+ "epoch": 0.88,
793
+ "grad_norm": 1.1525399684906006,
794
+ "learning_rate": 7.790393013100437e-05,
795
+ "loss": 0.2146,
796
+ "step": 2025
797
+ },
798
+ {
799
+ "epoch": 0.9,
800
+ "grad_norm": 1.3308144807815552,
801
+ "learning_rate": 7.763100436681223e-05,
802
+ "loss": 0.2367,
803
+ "step": 2050
804
+ },
805
+ {
806
+ "epoch": 0.91,
807
+ "grad_norm": 1.1532173156738281,
808
+ "learning_rate": 7.735807860262009e-05,
809
+ "loss": 0.1948,
810
+ "step": 2075
811
+ },
812
+ {
813
+ "epoch": 0.92,
814
+ "grad_norm": 0.9184058904647827,
815
+ "learning_rate": 7.708515283842796e-05,
816
+ "loss": 0.2608,
817
+ "step": 2100
818
+ },
819
+ {
820
+ "epoch": 0.92,
821
+ "eval_loss": 0.23085492849349976,
822
+ "eval_na_accuracy": 0.8494208455085754,
823
+ "eval_ordinal_accuracy": 0.619185209274292,
824
+ "eval_ordinal_mae": 0.4517284035682678,
825
+ "eval_runtime": 158.9549,
826
+ "eval_samples_per_second": 25.032,
827
+ "eval_steps_per_second": 3.133,
828
+ "step": 2100
829
+ },
830
+ {
831
+ "epoch": 0.93,
832
+ "grad_norm": 1.0736620426177979,
833
+ "learning_rate": 7.681222707423581e-05,
834
+ "loss": 0.2409,
835
+ "step": 2125
836
+ },
837
+ {
838
+ "epoch": 0.94,
839
+ "grad_norm": 0.5520908832550049,
840
+ "learning_rate": 7.653930131004368e-05,
841
+ "loss": 0.1722,
842
+ "step": 2150
843
+ },
844
+ {
845
+ "epoch": 0.95,
846
+ "grad_norm": 1.255903720855713,
847
+ "learning_rate": 7.626637554585153e-05,
848
+ "loss": 0.1996,
849
+ "step": 2175
850
+ },
851
+ {
852
+ "epoch": 0.96,
853
+ "grad_norm": 1.3203591108322144,
854
+ "learning_rate": 7.599344978165939e-05,
855
+ "loss": 0.201,
856
+ "step": 2200
857
+ },
858
+ {
859
+ "epoch": 0.96,
860
+ "eval_loss": 0.24588599801063538,
861
+ "eval_na_accuracy": 0.7277992367744446,
862
+ "eval_ordinal_accuracy": 0.6405662894248962,
863
+ "eval_ordinal_mae": 0.46535709500312805,
864
+ "eval_runtime": 163.7913,
865
+ "eval_samples_per_second": 24.293,
866
+ "eval_steps_per_second": 3.04,
867
+ "step": 2200
868
+ },
869
+ {
870
+ "epoch": 0.97,
871
+ "grad_norm": 1.3525117635726929,
872
+ "learning_rate": 7.572052401746725e-05,
873
+ "loss": 0.2068,
874
+ "step": 2225
875
+ },
876
+ {
877
+ "epoch": 0.98,
878
+ "grad_norm": 2.916431188583374,
879
+ "learning_rate": 7.544759825327511e-05,
880
+ "loss": 0.243,
881
+ "step": 2250
882
+ },
883
+ {
884
+ "epoch": 0.99,
885
+ "grad_norm": 0.8391708135604858,
886
+ "learning_rate": 7.517467248908297e-05,
887
+ "loss": 0.1842,
888
+ "step": 2275
889
+ },
890
+ {
891
+ "epoch": 1.0,
892
+ "grad_norm": 0.5292081236839294,
893
+ "learning_rate": 7.490174672489083e-05,
894
+ "loss": 0.1736,
895
+ "step": 2300
896
+ },
897
+ {
898
+ "epoch": 1.0,
899
+ "eval_loss": 0.24380208551883698,
900
+ "eval_na_accuracy": 0.7200772166252136,
901
+ "eval_ordinal_accuracy": 0.6475006937980652,
902
+ "eval_ordinal_mae": 0.44738492369651794,
903
+ "eval_runtime": 161.0768,
904
+ "eval_samples_per_second": 24.703,
905
+ "eval_steps_per_second": 3.092,
906
+ "step": 2300
907
+ },
908
+ {
909
+ "epoch": 1.02,
910
+ "grad_norm": 0.6971523761749268,
911
+ "learning_rate": 7.462882096069869e-05,
912
+ "loss": 0.1683,
913
+ "step": 2325
914
+ },
915
+ {
916
+ "epoch": 1.03,
917
+ "grad_norm": 0.8325093388557434,
918
+ "learning_rate": 7.435589519650655e-05,
919
+ "loss": 0.1177,
920
+ "step": 2350
921
+ },
922
+ {
923
+ "epoch": 1.04,
924
+ "grad_norm": 0.8595998883247375,
925
+ "learning_rate": 7.408296943231441e-05,
926
+ "loss": 0.1626,
927
+ "step": 2375
928
+ },
929
+ {
930
+ "epoch": 1.05,
931
+ "grad_norm": 0.38421839475631714,
932
+ "learning_rate": 7.381004366812227e-05,
933
+ "loss": 0.1374,
934
+ "step": 2400
935
+ },
936
+ {
937
+ "epoch": 1.05,
938
+ "eval_loss": 0.23675759136676788,
939
+ "eval_na_accuracy": 0.7799227833747864,
940
+ "eval_ordinal_accuracy": 0.6622363328933716,
941
+ "eval_ordinal_mae": 0.41446253657341003,
942
+ "eval_runtime": 160.4437,
943
+ "eval_samples_per_second": 24.8,
944
+ "eval_steps_per_second": 3.104,
945
+ "step": 2400
946
+ },
947
+ {
948
+ "epoch": 1.06,
949
+ "grad_norm": 0.7025877237319946,
950
+ "learning_rate": 7.353711790393013e-05,
951
+ "loss": 0.1484,
952
+ "step": 2425
953
+ },
954
+ {
955
+ "epoch": 1.07,
956
+ "grad_norm": 1.4461692571640015,
957
+ "learning_rate": 7.3264192139738e-05,
958
+ "loss": 0.1564,
959
+ "step": 2450
960
+ },
961
+ {
962
+ "epoch": 1.08,
963
+ "grad_norm": 1.1262603998184204,
964
+ "learning_rate": 7.299126637554585e-05,
965
+ "loss": 0.1252,
966
+ "step": 2475
967
+ },
968
+ {
969
+ "epoch": 1.09,
970
+ "grad_norm": 1.1054977178573608,
971
+ "learning_rate": 7.271834061135371e-05,
972
+ "loss": 0.1334,
973
+ "step": 2500
974
+ },
975
+ {
976
+ "epoch": 1.09,
977
+ "eval_loss": 0.2424485832452774,
978
+ "eval_na_accuracy": 0.7509652376174927,
979
+ "eval_ordinal_accuracy": 0.6732158064842224,
980
+ "eval_ordinal_mae": 0.4105488359928131,
981
+ "eval_runtime": 161.6129,
982
+ "eval_samples_per_second": 24.621,
983
+ "eval_steps_per_second": 3.081,
984
+ "step": 2500
985
+ },
986
+ {
987
+ "epoch": 1.1,
988
+ "grad_norm": 0.6302788257598877,
989
+ "learning_rate": 7.244541484716158e-05,
990
+ "loss": 0.1252,
991
+ "step": 2525
992
+ },
993
+ {
994
+ "epoch": 1.11,
995
+ "grad_norm": 3.8645241260528564,
996
+ "learning_rate": 7.217248908296944e-05,
997
+ "loss": 0.1444,
998
+ "step": 2550
999
+ },
1000
+ {
1001
+ "epoch": 1.12,
1002
+ "grad_norm": 0.7108765244483948,
1003
+ "learning_rate": 7.18995633187773e-05,
1004
+ "loss": 0.1273,
1005
+ "step": 2575
1006
+ },
1007
+ {
1008
+ "epoch": 1.14,
1009
+ "grad_norm": 2.3784756660461426,
1010
+ "learning_rate": 7.162663755458516e-05,
1011
+ "loss": 0.1319,
1012
+ "step": 2600
1013
+ },
1014
+ {
1015
+ "epoch": 1.14,
1016
+ "eval_loss": 0.23355962336063385,
1017
+ "eval_na_accuracy": 0.7741312980651855,
1018
+ "eval_ordinal_accuracy": 0.6711933016777039,
1019
+ "eval_ordinal_mae": 0.41552796959877014,
1020
+ "eval_runtime": 161.0033,
1021
+ "eval_samples_per_second": 24.714,
1022
+ "eval_steps_per_second": 3.093,
1023
+ "step": 2600
1024
+ },
1025
+ {
1026
+ "epoch": 1.15,
1027
+ "grad_norm": 0.6569415330886841,
1028
+ "learning_rate": 7.135371179039302e-05,
1029
+ "loss": 0.1418,
1030
+ "step": 2625
1031
+ },
1032
+ {
1033
+ "epoch": 1.16,
1034
+ "grad_norm": 2.9414520263671875,
1035
+ "learning_rate": 7.108078602620088e-05,
1036
+ "loss": 0.123,
1037
+ "step": 2650
1038
+ },
1039
+ {
1040
+ "epoch": 1.17,
1041
+ "grad_norm": 0.5435983538627625,
1042
+ "learning_rate": 7.080786026200874e-05,
1043
+ "loss": 0.1458,
1044
+ "step": 2675
1045
+ },
1046
+ {
1047
+ "epoch": 1.18,
1048
+ "grad_norm": 0.6094673275947571,
1049
+ "learning_rate": 7.05349344978166e-05,
1050
+ "loss": 0.1549,
1051
+ "step": 2700
1052
+ },
1053
+ {
1054
+ "epoch": 1.18,
1055
+ "eval_loss": 0.25251126289367676,
1056
+ "eval_na_accuracy": 0.7586872577667236,
1057
+ "eval_ordinal_accuracy": 0.6625252962112427,
1058
+ "eval_ordinal_mae": 0.40396779775619507,
1059
+ "eval_runtime": 162.435,
1060
+ "eval_samples_per_second": 24.496,
1061
+ "eval_steps_per_second": 3.066,
1062
+ "step": 2700
1063
+ },
1064
+ {
1065
+ "epoch": 1.19,
1066
+ "grad_norm": 0.4580962359905243,
1067
+ "learning_rate": 7.026200873362446e-05,
1068
+ "loss": 0.1382,
1069
+ "step": 2725
1070
+ },
1071
+ {
1072
+ "epoch": 1.2,
1073
+ "grad_norm": 0.7736852765083313,
1074
+ "learning_rate": 6.998908296943232e-05,
1075
+ "loss": 0.1595,
1076
+ "step": 2750
1077
+ },
1078
+ {
1079
+ "epoch": 1.21,
1080
+ "grad_norm": 1.1125404834747314,
1081
+ "learning_rate": 6.971615720524018e-05,
1082
+ "loss": 0.1071,
1083
+ "step": 2775
1084
+ },
1085
+ {
1086
+ "epoch": 1.22,
1087
+ "grad_norm": 0.529449999332428,
1088
+ "learning_rate": 6.944323144104804e-05,
1089
+ "loss": 0.116,
1090
+ "step": 2800
1091
+ },
1092
+ {
1093
+ "epoch": 1.22,
1094
+ "eval_loss": 0.25007495284080505,
1095
+ "eval_na_accuracy": 0.7664092779159546,
1096
+ "eval_ordinal_accuracy": 0.6370990872383118,
1097
+ "eval_ordinal_mae": 0.44249165058135986,
1098
+ "eval_runtime": 161.6127,
1099
+ "eval_samples_per_second": 24.621,
1100
+ "eval_steps_per_second": 3.081,
1101
+ "step": 2800
1102
+ },
1103
+ {
1104
+ "epoch": 1.23,
1105
+ "grad_norm": 3.3976492881774902,
1106
+ "learning_rate": 6.91703056768559e-05,
1107
+ "loss": 0.1238,
1108
+ "step": 2825
1109
+ },
1110
+ {
1111
+ "epoch": 1.24,
1112
+ "grad_norm": 0.9712594747543335,
1113
+ "learning_rate": 6.889737991266376e-05,
1114
+ "loss": 0.1313,
1115
+ "step": 2850
1116
+ },
1117
+ {
1118
+ "epoch": 1.26,
1119
+ "grad_norm": 0.35930949449539185,
1120
+ "learning_rate": 6.862445414847162e-05,
1121
+ "loss": 0.1228,
1122
+ "step": 2875
1123
+ },
1124
+ {
1125
+ "epoch": 1.27,
1126
+ "grad_norm": 1.873953938484192,
1127
+ "learning_rate": 6.835152838427948e-05,
1128
+ "loss": 0.1358,
1129
+ "step": 2900
1130
+ },
1131
+ {
1132
+ "epoch": 1.27,
1133
+ "eval_loss": 0.23235850036144257,
1134
+ "eval_na_accuracy": 0.8185328245162964,
1135
+ "eval_ordinal_accuracy": 0.6498122215270996,
1136
+ "eval_ordinal_mae": 0.4136166572570801,
1137
+ "eval_runtime": 162.283,
1138
+ "eval_samples_per_second": 24.519,
1139
+ "eval_steps_per_second": 3.069,
1140
+ "step": 2900
1141
+ },
1142
+ {
1143
+ "epoch": 1.28,
1144
+ "grad_norm": 1.8601669073104858,
1145
+ "learning_rate": 6.807860262008734e-05,
1146
+ "loss": 0.101,
1147
+ "step": 2925
1148
+ },
1149
+ {
1150
+ "epoch": 1.29,
1151
+ "grad_norm": 0.9282914996147156,
1152
+ "learning_rate": 6.780567685589519e-05,
1153
+ "loss": 0.1435,
1154
+ "step": 2950
1155
+ },
1156
+ {
1157
+ "epoch": 1.3,
1158
+ "grad_norm": 1.7728241682052612,
1159
+ "learning_rate": 6.753275109170306e-05,
1160
+ "loss": 0.1125,
1161
+ "step": 2975
1162
+ },
1163
+ {
1164
+ "epoch": 1.31,
1165
+ "grad_norm": 0.5749986171722412,
1166
+ "learning_rate": 6.725982532751091e-05,
1167
+ "loss": 0.1614,
1168
+ "step": 3000
1169
+ },
1170
+ {
1171
+ "epoch": 1.31,
1172
+ "eval_loss": 0.26365962624549866,
1173
+ "eval_na_accuracy": 0.7915058135986328,
1174
+ "eval_ordinal_accuracy": 0.6316093802452087,
1175
+ "eval_ordinal_mae": 0.43529626727104187,
1176
+ "eval_runtime": 161.4837,
1177
+ "eval_samples_per_second": 24.64,
1178
+ "eval_steps_per_second": 3.084,
1179
+ "step": 3000
1180
+ },
1181
+ {
1182
+ "epoch": 1.32,
1183
+ "grad_norm": 0.44815096259117126,
1184
+ "learning_rate": 6.698689956331879e-05,
1185
+ "loss": 0.1436,
1186
+ "step": 3025
1187
+ },
1188
+ {
1189
+ "epoch": 1.33,
1190
+ "grad_norm": 0.4672500491142273,
1191
+ "learning_rate": 6.671397379912665e-05,
1192
+ "loss": 0.0943,
1193
+ "step": 3050
1194
+ },
1195
+ {
1196
+ "epoch": 1.34,
1197
+ "grad_norm": 0.8902660608291626,
1198
+ "learning_rate": 6.64410480349345e-05,
1199
+ "loss": 0.1258,
1200
+ "step": 3075
1201
+ },
1202
+ {
1203
+ "epoch": 1.35,
1204
+ "grad_norm": 0.7342121005058289,
1205
+ "learning_rate": 6.616812227074237e-05,
1206
+ "loss": 0.1395,
1207
+ "step": 3100
1208
+ },
1209
+ {
1210
+ "epoch": 1.35,
1211
+ "eval_loss": 0.2445780336856842,
1212
+ "eval_na_accuracy": 0.8011583089828491,
1213
+ "eval_ordinal_accuracy": 0.672637939453125,
1214
+ "eval_ordinal_mae": 0.4019619822502136,
1215
+ "eval_runtime": 161.5727,
1216
+ "eval_samples_per_second": 24.627,
1217
+ "eval_steps_per_second": 3.082,
1218
+ "step": 3100
1219
+ },
1220
+ {
1221
+ "epoch": 1.36,
1222
+ "grad_norm": 0.7066202163696289,
1223
+ "learning_rate": 6.589519650655021e-05,
1224
+ "loss": 0.1627,
1225
+ "step": 3125
1226
+ },
1227
+ {
1228
+ "epoch": 1.38,
1229
+ "grad_norm": 0.8218971490859985,
1230
+ "learning_rate": 6.562227074235809e-05,
1231
+ "loss": 0.1116,
1232
+ "step": 3150
1233
+ },
1234
+ {
1235
+ "epoch": 1.39,
1236
+ "grad_norm": 2.74863600730896,
1237
+ "learning_rate": 6.534934497816593e-05,
1238
+ "loss": 0.1151,
1239
+ "step": 3175
1240
+ },
1241
+ {
1242
+ "epoch": 1.4,
1243
+ "grad_norm": 2.341121196746826,
1244
+ "learning_rate": 6.507641921397381e-05,
1245
+ "loss": 0.1208,
1246
+ "step": 3200
1247
+ },
1248
+ {
1249
+ "epoch": 1.4,
1250
+ "eval_loss": 0.24651078879833221,
1251
+ "eval_na_accuracy": 0.8243243098258972,
1252
+ "eval_ordinal_accuracy": 0.6763941049575806,
1253
+ "eval_ordinal_mae": 0.394586980342865,
1254
+ "eval_runtime": 160.9965,
1255
+ "eval_samples_per_second": 24.715,
1256
+ "eval_steps_per_second": 3.093,
1257
+ "step": 3200
1258
+ },
1259
+ {
1260
+ "epoch": 1.41,
1261
+ "grad_norm": 0.610958993434906,
1262
+ "learning_rate": 6.480349344978166e-05,
1263
+ "loss": 0.1145,
1264
+ "step": 3225
1265
+ },
1266
+ {
1267
+ "epoch": 1.42,
1268
+ "grad_norm": 0.43066203594207764,
1269
+ "learning_rate": 6.453056768558953e-05,
1270
+ "loss": 0.1322,
1271
+ "step": 3250
1272
+ },
1273
+ {
1274
+ "epoch": 1.43,
1275
+ "grad_norm": 0.21925854682922363,
1276
+ "learning_rate": 6.425764192139738e-05,
1277
+ "loss": 0.1602,
1278
+ "step": 3275
1279
+ },
1280
+ {
1281
+ "epoch": 1.44,
1282
+ "grad_norm": 0.34638360142707825,
1283
+ "learning_rate": 6.398471615720524e-05,
1284
+ "loss": 0.1432,
1285
+ "step": 3300
1286
+ },
1287
+ {
1288
+ "epoch": 1.44,
1289
+ "eval_loss": 0.25519701838493347,
1290
+ "eval_na_accuracy": 0.8899613618850708,
1291
+ "eval_ordinal_accuracy": 0.6576133966445923,
1292
+ "eval_ordinal_mae": 0.3918676972389221,
1293
+ "eval_runtime": 160.3951,
1294
+ "eval_samples_per_second": 24.807,
1295
+ "eval_steps_per_second": 3.105,
1296
+ "step": 3300
1297
+ },
1298
+ {
1299
+ "epoch": 1.45,
1300
+ "grad_norm": 0.5949413776397705,
1301
+ "learning_rate": 6.371179039301311e-05,
1302
+ "loss": 0.1249,
1303
+ "step": 3325
1304
+ },
1305
+ {
1306
+ "epoch": 1.46,
1307
+ "grad_norm": 0.8993425965309143,
1308
+ "learning_rate": 6.343886462882096e-05,
1309
+ "loss": 0.1139,
1310
+ "step": 3350
1311
+ },
1312
+ {
1313
+ "epoch": 1.47,
1314
+ "grad_norm": 0.7099699974060059,
1315
+ "learning_rate": 6.316593886462883e-05,
1316
+ "loss": 0.1019,
1317
+ "step": 3375
1318
+ },
1319
+ {
1320
+ "epoch": 1.48,
1321
+ "grad_norm": 2.9975674152374268,
1322
+ "learning_rate": 6.289301310043668e-05,
1323
+ "loss": 0.1358,
1324
+ "step": 3400
1325
+ },
1326
+ {
1327
+ "epoch": 1.48,
1328
+ "eval_loss": 0.2561098039150238,
1329
+ "eval_na_accuracy": 0.7895752787590027,
1330
+ "eval_ordinal_accuracy": 0.6795724034309387,
1331
+ "eval_ordinal_mae": 0.39841172099113464,
1332
+ "eval_runtime": 162.3278,
1333
+ "eval_samples_per_second": 24.512,
1334
+ "eval_steps_per_second": 3.068,
1335
+ "step": 3400
1336
+ },
1337
+ {
1338
+ "epoch": 1.5,
1339
+ "grad_norm": 0.259729266166687,
1340
+ "learning_rate": 6.262008733624455e-05,
1341
+ "loss": 0.1486,
1342
+ "step": 3425
1343
+ },
1344
+ {
1345
+ "epoch": 1.51,
1346
+ "grad_norm": 4.243904113769531,
1347
+ "learning_rate": 6.23471615720524e-05,
1348
+ "loss": 0.1652,
1349
+ "step": 3450
1350
+ },
1351
+ {
1352
+ "epoch": 1.52,
1353
+ "grad_norm": 2.9280548095703125,
1354
+ "learning_rate": 6.207423580786027e-05,
1355
+ "loss": 0.1699,
1356
+ "step": 3475
1357
+ },
1358
+ {
1359
+ "epoch": 1.53,
1360
+ "grad_norm": 0.5303541421890259,
1361
+ "learning_rate": 6.180131004366812e-05,
1362
+ "loss": 0.0877,
1363
+ "step": 3500
1364
+ },
1365
+ {
1366
+ "epoch": 1.53,
1367
+ "eval_loss": 0.23811650276184082,
1368
+ "eval_na_accuracy": 0.7876448035240173,
1369
+ "eval_ordinal_accuracy": 0.6821727752685547,
1370
+ "eval_ordinal_mae": 0.3901168704032898,
1371
+ "eval_runtime": 162.0532,
1372
+ "eval_samples_per_second": 24.554,
1373
+ "eval_steps_per_second": 3.073,
1374
+ "step": 3500
1375
+ },
1376
+ {
1377
+ "epoch": 1.54,
1378
+ "grad_norm": 0.42129185795783997,
1379
+ "learning_rate": 6.152838427947598e-05,
1380
+ "loss": 0.1149,
1381
+ "step": 3525
1382
+ },
1383
+ {
1384
+ "epoch": 1.55,
1385
+ "grad_norm": 1.27903151512146,
1386
+ "learning_rate": 6.125545851528384e-05,
1387
+ "loss": 0.1208,
1388
+ "step": 3550
1389
+ },
1390
+ {
1391
+ "epoch": 1.56,
1392
+ "grad_norm": 3.1208670139312744,
1393
+ "learning_rate": 6.09825327510917e-05,
1394
+ "loss": 0.1106,
1395
+ "step": 3575
1396
+ },
1397
+ {
1398
+ "epoch": 1.57,
1399
+ "grad_norm": 3.3159916400909424,
1400
+ "learning_rate": 6.070960698689957e-05,
1401
+ "loss": 0.1212,
1402
+ "step": 3600
1403
+ },
1404
+ {
1405
+ "epoch": 1.57,
1406
+ "eval_loss": 0.2600151598453522,
1407
+ "eval_na_accuracy": 0.7258687019348145,
1408
+ "eval_ordinal_accuracy": 0.6948858499526978,
1409
+ "eval_ordinal_mae": 0.400073766708374,
1410
+ "eval_runtime": 160.7378,
1411
+ "eval_samples_per_second": 24.755,
1412
+ "eval_steps_per_second": 3.098,
1413
+ "step": 3600
1414
+ },
1415
+ {
1416
+ "epoch": 1.58,
1417
+ "grad_norm": 3.3880019187927246,
1418
+ "learning_rate": 6.043668122270742e-05,
1419
+ "loss": 0.1593,
1420
+ "step": 3625
1421
+ },
1422
+ {
1423
+ "epoch": 1.59,
1424
+ "grad_norm": 2.641679286956787,
1425
+ "learning_rate": 6.016375545851529e-05,
1426
+ "loss": 0.1489,
1427
+ "step": 3650
1428
+ },
1429
+ {
1430
+ "epoch": 1.6,
1431
+ "grad_norm": 1.1284505128860474,
1432
+ "learning_rate": 5.9890829694323144e-05,
1433
+ "loss": 0.1097,
1434
+ "step": 3675
1435
+ },
1436
+ {
1437
+ "epoch": 1.62,
1438
+ "grad_norm": 1.6684277057647705,
1439
+ "learning_rate": 5.9617903930131005e-05,
1440
+ "loss": 0.1917,
1441
+ "step": 3700
1442
+ },
1443
+ {
1444
+ "epoch": 1.62,
1445
+ "eval_loss": 0.24585247039794922,
1446
+ "eval_na_accuracy": 0.7818532586097717,
1447
+ "eval_ordinal_accuracy": 0.6893961429595947,
1448
+ "eval_ordinal_mae": 0.3889385461807251,
1449
+ "eval_runtime": 161.3514,
1450
+ "eval_samples_per_second": 24.66,
1451
+ "eval_steps_per_second": 3.086,
1452
+ "step": 3700
1453
+ },
1454
+ {
1455
+ "epoch": 1.63,
1456
+ "grad_norm": 0.4128411114215851,
1457
+ "learning_rate": 5.934497816593887e-05,
1458
+ "loss": 0.1423,
1459
+ "step": 3725
1460
+ },
1461
+ {
1462
+ "epoch": 1.64,
1463
+ "grad_norm": 1.0505822896957397,
1464
+ "learning_rate": 5.9072052401746726e-05,
1465
+ "loss": 0.1257,
1466
+ "step": 3750
1467
+ },
1468
+ {
1469
+ "epoch": 1.65,
1470
+ "grad_norm": 0.8468612432479858,
1471
+ "learning_rate": 5.879912663755459e-05,
1472
+ "loss": 0.1296,
1473
+ "step": 3775
1474
+ },
1475
+ {
1476
+ "epoch": 1.66,
1477
+ "grad_norm": 0.31584060192108154,
1478
+ "learning_rate": 5.852620087336245e-05,
1479
+ "loss": 0.1175,
1480
+ "step": 3800
1481
+ },
1482
+ {
1483
+ "epoch": 1.66,
1484
+ "eval_loss": 0.2443784475326538,
1485
+ "eval_na_accuracy": 0.7741312980651855,
1486
+ "eval_ordinal_accuracy": 0.6818838715553284,
1487
+ "eval_ordinal_mae": 0.3937167227268219,
1488
+ "eval_runtime": 161.5524,
1489
+ "eval_samples_per_second": 24.63,
1490
+ "eval_steps_per_second": 3.083,
1491
+ "step": 3800
1492
+ },
1493
+ {
1494
+ "epoch": 1.67,
1495
+ "grad_norm": 5.336325645446777,
1496
+ "learning_rate": 5.8253275109170314e-05,
1497
+ "loss": 0.1638,
1498
+ "step": 3825
1499
+ },
1500
+ {
1501
+ "epoch": 1.68,
1502
+ "grad_norm": 1.6800111532211304,
1503
+ "learning_rate": 5.798034934497817e-05,
1504
+ "loss": 0.1314,
1505
+ "step": 3850
1506
+ },
1507
+ {
1508
+ "epoch": 1.69,
1509
+ "grad_norm": 1.491882085800171,
1510
+ "learning_rate": 5.770742358078602e-05,
1511
+ "loss": 0.1415,
1512
+ "step": 3875
1513
+ },
1514
+ {
1515
+ "epoch": 1.7,
1516
+ "grad_norm": 2.401737928390503,
1517
+ "learning_rate": 5.743449781659389e-05,
1518
+ "loss": 0.1522,
1519
+ "step": 3900
1520
+ },
1521
+ {
1522
+ "epoch": 1.7,
1523
+ "eval_loss": 0.2472807914018631,
1524
+ "eval_na_accuracy": 0.8050193190574646,
1525
+ "eval_ordinal_accuracy": 0.6607916951179504,
1526
+ "eval_ordinal_mae": 0.40097710490226746,
1527
+ "eval_runtime": 160.8219,
1528
+ "eval_samples_per_second": 24.742,
1529
+ "eval_steps_per_second": 3.097,
1530
+ "step": 3900
1531
+ },
1532
+ {
1533
+ "epoch": 1.71,
1534
+ "grad_norm": 0.6720189452171326,
1535
+ "learning_rate": 5.716157205240175e-05,
1536
+ "loss": 0.1239,
1537
+ "step": 3925
1538
+ },
1539
+ {
1540
+ "epoch": 1.72,
1541
+ "grad_norm": 2.349804639816284,
1542
+ "learning_rate": 5.688864628820961e-05,
1543
+ "loss": 0.1318,
1544
+ "step": 3950
1545
+ },
1546
+ {
1547
+ "epoch": 1.74,
1548
+ "grad_norm": 0.732524573802948,
1549
+ "learning_rate": 5.661572052401747e-05,
1550
+ "loss": 0.1423,
1551
+ "step": 3975
1552
+ },
1553
+ {
1554
+ "epoch": 1.75,
1555
+ "grad_norm": 0.4833851456642151,
1556
+ "learning_rate": 5.634279475982534e-05,
1557
+ "loss": 0.1027,
1558
+ "step": 4000
1559
+ },
1560
+ {
1561
+ "epoch": 1.75,
1562
+ "eval_loss": 0.23541530966758728,
1563
+ "eval_na_accuracy": 0.7837837934494019,
1564
+ "eval_ordinal_accuracy": 0.6477896571159363,
1565
+ "eval_ordinal_mae": 0.420841783285141,
1566
+ "eval_runtime": 155.1162,
1567
+ "eval_samples_per_second": 25.652,
1568
+ "eval_steps_per_second": 3.21,
1569
+ "step": 4000
1570
+ },
1571
+ {
1572
+ "epoch": 1.76,
1573
+ "grad_norm": 0.5297548174858093,
1574
+ "learning_rate": 5.606986899563319e-05,
1575
+ "loss": 0.1596,
1576
+ "step": 4025
1577
+ },
1578
+ {
1579
+ "epoch": 1.77,
1580
+ "grad_norm": 0.5475759506225586,
1581
+ "learning_rate": 5.5796943231441045e-05,
1582
+ "loss": 0.1272,
1583
+ "step": 4050
1584
+ },
1585
+ {
1586
+ "epoch": 1.78,
1587
+ "grad_norm": 2.1666433811187744,
1588
+ "learning_rate": 5.552401746724891e-05,
1589
+ "loss": 0.1382,
1590
+ "step": 4075
1591
+ },
1592
+ {
1593
+ "epoch": 1.79,
1594
+ "grad_norm": 3.707628011703491,
1595
+ "learning_rate": 5.5251091703056766e-05,
1596
+ "loss": 0.1343,
1597
+ "step": 4100
1598
+ },
1599
+ {
1600
+ "epoch": 1.79,
1601
+ "eval_loss": 0.228408545255661,
1602
+ "eval_na_accuracy": 0.799227774143219,
1603
+ "eval_ordinal_accuracy": 0.6743715405464172,
1604
+ "eval_ordinal_mae": 0.3976960778236389,
1605
+ "eval_runtime": 158.2657,
1606
+ "eval_samples_per_second": 25.141,
1607
+ "eval_steps_per_second": 3.147,
1608
+ "step": 4100
1609
+ },
1610
+ {
1611
+ "epoch": 1.8,
1612
+ "grad_norm": 0.5081536769866943,
1613
+ "learning_rate": 5.497816593886463e-05,
1614
+ "loss": 0.1381,
1615
+ "step": 4125
1616
+ },
1617
+ {
1618
+ "epoch": 1.81,
1619
+ "grad_norm": 0.7783900499343872,
1620
+ "learning_rate": 5.470524017467249e-05,
1621
+ "loss": 0.1391,
1622
+ "step": 4150
1623
+ },
1624
+ {
1625
+ "epoch": 1.82,
1626
+ "grad_norm": 0.6352062821388245,
1627
+ "learning_rate": 5.4432314410480354e-05,
1628
+ "loss": 0.1157,
1629
+ "step": 4175
1630
+ },
1631
+ {
1632
+ "epoch": 1.83,
1633
+ "grad_norm": 0.6280900835990906,
1634
+ "learning_rate": 5.4159388646288215e-05,
1635
+ "loss": 0.1552,
1636
+ "step": 4200
1637
+ },
1638
+ {
1639
+ "epoch": 1.83,
1640
+ "eval_loss": 0.2606957256793976,
1641
+ "eval_na_accuracy": 0.7779922485351562,
1642
+ "eval_ordinal_accuracy": 0.6714822053909302,
1643
+ "eval_ordinal_mae": 0.4044625461101532,
1644
+ "eval_runtime": 154.4211,
1645
+ "eval_samples_per_second": 25.767,
1646
+ "eval_steps_per_second": 3.225,
1647
+ "step": 4200
1648
+ },
1649
+ {
1650
+ "epoch": 1.84,
1651
+ "grad_norm": 1.6340936422348022,
1652
+ "learning_rate": 5.388646288209607e-05,
1653
+ "loss": 0.1448,
1654
+ "step": 4225
1655
+ },
1656
+ {
1657
+ "epoch": 1.86,
1658
+ "grad_norm": 3.3546087741851807,
1659
+ "learning_rate": 5.3613537117903936e-05,
1660
+ "loss": 0.1485,
1661
+ "step": 4250
1662
+ },
1663
+ {
1664
+ "epoch": 1.87,
1665
+ "grad_norm": 0.5650043487548828,
1666
+ "learning_rate": 5.334061135371179e-05,
1667
+ "loss": 0.127,
1668
+ "step": 4275
1669
+ },
1670
+ {
1671
+ "epoch": 1.88,
1672
+ "grad_norm": 0.8098490834236145,
1673
+ "learning_rate": 5.306768558951966e-05,
1674
+ "loss": 0.1172,
1675
+ "step": 4300
1676
+ },
1677
+ {
1678
+ "epoch": 1.88,
1679
+ "eval_loss": 0.24209196865558624,
1680
+ "eval_na_accuracy": 0.8281853199005127,
1681
+ "eval_ordinal_accuracy": 0.6665703654289246,
1682
+ "eval_ordinal_mae": 0.3971348702907562,
1683
+ "eval_runtime": 153.4086,
1684
+ "eval_samples_per_second": 25.937,
1685
+ "eval_steps_per_second": 3.246,
1686
+ "step": 4300
1687
+ },
1688
+ {
1689
+ "epoch": 1.89,
1690
+ "grad_norm": 1.127596378326416,
1691
+ "learning_rate": 5.2805676855895205e-05,
1692
+ "loss": 0.1808,
1693
+ "step": 4325
1694
+ },
1695
+ {
1696
+ "epoch": 1.9,
1697
+ "grad_norm": 0.5557155609130859,
1698
+ "learning_rate": 5.253275109170306e-05,
1699
+ "loss": 0.09,
1700
+ "step": 4350
1701
+ },
1702
+ {
1703
+ "epoch": 1.91,
1704
+ "grad_norm": 0.31405743956565857,
1705
+ "learning_rate": 5.2259825327510926e-05,
1706
+ "loss": 0.1061,
1707
+ "step": 4375
1708
+ },
1709
+ {
1710
+ "epoch": 1.92,
1711
+ "grad_norm": 6.475078105926514,
1712
+ "learning_rate": 5.198689956331878e-05,
1713
+ "loss": 0.1381,
1714
+ "step": 4400
1715
+ },
1716
+ {
1717
+ "epoch": 1.92,
1718
+ "eval_loss": 0.2253342866897583,
1719
+ "eval_na_accuracy": 0.7857142686843872,
1720
+ "eval_ordinal_accuracy": 0.6792834401130676,
1721
+ "eval_ordinal_mae": 0.3813394010066986,
1722
+ "eval_runtime": 156.7221,
1723
+ "eval_samples_per_second": 25.389,
1724
+ "eval_steps_per_second": 3.178,
1725
+ "step": 4400
1726
+ },
1727
+ {
1728
+ "epoch": 1.93,
1729
+ "grad_norm": 0.5825577974319458,
1730
+ "learning_rate": 5.171397379912663e-05,
1731
+ "loss": 0.1524,
1732
+ "step": 4425
1733
+ },
1734
+ {
1735
+ "epoch": 1.94,
1736
+ "grad_norm": 1.4979143142700195,
1737
+ "learning_rate": 5.14410480349345e-05,
1738
+ "loss": 0.142,
1739
+ "step": 4450
1740
+ },
1741
+ {
1742
+ "epoch": 1.95,
1743
+ "grad_norm": 0.5638359785079956,
1744
+ "learning_rate": 5.116812227074236e-05,
1745
+ "loss": 0.1225,
1746
+ "step": 4475
1747
+ },
1748
+ {
1749
+ "epoch": 1.97,
1750
+ "grad_norm": 0.44809991121292114,
1751
+ "learning_rate": 5.089519650655022e-05,
1752
+ "loss": 0.1282,
1753
+ "step": 4500
1754
+ },
1755
+ {
1756
+ "epoch": 1.97,
1757
+ "eval_loss": 0.23353050649166107,
1758
+ "eval_na_accuracy": 0.8436293601989746,
1759
+ "eval_ordinal_accuracy": 0.6509679555892944,
1760
+ "eval_ordinal_mae": 0.41455620527267456,
1761
+ "eval_runtime": 154.2508,
1762
+ "eval_samples_per_second": 25.796,
1763
+ "eval_steps_per_second": 3.229,
1764
+ "step": 4500
1765
+ },
1766
+ {
1767
+ "epoch": 1.98,
1768
+ "grad_norm": 2.9640109539031982,
1769
+ "learning_rate": 5.062227074235808e-05,
1770
+ "loss": 0.1057,
1771
+ "step": 4525
1772
+ },
1773
+ {
1774
+ "epoch": 1.99,
1775
+ "grad_norm": 0.44117558002471924,
1776
+ "learning_rate": 5.034934497816595e-05,
1777
+ "loss": 0.13,
1778
+ "step": 4550
1779
+ },
1780
+ {
1781
+ "epoch": 2.0,
1782
+ "grad_norm": 0.6167708039283752,
1783
+ "learning_rate": 5.00764192139738e-05,
1784
+ "loss": 0.1223,
1785
+ "step": 4575
1786
+ },
1787
+ {
1788
+ "epoch": 2.01,
1789
+ "grad_norm": 2.0781619548797607,
1790
+ "learning_rate": 4.9803493449781664e-05,
1791
+ "loss": 0.0734,
1792
+ "step": 4600
1793
+ },
1794
+ {
1795
+ "epoch": 2.01,
1796
+ "eval_loss": 0.23820282518863678,
1797
+ "eval_na_accuracy": 0.7895752787590027,
1798
+ "eval_ordinal_accuracy": 0.689685046672821,
1799
+ "eval_ordinal_mae": 0.38021621108055115,
1800
+ "eval_runtime": 154.7894,
1801
+ "eval_samples_per_second": 25.706,
1802
+ "eval_steps_per_second": 3.217,
1803
+ "step": 4600
1804
+ },
1805
+ {
1806
+ "epoch": 2.02,
1807
+ "grad_norm": 3.165067195892334,
1808
+ "learning_rate": 4.9530567685589524e-05,
1809
+ "loss": 0.08,
1810
+ "step": 4625
1811
+ },
1812
+ {
1813
+ "epoch": 2.03,
1814
+ "grad_norm": 0.43800944089889526,
1815
+ "learning_rate": 4.9257641921397385e-05,
1816
+ "loss": 0.0571,
1817
+ "step": 4650
1818
+ },
1819
+ {
1820
+ "epoch": 2.04,
1821
+ "grad_norm": 0.45996198058128357,
1822
+ "learning_rate": 4.898471615720524e-05,
1823
+ "loss": 0.0579,
1824
+ "step": 4675
1825
+ },
1826
+ {
1827
+ "epoch": 2.05,
1828
+ "grad_norm": 1.9151467084884644,
1829
+ "learning_rate": 4.87117903930131e-05,
1830
+ "loss": 0.1046,
1831
+ "step": 4700
1832
+ },
1833
+ {
1834
+ "epoch": 2.05,
1835
+ "eval_loss": 0.2358408272266388,
1836
+ "eval_na_accuracy": 0.8011583089828491,
1837
+ "eval_ordinal_accuracy": 0.6873735785484314,
1838
+ "eval_ordinal_mae": 0.36946654319763184,
1839
+ "eval_runtime": 150.9459,
1840
+ "eval_samples_per_second": 26.36,
1841
+ "eval_steps_per_second": 3.299,
1842
+ "step": 4700
1843
+ },
1844
+ {
1845
+ "epoch": 2.06,
1846
+ "grad_norm": 0.5566712021827698,
1847
+ "learning_rate": 4.843886462882096e-05,
1848
+ "loss": 0.068,
1849
+ "step": 4725
1850
+ },
1851
+ {
1852
+ "epoch": 2.07,
1853
+ "grad_norm": 0.5846825838088989,
1854
+ "learning_rate": 4.8165938864628827e-05,
1855
+ "loss": 0.0487,
1856
+ "step": 4750
1857
+ },
1858
+ {
1859
+ "epoch": 2.09,
1860
+ "grad_norm": 0.3993060290813446,
1861
+ "learning_rate": 4.789301310043669e-05,
1862
+ "loss": 0.0546,
1863
+ "step": 4775
1864
+ },
1865
+ {
1866
+ "epoch": 2.1,
1867
+ "grad_norm": 0.6624791026115417,
1868
+ "learning_rate": 4.762008733624455e-05,
1869
+ "loss": 0.0529,
1870
+ "step": 4800
1871
+ },
1872
+ {
1873
+ "epoch": 2.1,
1874
+ "eval_loss": 0.246298685669899,
1875
+ "eval_na_accuracy": 0.7934362888336182,
1876
+ "eval_ordinal_accuracy": 0.7096214890480042,
1877
+ "eval_ordinal_mae": 0.3595849573612213,
1878
+ "eval_runtime": 153.0771,
1879
+ "eval_samples_per_second": 25.993,
1880
+ "eval_steps_per_second": 3.253,
1881
+ "step": 4800
1882
+ },
1883
+ {
1884
+ "epoch": 2.11,
1885
+ "grad_norm": 0.33810412883758545,
1886
+ "learning_rate": 4.734716157205241e-05,
1887
+ "loss": 0.0625,
1888
+ "step": 4825
1889
+ },
1890
+ {
1891
+ "epoch": 2.12,
1892
+ "grad_norm": 3.0376217365264893,
1893
+ "learning_rate": 4.707423580786026e-05,
1894
+ "loss": 0.0749,
1895
+ "step": 4850
1896
+ },
1897
+ {
1898
+ "epoch": 2.13,
1899
+ "grad_norm": 0.4552454352378845,
1900
+ "learning_rate": 4.680131004366812e-05,
1901
+ "loss": 0.0536,
1902
+ "step": 4875
1903
+ },
1904
+ {
1905
+ "epoch": 2.14,
1906
+ "grad_norm": 0.6775699257850647,
1907
+ "learning_rate": 4.652838427947598e-05,
1908
+ "loss": 0.0687,
1909
+ "step": 4900
1910
+ },
1911
+ {
1912
+ "epoch": 2.14,
1913
+ "eval_loss": 0.26146525144577026,
1914
+ "eval_na_accuracy": 0.7857142686843872,
1915
+ "eval_ordinal_accuracy": 0.6737936735153198,
1916
+ "eval_ordinal_mae": 0.39211294054985046,
1917
+ "eval_runtime": 156.1814,
1918
+ "eval_samples_per_second": 25.477,
1919
+ "eval_steps_per_second": 3.189,
1920
+ "step": 4900
1921
+ },
1922
+ {
1923
+ "epoch": 2.15,
1924
+ "grad_norm": 0.623523473739624,
1925
+ "learning_rate": 4.625545851528384e-05,
1926
+ "loss": 0.068,
1927
+ "step": 4925
1928
+ },
1929
+ {
1930
+ "epoch": 2.16,
1931
+ "grad_norm": 0.5668771862983704,
1932
+ "learning_rate": 4.5982532751091704e-05,
1933
+ "loss": 0.0684,
1934
+ "step": 4950
1935
+ },
1936
+ {
1937
+ "epoch": 2.17,
1938
+ "grad_norm": 0.16505593061447144,
1939
+ "learning_rate": 4.5709606986899564e-05,
1940
+ "loss": 0.0546,
1941
+ "step": 4975
1942
+ },
1943
+ {
1944
+ "epoch": 2.18,
1945
+ "grad_norm": 0.5560925602912903,
1946
+ "learning_rate": 4.5436681222707425e-05,
1947
+ "loss": 0.0613,
1948
+ "step": 5000
1949
+ },
1950
+ {
1951
+ "epoch": 2.18,
1952
+ "eval_loss": 0.2542937397956848,
1953
+ "eval_na_accuracy": 0.8108108043670654,
1954
+ "eval_ordinal_accuracy": 0.6876625418663025,
1955
+ "eval_ordinal_mae": 0.36514100432395935,
1956
+ "eval_runtime": 156.8699,
1957
+ "eval_samples_per_second": 25.365,
1958
+ "eval_steps_per_second": 3.175,
1959
+ "step": 5000
1960
+ },
1961
+ {
1962
+ "epoch": 2.19,
1963
+ "grad_norm": 0.440731942653656,
1964
+ "learning_rate": 4.5163755458515285e-05,
1965
+ "loss": 0.0967,
1966
+ "step": 5025
1967
+ },
1968
+ {
1969
+ "epoch": 2.21,
1970
+ "grad_norm": 0.26702314615249634,
1971
+ "learning_rate": 4.4890829694323146e-05,
1972
+ "loss": 0.0499,
1973
+ "step": 5050
1974
+ },
1975
+ {
1976
+ "epoch": 2.22,
1977
+ "grad_norm": 0.9617776274681091,
1978
+ "learning_rate": 4.4617903930131006e-05,
1979
+ "loss": 0.0792,
1980
+ "step": 5075
1981
+ },
1982
+ {
1983
+ "epoch": 2.23,
1984
+ "grad_norm": 3.431128740310669,
1985
+ "learning_rate": 4.434497816593887e-05,
1986
+ "loss": 0.0591,
1987
+ "step": 5100
1988
+ },
1989
+ {
1990
+ "epoch": 2.23,
1991
+ "eval_loss": 0.25389814376831055,
1992
+ "eval_na_accuracy": 0.7915058135986328,
1993
+ "eval_ordinal_accuracy": 0.6885293126106262,
1994
+ "eval_ordinal_mae": 0.3693314790725708,
1995
+ "eval_runtime": 155.0651,
1996
+ "eval_samples_per_second": 25.66,
1997
+ "eval_steps_per_second": 3.212,
1998
+ "step": 5100
1999
+ },
2000
+ {
2001
+ "epoch": 2.24,
2002
+ "grad_norm": 5.355819225311279,
2003
+ "learning_rate": 4.407205240174673e-05,
2004
+ "loss": 0.0611,
2005
+ "step": 5125
2006
+ },
2007
+ {
2008
+ "epoch": 2.25,
2009
+ "grad_norm": 0.4490479528903961,
2010
+ "learning_rate": 4.379912663755459e-05,
2011
+ "loss": 0.0676,
2012
+ "step": 5150
2013
+ },
2014
+ {
2015
+ "epoch": 2.26,
2016
+ "grad_norm": 0.594838559627533,
2017
+ "learning_rate": 4.352620087336245e-05,
2018
+ "loss": 0.0567,
2019
+ "step": 5175
2020
+ },
2021
+ {
2022
+ "epoch": 2.27,
2023
+ "grad_norm": 0.2912037670612335,
2024
+ "learning_rate": 4.325327510917031e-05,
2025
+ "loss": 0.0474,
2026
+ "step": 5200
2027
+ },
2028
+ {
2029
+ "epoch": 2.27,
2030
+ "eval_loss": 0.26495158672332764,
2031
+ "eval_na_accuracy": 0.799227774143219,
2032
+ "eval_ordinal_accuracy": 0.6836174726486206,
2033
+ "eval_ordinal_mae": 0.3721810579299927,
2034
+ "eval_runtime": 156.3533,
2035
+ "eval_samples_per_second": 25.449,
2036
+ "eval_steps_per_second": 3.185,
2037
+ "step": 5200
2038
+ },
2039
+ {
2040
+ "epoch": 2.28,
2041
+ "grad_norm": 1.547253966331482,
2042
+ "learning_rate": 4.298034934497817e-05,
2043
+ "loss": 0.0583,
2044
+ "step": 5225
2045
+ },
2046
+ {
2047
+ "epoch": 2.29,
2048
+ "grad_norm": 0.6619101762771606,
2049
+ "learning_rate": 4.270742358078603e-05,
2050
+ "loss": 0.0713,
2051
+ "step": 5250
2052
+ },
2053
+ {
2054
+ "epoch": 2.3,
2055
+ "grad_norm": 0.6830999255180359,
2056
+ "learning_rate": 4.243449781659389e-05,
2057
+ "loss": 0.068,
2058
+ "step": 5275
2059
+ },
2060
+ {
2061
+ "epoch": 2.31,
2062
+ "grad_norm": 0.6977065801620483,
2063
+ "learning_rate": 4.216157205240175e-05,
2064
+ "loss": 0.0511,
2065
+ "step": 5300
2066
+ },
2067
+ {
2068
+ "epoch": 2.31,
2069
+ "eval_loss": 0.2630845010280609,
2070
+ "eval_na_accuracy": 0.8127413392066956,
2071
+ "eval_ordinal_accuracy": 0.686795711517334,
2072
+ "eval_ordinal_mae": 0.36813271045684814,
2073
+ "eval_runtime": 154.8738,
2074
+ "eval_samples_per_second": 25.692,
2075
+ "eval_steps_per_second": 3.216,
2076
+ "step": 5300
2077
+ },
2078
+ {
2079
+ "epoch": 2.33,
2080
+ "grad_norm": 0.8015338778495789,
2081
+ "learning_rate": 4.188864628820961e-05,
2082
+ "loss": 0.0511,
2083
+ "step": 5325
2084
+ },
2085
+ {
2086
+ "epoch": 2.34,
2087
+ "grad_norm": 2.4707908630371094,
2088
+ "learning_rate": 4.161572052401747e-05,
2089
+ "loss": 0.0704,
2090
+ "step": 5350
2091
+ },
2092
+ {
2093
+ "epoch": 2.35,
2094
+ "grad_norm": 0.526327908039093,
2095
+ "learning_rate": 4.134279475982533e-05,
2096
+ "loss": 0.0824,
2097
+ "step": 5375
2098
+ },
2099
+ {
2100
+ "epoch": 2.36,
2101
+ "grad_norm": 1.6800874471664429,
2102
+ "learning_rate": 4.1069868995633186e-05,
2103
+ "loss": 0.0683,
2104
+ "step": 5400
2105
+ },
2106
+ {
2107
+ "epoch": 2.36,
2108
+ "eval_loss": 0.271382212638855,
2109
+ "eval_na_accuracy": 0.7837837934494019,
2110
+ "eval_ordinal_accuracy": 0.6954637169837952,
2111
+ "eval_ordinal_mae": 0.36302649974823,
2112
+ "eval_runtime": 155.8769,
2113
+ "eval_samples_per_second": 25.527,
2114
+ "eval_steps_per_second": 3.195,
2115
+ "step": 5400
2116
+ },
2117
+ {
2118
+ "epoch": 2.37,
2119
+ "grad_norm": 0.5335835814476013,
2120
+ "learning_rate": 4.0796943231441046e-05,
2121
+ "loss": 0.056,
2122
+ "step": 5425
2123
+ },
2124
+ {
2125
+ "epoch": 2.38,
2126
+ "grad_norm": 2.2423832416534424,
2127
+ "learning_rate": 4.052401746724891e-05,
2128
+ "loss": 0.0787,
2129
+ "step": 5450
2130
+ },
2131
+ {
2132
+ "epoch": 2.39,
2133
+ "grad_norm": 0.558754026889801,
2134
+ "learning_rate": 4.025109170305677e-05,
2135
+ "loss": 0.0481,
2136
+ "step": 5475
2137
+ },
2138
+ {
2139
+ "epoch": 2.4,
2140
+ "grad_norm": 0.6908044815063477,
2141
+ "learning_rate": 3.9978165938864635e-05,
2142
+ "loss": 0.0654,
2143
+ "step": 5500
2144
+ },
2145
+ {
2146
+ "epoch": 2.4,
2147
+ "eval_loss": 0.27688542008399963,
2148
+ "eval_na_accuracy": 0.799227774143219,
2149
+ "eval_ordinal_accuracy": 0.6787055730819702,
2150
+ "eval_ordinal_mae": 0.3673117458820343,
2151
+ "eval_runtime": 154.1587,
2152
+ "eval_samples_per_second": 25.811,
2153
+ "eval_steps_per_second": 3.23,
2154
+ "step": 5500
2155
+ },
2156
+ {
2157
+ "epoch": 2.41,
2158
+ "grad_norm": 0.43695610761642456,
2159
+ "learning_rate": 3.9705240174672495e-05,
2160
+ "loss": 0.0539,
2161
+ "step": 5525
2162
+ },
2163
+ {
2164
+ "epoch": 2.42,
2165
+ "grad_norm": 3.290745496749878,
2166
+ "learning_rate": 3.9432314410480356e-05,
2167
+ "loss": 0.0608,
2168
+ "step": 5550
2169
+ },
2170
+ {
2171
+ "epoch": 2.43,
2172
+ "grad_norm": 0.25710350275039673,
2173
+ "learning_rate": 3.9159388646288216e-05,
2174
+ "loss": 0.0657,
2175
+ "step": 5575
2176
+ },
2177
+ {
2178
+ "epoch": 2.45,
2179
+ "grad_norm": 0.6269412040710449,
2180
+ "learning_rate": 3.888646288209607e-05,
2181
+ "loss": 0.0581,
2182
+ "step": 5600
2183
+ },
2184
+ {
2185
+ "epoch": 2.45,
2186
+ "eval_loss": 0.27770209312438965,
2187
+ "eval_na_accuracy": 0.799227774143219,
2188
+ "eval_ordinal_accuracy": 0.6951748132705688,
2189
+ "eval_ordinal_mae": 0.3627748191356659,
2190
+ "eval_runtime": 157.9718,
2191
+ "eval_samples_per_second": 25.188,
2192
+ "eval_steps_per_second": 3.152,
2193
+ "step": 5600
2194
+ },
2195
+ {
2196
+ "epoch": 2.46,
2197
+ "grad_norm": 1.5579248666763306,
2198
+ "learning_rate": 3.861353711790393e-05,
2199
+ "loss": 0.0556,
2200
+ "step": 5625
2201
+ },
2202
+ {
2203
+ "epoch": 2.47,
2204
+ "grad_norm": 0.30162498354911804,
2205
+ "learning_rate": 3.834061135371179e-05,
2206
+ "loss": 0.0845,
2207
+ "step": 5650
2208
+ },
2209
+ {
2210
+ "epoch": 2.48,
2211
+ "grad_norm": 0.43656259775161743,
2212
+ "learning_rate": 3.806768558951965e-05,
2213
+ "loss": 0.0616,
2214
+ "step": 5675
2215
+ },
2216
+ {
2217
+ "epoch": 2.49,
2218
+ "grad_norm": 0.4505567252635956,
2219
+ "learning_rate": 3.779475982532751e-05,
2220
+ "loss": 0.072,
2221
+ "step": 5700
2222
+ },
2223
+ {
2224
+ "epoch": 2.49,
2225
+ "eval_loss": 0.29192212224006653,
2226
+ "eval_na_accuracy": 0.7683397531509399,
2227
+ "eval_ordinal_accuracy": 0.6888182759284973,
2228
+ "eval_ordinal_mae": 0.36100971698760986,
2229
+ "eval_runtime": 155.17,
2230
+ "eval_samples_per_second": 25.643,
2231
+ "eval_steps_per_second": 3.209,
2232
+ "step": 5700
2233
+ },
2234
+ {
2235
+ "epoch": 2.5,
2236
+ "grad_norm": 0.8305689692497253,
2237
+ "learning_rate": 3.752183406113537e-05,
2238
+ "loss": 0.0489,
2239
+ "step": 5725
2240
+ },
2241
+ {
2242
+ "epoch": 2.51,
2243
+ "grad_norm": 0.5452491641044617,
2244
+ "learning_rate": 3.724890829694323e-05,
2245
+ "loss": 0.0584,
2246
+ "step": 5750
2247
+ },
2248
+ {
2249
+ "epoch": 2.52,
2250
+ "grad_norm": 0.7193215489387512,
2251
+ "learning_rate": 3.697598253275109e-05,
2252
+ "loss": 0.0643,
2253
+ "step": 5775
2254
+ },
2255
+ {
2256
+ "epoch": 2.53,
2257
+ "grad_norm": 0.4529215693473816,
2258
+ "learning_rate": 3.6703056768558954e-05,
2259
+ "loss": 0.0737,
2260
+ "step": 5800
2261
+ },
2262
+ {
2263
+ "epoch": 2.53,
2264
+ "eval_loss": 0.2807420790195465,
2265
+ "eval_na_accuracy": 0.7837837934494019,
2266
+ "eval_ordinal_accuracy": 0.6983530521392822,
2267
+ "eval_ordinal_mae": 0.3611612617969513,
2268
+ "eval_runtime": 154.656,
2269
+ "eval_samples_per_second": 25.728,
2270
+ "eval_steps_per_second": 3.22,
2271
+ "step": 5800
2272
+ },
2273
+ {
2274
+ "epoch": 2.54,
2275
+ "grad_norm": 0.4110221564769745,
2276
+ "learning_rate": 3.6430131004366814e-05,
2277
+ "loss": 0.0548,
2278
+ "step": 5825
2279
+ },
2280
+ {
2281
+ "epoch": 2.55,
2282
+ "grad_norm": 0.7328541278839111,
2283
+ "learning_rate": 3.6157205240174675e-05,
2284
+ "loss": 0.0728,
2285
+ "step": 5850
2286
+ },
2287
+ {
2288
+ "epoch": 2.57,
2289
+ "grad_norm": 0.2497873455286026,
2290
+ "learning_rate": 3.5884279475982535e-05,
2291
+ "loss": 0.0608,
2292
+ "step": 5875
2293
+ },
2294
+ {
2295
+ "epoch": 2.58,
2296
+ "grad_norm": 6.07706880569458,
2297
+ "learning_rate": 3.5611353711790396e-05,
2298
+ "loss": 0.0667,
2299
+ "step": 5900
2300
+ },
2301
+ {
2302
+ "epoch": 2.58,
2303
+ "eval_loss": 0.292630672454834,
2304
+ "eval_na_accuracy": 0.7509652376174927,
2305
+ "eval_ordinal_accuracy": 0.7000866532325745,
2306
+ "eval_ordinal_mae": 0.36070069670677185,
2307
+ "eval_runtime": 156.6494,
2308
+ "eval_samples_per_second": 25.401,
2309
+ "eval_steps_per_second": 3.179,
2310
+ "step": 5900
2311
+ },
2312
+ {
2313
+ "epoch": 2.59,
2314
+ "grad_norm": 2.6040148735046387,
2315
+ "learning_rate": 3.5338427947598256e-05,
2316
+ "loss": 0.0745,
2317
+ "step": 5925
2318
+ },
2319
+ {
2320
+ "epoch": 2.6,
2321
+ "grad_norm": 0.428023099899292,
2322
+ "learning_rate": 3.506550218340611e-05,
2323
+ "loss": 0.0455,
2324
+ "step": 5950
2325
+ },
2326
+ {
2327
+ "epoch": 2.61,
2328
+ "grad_norm": 0.24350808560848236,
2329
+ "learning_rate": 3.479257641921397e-05,
2330
+ "loss": 0.067,
2331
+ "step": 5975
2332
+ },
2333
+ {
2334
+ "epoch": 2.62,
2335
+ "grad_norm": 0.6879482865333557,
2336
+ "learning_rate": 3.451965065502184e-05,
2337
+ "loss": 0.0669,
2338
+ "step": 6000
2339
+ },
2340
+ {
2341
+ "epoch": 2.62,
2342
+ "eval_loss": 0.2874707579612732,
2343
+ "eval_na_accuracy": 0.799227774143219,
2344
+ "eval_ordinal_accuracy": 0.6891071796417236,
2345
+ "eval_ordinal_mae": 0.36164331436157227,
2346
+ "eval_runtime": 155.7575,
2347
+ "eval_samples_per_second": 25.546,
2348
+ "eval_steps_per_second": 3.197,
2349
+ "step": 6000
2350
+ },
2351
+ {
2352
+ "epoch": 2.63,
2353
+ "grad_norm": 0.786054790019989,
2354
+ "learning_rate": 3.42467248908297e-05,
2355
+ "loss": 0.0736,
2356
+ "step": 6025
2357
+ },
2358
+ {
2359
+ "epoch": 2.64,
2360
+ "grad_norm": 0.7047861218452454,
2361
+ "learning_rate": 3.397379912663756e-05,
2362
+ "loss": 0.0551,
2363
+ "step": 6050
2364
+ },
2365
+ {
2366
+ "epoch": 2.65,
2367
+ "grad_norm": 0.6863640546798706,
2368
+ "learning_rate": 3.370087336244542e-05,
2369
+ "loss": 0.0643,
2370
+ "step": 6075
2371
+ },
2372
+ {
2373
+ "epoch": 2.66,
2374
+ "grad_norm": 0.6037794947624207,
2375
+ "learning_rate": 3.342794759825328e-05,
2376
+ "loss": 0.0535,
2377
+ "step": 6100
2378
+ },
2379
+ {
2380
+ "epoch": 2.66,
2381
+ "eval_loss": 0.2853965759277344,
2382
+ "eval_na_accuracy": 0.7683397531509399,
2383
+ "eval_ordinal_accuracy": 0.6960415840148926,
2384
+ "eval_ordinal_mae": 0.35648074746131897,
2385
+ "eval_runtime": 156.7027,
2386
+ "eval_samples_per_second": 25.392,
2387
+ "eval_steps_per_second": 3.178,
2388
+ "step": 6100
2389
+ },
2390
+ {
2391
+ "epoch": 2.67,
2392
+ "grad_norm": 0.30565348267555237,
2393
+ "learning_rate": 3.315502183406114e-05,
2394
+ "loss": 0.0412,
2395
+ "step": 6125
2396
+ },
2397
+ {
2398
+ "epoch": 2.69,
2399
+ "grad_norm": 0.3631564974784851,
2400
+ "learning_rate": 3.2882096069868994e-05,
2401
+ "loss": 0.0584,
2402
+ "step": 6150
2403
+ },
2404
+ {
2405
+ "epoch": 2.7,
2406
+ "grad_norm": 0.6103675365447998,
2407
+ "learning_rate": 3.2609170305676854e-05,
2408
+ "loss": 0.0481,
2409
+ "step": 6175
2410
+ },
2411
+ {
2412
+ "epoch": 2.71,
2413
+ "grad_norm": 10.306925773620605,
2414
+ "learning_rate": 3.2336244541484715e-05,
2415
+ "loss": 0.06,
2416
+ "step": 6200
2417
+ },
2418
+ {
2419
+ "epoch": 2.71,
2420
+ "eval_loss": 0.28473392128944397,
2421
+ "eval_na_accuracy": 0.7741312980651855,
2422
+ "eval_ordinal_accuracy": 0.7015313506126404,
2423
+ "eval_ordinal_mae": 0.3500910997390747,
2424
+ "eval_runtime": 154.8993,
2425
+ "eval_samples_per_second": 25.688,
2426
+ "eval_steps_per_second": 3.215,
2427
+ "step": 6200
2428
+ },
2429
+ {
2430
+ "epoch": 2.72,
2431
+ "grad_norm": 0.5145313143730164,
2432
+ "learning_rate": 3.2063318777292575e-05,
2433
+ "loss": 0.0586,
2434
+ "step": 6225
2435
+ },
2436
+ {
2437
+ "epoch": 2.73,
2438
+ "grad_norm": 0.9967983365058899,
2439
+ "learning_rate": 3.1790393013100436e-05,
2440
+ "loss": 0.0736,
2441
+ "step": 6250
2442
+ },
2443
+ {
2444
+ "epoch": 2.74,
2445
+ "grad_norm": 0.652100682258606,
2446
+ "learning_rate": 3.15174672489083e-05,
2447
+ "loss": 0.057,
2448
+ "step": 6275
2449
+ },
2450
+ {
2451
+ "epoch": 2.75,
2452
+ "grad_norm": 0.64205402135849,
2453
+ "learning_rate": 3.1244541484716164e-05,
2454
+ "loss": 0.0534,
2455
+ "step": 6300
2456
+ },
2457
+ {
2458
+ "epoch": 2.75,
2459
+ "eval_loss": 0.28205522894859314,
2460
+ "eval_na_accuracy": 0.7625482678413391,
2461
+ "eval_ordinal_accuracy": 0.7006645202636719,
2462
+ "eval_ordinal_mae": 0.34947505593299866,
2463
+ "eval_runtime": 155.841,
2464
+ "eval_samples_per_second": 25.532,
2465
+ "eval_steps_per_second": 3.196,
2466
+ "step": 6300
2467
+ },
2468
+ {
2469
+ "epoch": 2.76,
2470
+ "grad_norm": 1.2284494638442993,
2471
+ "learning_rate": 3.097161572052402e-05,
2472
+ "loss": 0.0517,
2473
+ "step": 6325
2474
+ },
2475
+ {
2476
+ "epoch": 2.77,
2477
+ "grad_norm": 0.4024920165538788,
2478
+ "learning_rate": 3.069868995633188e-05,
2479
+ "loss": 0.0857,
2480
+ "step": 6350
2481
+ },
2482
+ {
2483
+ "epoch": 2.78,
2484
+ "grad_norm": 0.5288224816322327,
2485
+ "learning_rate": 3.0425764192139738e-05,
2486
+ "loss": 0.051,
2487
+ "step": 6375
2488
+ },
2489
+ {
2490
+ "epoch": 2.79,
2491
+ "grad_norm": 0.6519297957420349,
2492
+ "learning_rate": 3.01528384279476e-05,
2493
+ "loss": 0.0526,
2494
+ "step": 6400
2495
+ },
2496
+ {
2497
+ "epoch": 2.79,
2498
+ "eval_loss": 0.28344523906707764,
2499
+ "eval_na_accuracy": 0.7625482678413391,
2500
+ "eval_ordinal_accuracy": 0.670037567615509,
2501
+ "eval_ordinal_mae": 0.385305792093277,
2502
+ "eval_runtime": 154.7606,
2503
+ "eval_samples_per_second": 25.711,
2504
+ "eval_steps_per_second": 3.218,
2505
+ "step": 6400
2506
+ },
2507
+ {
2508
+ "epoch": 2.81,
2509
+ "grad_norm": 0.7009943723678589,
2510
+ "learning_rate": 2.987991266375546e-05,
2511
+ "loss": 0.0624,
2512
+ "step": 6425
2513
+ },
2514
+ {
2515
+ "epoch": 2.82,
2516
+ "grad_norm": 0.6731958389282227,
2517
+ "learning_rate": 2.9606986899563323e-05,
2518
+ "loss": 0.068,
2519
+ "step": 6450
2520
+ },
2521
+ {
2522
+ "epoch": 2.83,
2523
+ "grad_norm": 0.3764704763889313,
2524
+ "learning_rate": 2.9334061135371184e-05,
2525
+ "loss": 0.0677,
2526
+ "step": 6475
2527
+ },
2528
+ {
2529
+ "epoch": 2.84,
2530
+ "grad_norm": 0.5906457901000977,
2531
+ "learning_rate": 2.9061135371179037e-05,
2532
+ "loss": 0.0841,
2533
+ "step": 6500
2534
+ },
2535
+ {
2536
+ "epoch": 2.84,
2537
+ "eval_loss": 0.28385940194129944,
2538
+ "eval_na_accuracy": 0.7490347623825073,
2539
+ "eval_ordinal_accuracy": 0.7044206857681274,
2540
+ "eval_ordinal_mae": 0.35039493441581726,
2541
+ "eval_runtime": 156.3169,
2542
+ "eval_samples_per_second": 25.455,
2543
+ "eval_steps_per_second": 3.186,
2544
+ "step": 6500
2545
+ },
2546
+ {
2547
+ "epoch": 2.85,
2548
+ "grad_norm": 0.3305709958076477,
2549
+ "learning_rate": 2.8788209606986898e-05,
2550
+ "loss": 0.0652,
2551
+ "step": 6525
2552
+ },
2553
+ {
2554
+ "epoch": 2.86,
2555
+ "grad_norm": 0.4257774353027344,
2556
+ "learning_rate": 2.8526200873362446e-05,
2557
+ "loss": 0.0608,
2558
+ "step": 6550
2559
+ },
2560
+ {
2561
+ "epoch": 2.87,
2562
+ "grad_norm": 0.9447629451751709,
2563
+ "learning_rate": 2.8253275109170307e-05,
2564
+ "loss": 0.0527,
2565
+ "step": 6575
2566
+ },
2567
+ {
2568
+ "epoch": 2.88,
2569
+ "grad_norm": 0.8275176882743835,
2570
+ "learning_rate": 2.7980349344978167e-05,
2571
+ "loss": 0.0529,
2572
+ "step": 6600
2573
+ },
2574
+ {
2575
+ "epoch": 2.88,
2576
+ "eval_loss": 0.2857762575149536,
2577
+ "eval_na_accuracy": 0.7818532586097717,
2578
+ "eval_ordinal_accuracy": 0.689685046672821,
2579
+ "eval_ordinal_mae": 0.35951367020606995,
2580
+ "eval_runtime": 156.4554,
2581
+ "eval_samples_per_second": 25.432,
2582
+ "eval_steps_per_second": 3.183,
2583
+ "step": 6600
2584
+ },
2585
+ {
2586
+ "epoch": 2.89,
2587
+ "grad_norm": 0.5894487500190735,
2588
+ "learning_rate": 2.770742358078603e-05,
2589
+ "loss": 0.0643,
2590
+ "step": 6625
2591
+ },
2592
+ {
2593
+ "epoch": 2.9,
2594
+ "grad_norm": 0.8872036933898926,
2595
+ "learning_rate": 2.743449781659389e-05,
2596
+ "loss": 0.0578,
2597
+ "step": 6650
2598
+ },
2599
+ {
2600
+ "epoch": 2.91,
2601
+ "grad_norm": 0.511169970035553,
2602
+ "learning_rate": 2.7161572052401745e-05,
2603
+ "loss": 0.0393,
2604
+ "step": 6675
2605
+ },
2606
+ {
2607
+ "epoch": 2.93,
2608
+ "grad_norm": 0.5537564754486084,
2609
+ "learning_rate": 2.6888646288209606e-05,
2610
+ "loss": 0.0811,
2611
+ "step": 6700
2612
+ },
2613
+ {
2614
+ "epoch": 2.93,
2615
+ "eval_loss": 0.28430891036987305,
2616
+ "eval_na_accuracy": 0.7799227833747864,
2617
+ "eval_ordinal_accuracy": 0.7047096490859985,
2618
+ "eval_ordinal_mae": 0.34795093536376953,
2619
+ "eval_runtime": 156.7138,
2620
+ "eval_samples_per_second": 25.39,
2621
+ "eval_steps_per_second": 3.178,
2622
+ "step": 6700
2623
+ },
2624
+ {
2625
+ "epoch": 2.94,
2626
+ "grad_norm": 1.6618189811706543,
2627
+ "learning_rate": 2.661572052401747e-05,
2628
+ "loss": 0.0705,
2629
+ "step": 6725
2630
+ },
2631
+ {
2632
+ "epoch": 2.95,
2633
+ "grad_norm": 0.4913439154624939,
2634
+ "learning_rate": 2.634279475982533e-05,
2635
+ "loss": 0.0875,
2636
+ "step": 6750
2637
+ },
2638
+ {
2639
+ "epoch": 2.96,
2640
+ "grad_norm": 0.5068143010139465,
2641
+ "learning_rate": 2.606986899563319e-05,
2642
+ "loss": 0.0557,
2643
+ "step": 6775
2644
+ },
2645
+ {
2646
+ "epoch": 2.97,
2647
+ "grad_norm": 0.23857353627681732,
2648
+ "learning_rate": 2.579694323144105e-05,
2649
+ "loss": 0.0502,
2650
+ "step": 6800
2651
+ },
2652
+ {
2653
+ "epoch": 2.97,
2654
+ "eval_loss": 0.28915783762931824,
2655
+ "eval_na_accuracy": 0.7818532586097717,
2656
+ "eval_ordinal_accuracy": 0.700953483581543,
2657
+ "eval_ordinal_mae": 0.34830236434936523,
2658
+ "eval_runtime": 155.6303,
2659
+ "eval_samples_per_second": 25.567,
2660
+ "eval_steps_per_second": 3.2,
2661
+ "step": 6800
2662
+ },
2663
+ {
2664
+ "epoch": 2.98,
2665
+ "grad_norm": 0.9183345437049866,
2666
+ "learning_rate": 2.552401746724891e-05,
2667
+ "loss": 0.0514,
2668
+ "step": 6825
2669
+ },
2670
+ {
2671
+ "epoch": 2.99,
2672
+ "grad_norm": 0.6839514374732971,
2673
+ "learning_rate": 2.525109170305677e-05,
2674
+ "loss": 0.0682,
2675
+ "step": 6850
2676
+ },
2677
+ {
2678
+ "epoch": 3.0,
2679
+ "grad_norm": 2.4068310260772705,
2680
+ "learning_rate": 2.4978165938864632e-05,
2681
+ "loss": 0.0709,
2682
+ "step": 6875
2683
+ },
2684
+ {
2685
+ "epoch": 3.01,
2686
+ "grad_norm": 0.04599028080701828,
2687
+ "learning_rate": 2.470524017467249e-05,
2688
+ "loss": 0.0273,
2689
+ "step": 6900
2690
+ },
2691
+ {
2692
+ "epoch": 3.01,
2693
+ "eval_loss": 0.2801385819911957,
2694
+ "eval_na_accuracy": 0.8108108043670654,
2695
+ "eval_ordinal_accuracy": 0.6957526803016663,
2696
+ "eval_ordinal_mae": 0.34536227583885193,
2697
+ "eval_runtime": 156.2048,
2698
+ "eval_samples_per_second": 25.473,
2699
+ "eval_steps_per_second": 3.188,
2700
+ "step": 6900
2701
+ },
2702
+ {
2703
+ "epoch": 3.02,
2704
+ "grad_norm": 0.5234570503234863,
2705
+ "learning_rate": 2.443231441048035e-05,
2706
+ "loss": 0.0273,
2707
+ "step": 6925
2708
+ },
2709
+ {
2710
+ "epoch": 3.03,
2711
+ "grad_norm": 0.4756307601928711,
2712
+ "learning_rate": 2.415938864628821e-05,
2713
+ "loss": 0.0338,
2714
+ "step": 6950
2715
+ },
2716
+ {
2717
+ "epoch": 3.05,
2718
+ "grad_norm": 0.40528589487075806,
2719
+ "learning_rate": 2.388646288209607e-05,
2720
+ "loss": 0.0391,
2721
+ "step": 6975
2722
+ },
2723
+ {
2724
+ "epoch": 3.06,
2725
+ "grad_norm": 0.4284062385559082,
2726
+ "learning_rate": 2.361353711790393e-05,
2727
+ "loss": 0.0306,
2728
+ "step": 7000
2729
+ },
2730
+ {
2731
+ "epoch": 3.06,
2732
+ "eval_loss": 0.2782219350337982,
2733
+ "eval_na_accuracy": 0.8030887842178345,
2734
+ "eval_ordinal_accuracy": 0.7023981213569641,
2735
+ "eval_ordinal_mae": 0.3443802297115326,
2736
+ "eval_runtime": 155.0659,
2737
+ "eval_samples_per_second": 25.66,
2738
+ "eval_steps_per_second": 3.212,
2739
+ "step": 7000
2740
+ },
2741
+ {
2742
+ "epoch": 3.07,
2743
+ "grad_norm": 0.420832097530365,
2744
+ "learning_rate": 2.3340611353711792e-05,
2745
+ "loss": 0.0308,
2746
+ "step": 7025
2747
+ },
2748
+ {
2749
+ "epoch": 3.08,
2750
+ "grad_norm": 0.12627224624156952,
2751
+ "learning_rate": 2.3067685589519653e-05,
2752
+ "loss": 0.0219,
2753
+ "step": 7050
2754
+ },
2755
+ {
2756
+ "epoch": 3.09,
2757
+ "grad_norm": 0.6852111220359802,
2758
+ "learning_rate": 2.279475982532751e-05,
2759
+ "loss": 0.0289,
2760
+ "step": 7075
2761
+ },
2762
+ {
2763
+ "epoch": 3.1,
2764
+ "grad_norm": 0.4591895043849945,
2765
+ "learning_rate": 2.252183406113537e-05,
2766
+ "loss": 0.0257,
2767
+ "step": 7100
2768
+ },
2769
+ {
2770
+ "epoch": 3.1,
2771
+ "eval_loss": 0.2796567380428314,
2772
+ "eval_na_accuracy": 0.7934362888336182,
2773
+ "eval_ordinal_accuracy": 0.7084657549858093,
2774
+ "eval_ordinal_mae": 0.33523455262184143,
2775
+ "eval_runtime": 155.6618,
2776
+ "eval_samples_per_second": 25.562,
2777
+ "eval_steps_per_second": 3.199,
2778
+ "step": 7100
2779
+ },
2780
+ {
2781
+ "epoch": 3.11,
2782
+ "grad_norm": 7.793858051300049,
2783
+ "learning_rate": 2.2248908296943234e-05,
2784
+ "loss": 0.0323,
2785
+ "step": 7125
2786
+ },
2787
+ {
2788
+ "epoch": 3.12,
2789
+ "grad_norm": 0.4730852246284485,
2790
+ "learning_rate": 2.1975982532751095e-05,
2791
+ "loss": 0.0432,
2792
+ "step": 7150
2793
+ },
2794
+ {
2795
+ "epoch": 3.13,
2796
+ "grad_norm": 0.9067665338516235,
2797
+ "learning_rate": 2.170305676855895e-05,
2798
+ "loss": 0.0324,
2799
+ "step": 7175
2800
+ },
2801
+ {
2802
+ "epoch": 3.14,
2803
+ "grad_norm": 0.23400144279003143,
2804
+ "learning_rate": 2.1430131004366812e-05,
2805
+ "loss": 0.0241,
2806
+ "step": 7200
2807
+ },
2808
+ {
2809
+ "epoch": 3.14,
2810
+ "eval_loss": 0.2827575206756592,
2811
+ "eval_na_accuracy": 0.7953668236732483,
2812
+ "eval_ordinal_accuracy": 0.7058653831481934,
2813
+ "eval_ordinal_mae": 0.33425432443618774,
2814
+ "eval_runtime": 157.3972,
2815
+ "eval_samples_per_second": 25.28,
2816
+ "eval_steps_per_second": 3.164,
2817
+ "step": 7200
2818
+ },
2819
+ {
2820
+ "epoch": 3.16,
2821
+ "grad_norm": 0.6540936231613159,
2822
+ "learning_rate": 2.1157205240174673e-05,
2823
+ "loss": 0.0266,
2824
+ "step": 7225
2825
+ },
2826
+ {
2827
+ "epoch": 3.17,
2828
+ "grad_norm": 0.3661313056945801,
2829
+ "learning_rate": 2.0884279475982536e-05,
2830
+ "loss": 0.03,
2831
+ "step": 7250
2832
+ },
2833
+ {
2834
+ "epoch": 3.18,
2835
+ "grad_norm": 0.48538270592689514,
2836
+ "learning_rate": 2.0611353711790394e-05,
2837
+ "loss": 0.0268,
2838
+ "step": 7275
2839
+ },
2840
+ {
2841
+ "epoch": 3.19,
2842
+ "grad_norm": 0.12903234362602234,
2843
+ "learning_rate": 2.0338427947598254e-05,
2844
+ "loss": 0.0255,
2845
+ "step": 7300
2846
+ },
2847
+ {
2848
+ "epoch": 3.19,
2849
+ "eval_loss": 0.28903913497924805,
2850
+ "eval_na_accuracy": 0.8050193190574646,
2851
+ "eval_ordinal_accuracy": 0.6980641484260559,
2852
+ "eval_ordinal_mae": 0.3364236354827881,
2853
+ "eval_runtime": 155.705,
2854
+ "eval_samples_per_second": 25.555,
2855
+ "eval_steps_per_second": 3.198,
2856
+ "step": 7300
2857
+ },
2858
+ {
2859
+ "epoch": 3.2,
2860
+ "grad_norm": 0.6427123546600342,
2861
+ "learning_rate": 2.0065502183406115e-05,
2862
+ "loss": 0.0256,
2863
+ "step": 7325
2864
+ },
2865
+ {
2866
+ "epoch": 3.21,
2867
+ "grad_norm": 0.9630228281021118,
2868
+ "learning_rate": 1.9792576419213975e-05,
2869
+ "loss": 0.0261,
2870
+ "step": 7350
2871
+ },
2872
+ {
2873
+ "epoch": 3.22,
2874
+ "grad_norm": 0.4561873972415924,
2875
+ "learning_rate": 1.9519650655021836e-05,
2876
+ "loss": 0.0337,
2877
+ "step": 7375
2878
+ },
2879
+ {
2880
+ "epoch": 3.23,
2881
+ "grad_norm": 0.40141957998275757,
2882
+ "learning_rate": 1.9246724890829696e-05,
2883
+ "loss": 0.0245,
2884
+ "step": 7400
2885
+ },
2886
+ {
2887
+ "epoch": 3.23,
2888
+ "eval_loss": 0.29058343172073364,
2889
+ "eval_na_accuracy": 0.799227774143219,
2890
+ "eval_ordinal_accuracy": 0.7044206857681274,
2891
+ "eval_ordinal_mae": 0.3391839265823364,
2892
+ "eval_runtime": 156.6469,
2893
+ "eval_samples_per_second": 25.401,
2894
+ "eval_steps_per_second": 3.179,
2895
+ "step": 7400
2896
+ },
2897
+ {
2898
+ "epoch": 3.24,
2899
+ "grad_norm": 0.04360814392566681,
2900
+ "learning_rate": 1.8973799126637557e-05,
2901
+ "loss": 0.0271,
2902
+ "step": 7425
2903
+ },
2904
+ {
2905
+ "epoch": 3.25,
2906
+ "grad_norm": 0.2782489061355591,
2907
+ "learning_rate": 1.8700873362445414e-05,
2908
+ "loss": 0.0345,
2909
+ "step": 7450
2910
+ },
2911
+ {
2912
+ "epoch": 3.26,
2913
+ "grad_norm": 0.4086083769798279,
2914
+ "learning_rate": 1.8427947598253274e-05,
2915
+ "loss": 0.0519,
2916
+ "step": 7475
2917
+ },
2918
+ {
2919
+ "epoch": 3.28,
2920
+ "grad_norm": 0.37691470980644226,
2921
+ "learning_rate": 1.8155021834061138e-05,
2922
+ "loss": 0.0232,
2923
+ "step": 7500
2924
+ },
2925
+ {
2926
+ "epoch": 3.28,
2927
+ "eval_loss": 0.28911489248275757,
2928
+ "eval_na_accuracy": 0.7857142686843872,
2929
+ "eval_ordinal_accuracy": 0.7035539150238037,
2930
+ "eval_ordinal_mae": 0.3337612450122833,
2931
+ "eval_runtime": 155.5823,
2932
+ "eval_samples_per_second": 25.575,
2933
+ "eval_steps_per_second": 3.201,
2934
+ "step": 7500
2935
+ },
2936
+ {
2937
+ "epoch": 3.29,
2938
+ "grad_norm": 0.220824733376503,
2939
+ "learning_rate": 1.7882096069869e-05,
2940
+ "loss": 0.0274,
2941
+ "step": 7525
2942
+ },
2943
+ {
2944
+ "epoch": 3.3,
2945
+ "grad_norm": 0.38171643018722534,
2946
+ "learning_rate": 1.7609170305676856e-05,
2947
+ "loss": 0.0257,
2948
+ "step": 7550
2949
+ },
2950
+ {
2951
+ "epoch": 3.31,
2952
+ "grad_norm": 0.6748324632644653,
2953
+ "learning_rate": 1.7336244541484716e-05,
2954
+ "loss": 0.0212,
2955
+ "step": 7575
2956
+ },
2957
+ {
2958
+ "epoch": 3.32,
2959
+ "grad_norm": 0.42487770318984985,
2960
+ "learning_rate": 1.7063318777292577e-05,
2961
+ "loss": 0.0352,
2962
+ "step": 7600
2963
+ },
2964
+ {
2965
+ "epoch": 3.32,
2966
+ "eval_loss": 0.2908113896846771,
2967
+ "eval_na_accuracy": 0.7895752787590027,
2968
+ "eval_ordinal_accuracy": 0.6925743818283081,
2969
+ "eval_ordinal_mae": 0.34433993697166443,
2970
+ "eval_runtime": 154.4772,
2971
+ "eval_samples_per_second": 25.758,
2972
+ "eval_steps_per_second": 3.224,
2973
+ "step": 7600
2974
+ },
2975
+ {
2976
+ "epoch": 3.33,
2977
+ "grad_norm": 0.6813647747039795,
2978
+ "learning_rate": 1.6790393013100437e-05,
2979
+ "loss": 0.0405,
2980
+ "step": 7625
2981
+ },
2982
+ {
2983
+ "epoch": 3.34,
2984
+ "grad_norm": 0.46545034646987915,
2985
+ "learning_rate": 1.6517467248908298e-05,
2986
+ "loss": 0.0252,
2987
+ "step": 7650
2988
+ },
2989
+ {
2990
+ "epoch": 3.35,
2991
+ "grad_norm": 0.32561665773391724,
2992
+ "learning_rate": 1.6244541484716158e-05,
2993
+ "loss": 0.0287,
2994
+ "step": 7675
2995
+ },
2996
+ {
2997
+ "epoch": 3.36,
2998
+ "grad_norm": 0.38751569390296936,
2999
+ "learning_rate": 1.597161572052402e-05,
3000
+ "loss": 0.0376,
3001
+ "step": 7700
3002
+ },
3003
+ {
3004
+ "epoch": 3.36,
3005
+ "eval_loss": 0.2876608371734619,
3006
+ "eval_na_accuracy": 0.7915058135986328,
3007
+ "eval_ordinal_accuracy": 0.7049985527992249,
3008
+ "eval_ordinal_mae": 0.331503301858902,
3009
+ "eval_runtime": 155.51,
3010
+ "eval_samples_per_second": 25.587,
3011
+ "eval_steps_per_second": 3.202,
3012
+ "step": 7700
3013
+ },
3014
+ {
3015
+ "epoch": 3.37,
3016
+ "grad_norm": 0.317843496799469,
3017
+ "learning_rate": 1.5698689956331876e-05,
3018
+ "loss": 0.0346,
3019
+ "step": 7725
3020
+ },
3021
+ {
3022
+ "epoch": 3.38,
3023
+ "grad_norm": 0.09346043318510056,
3024
+ "learning_rate": 1.542576419213974e-05,
3025
+ "loss": 0.0187,
3026
+ "step": 7750
3027
+ },
3028
+ {
3029
+ "epoch": 3.4,
3030
+ "grad_norm": 0.508102536201477,
3031
+ "learning_rate": 1.51528384279476e-05,
3032
+ "loss": 0.0208,
3033
+ "step": 7775
3034
+ },
3035
+ {
3036
+ "epoch": 3.41,
3037
+ "grad_norm": 0.3914332687854767,
3038
+ "learning_rate": 1.487991266375546e-05,
3039
+ "loss": 0.025,
3040
+ "step": 7800
3041
+ },
3042
+ {
3043
+ "epoch": 3.41,
3044
+ "eval_loss": 0.2889249920845032,
3045
+ "eval_na_accuracy": 0.7895752787590027,
3046
+ "eval_ordinal_accuracy": 0.7075989842414856,
3047
+ "eval_ordinal_mae": 0.33163872361183167,
3048
+ "eval_runtime": 156.7804,
3049
+ "eval_samples_per_second": 25.379,
3050
+ "eval_steps_per_second": 3.176,
3051
+ "step": 7800
3052
+ },
3053
+ {
3054
+ "epoch": 3.42,
3055
+ "grad_norm": 0.6848337054252625,
3056
+ "learning_rate": 1.460698689956332e-05,
3057
+ "loss": 0.0259,
3058
+ "step": 7825
3059
+ },
3060
+ {
3061
+ "epoch": 3.43,
3062
+ "grad_norm": 0.5732501149177551,
3063
+ "learning_rate": 1.433406113537118e-05,
3064
+ "loss": 0.0222,
3065
+ "step": 7850
3066
+ },
3067
+ {
3068
+ "epoch": 3.44,
3069
+ "grad_norm": 0.638590931892395,
3070
+ "learning_rate": 1.406113537117904e-05,
3071
+ "loss": 0.0237,
3072
+ "step": 7875
3073
+ },
3074
+ {
3075
+ "epoch": 3.45,
3076
+ "grad_norm": 0.7879953980445862,
3077
+ "learning_rate": 1.37882096069869e-05,
3078
+ "loss": 0.0225,
3079
+ "step": 7900
3080
+ },
3081
+ {
3082
+ "epoch": 3.45,
3083
+ "eval_loss": 0.2901510000228882,
3084
+ "eval_na_accuracy": 0.7818532586097717,
3085
+ "eval_ordinal_accuracy": 0.7070211172103882,
3086
+ "eval_ordinal_mae": 0.32855790853500366,
3087
+ "eval_runtime": 157.7082,
3088
+ "eval_samples_per_second": 25.23,
3089
+ "eval_steps_per_second": 3.158,
3090
+ "step": 7900
3091
+ },
3092
+ {
3093
+ "epoch": 3.46,
3094
+ "grad_norm": 0.45826223492622375,
3095
+ "learning_rate": 1.351528384279476e-05,
3096
+ "loss": 0.03,
3097
+ "step": 7925
3098
+ },
3099
+ {
3100
+ "epoch": 3.47,
3101
+ "grad_norm": 0.5093744993209839,
3102
+ "learning_rate": 1.324235807860262e-05,
3103
+ "loss": 0.023,
3104
+ "step": 7950
3105
+ },
3106
+ {
3107
+ "epoch": 3.48,
3108
+ "grad_norm": 0.39197513461112976,
3109
+ "learning_rate": 1.2969432314410482e-05,
3110
+ "loss": 0.0263,
3111
+ "step": 7975
3112
+ },
3113
+ {
3114
+ "epoch": 3.49,
3115
+ "grad_norm": 0.4618348777294159,
3116
+ "learning_rate": 1.269650655021834e-05,
3117
+ "loss": 0.024,
3118
+ "step": 8000
3119
+ },
3120
+ {
3121
+ "epoch": 3.49,
3122
+ "eval_loss": 0.29018646478652954,
3123
+ "eval_na_accuracy": 0.7953668236732483,
3124
+ "eval_ordinal_accuracy": 0.7101993560791016,
3125
+ "eval_ordinal_mae": 0.3269650340080261,
3126
+ "eval_runtime": 155.3244,
3127
+ "eval_samples_per_second": 25.617,
3128
+ "eval_steps_per_second": 3.206,
3129
+ "step": 8000
3130
+ },
3131
+ {
3132
+ "epoch": 3.5,
3133
+ "grad_norm": 0.5933849811553955,
3134
+ "learning_rate": 1.2423580786026202e-05,
3135
+ "loss": 0.0225,
3136
+ "step": 8025
3137
+ },
3138
+ {
3139
+ "epoch": 3.52,
3140
+ "grad_norm": 0.04294763505458832,
3141
+ "learning_rate": 1.2150655021834062e-05,
3142
+ "loss": 0.0228,
3143
+ "step": 8050
3144
+ },
3145
+ {
3146
+ "epoch": 3.53,
3147
+ "grad_norm": 0.5286312699317932,
3148
+ "learning_rate": 1.1877729257641921e-05,
3149
+ "loss": 0.03,
3150
+ "step": 8075
3151
+ },
3152
+ {
3153
+ "epoch": 3.54,
3154
+ "grad_norm": 0.09194879978895187,
3155
+ "learning_rate": 1.1604803493449783e-05,
3156
+ "loss": 0.0404,
3157
+ "step": 8100
3158
+ },
3159
+ {
3160
+ "epoch": 3.54,
3161
+ "eval_loss": 0.29498282074928284,
3162
+ "eval_na_accuracy": 0.7895752787590027,
3163
+ "eval_ordinal_accuracy": 0.705287516117096,
3164
+ "eval_ordinal_mae": 0.32936587929725647,
3165
+ "eval_runtime": 156.3299,
3166
+ "eval_samples_per_second": 25.453,
3167
+ "eval_steps_per_second": 3.186,
3168
+ "step": 8100
3169
+ },
3170
+ {
3171
+ "epoch": 3.55,
3172
+ "grad_norm": 0.3700167238712311,
3173
+ "learning_rate": 1.1331877729257642e-05,
3174
+ "loss": 0.0293,
3175
+ "step": 8125
3176
+ },
3177
+ {
3178
+ "epoch": 3.56,
3179
+ "grad_norm": 0.10644713789224625,
3180
+ "learning_rate": 1.1058951965065504e-05,
3181
+ "loss": 0.0265,
3182
+ "step": 8150
3183
+ },
3184
+ {
3185
+ "epoch": 3.57,
3186
+ "grad_norm": 0.4384317994117737,
3187
+ "learning_rate": 1.0786026200873363e-05,
3188
+ "loss": 0.0274,
3189
+ "step": 8175
3190
+ },
3191
+ {
3192
+ "epoch": 3.58,
3193
+ "grad_norm": 0.4964589774608612,
3194
+ "learning_rate": 1.0513100436681223e-05,
3195
+ "loss": 0.0221,
3196
+ "step": 8200
3197
+ },
3198
+ {
3199
+ "epoch": 3.58,
3200
+ "eval_loss": 0.2923668920993805,
3201
+ "eval_na_accuracy": 0.7934362888336182,
3202
+ "eval_ordinal_accuracy": 0.7093325853347778,
3203
+ "eval_ordinal_mae": 0.3270767033100128,
3204
+ "eval_runtime": 156.0459,
3205
+ "eval_samples_per_second": 25.499,
3206
+ "eval_steps_per_second": 3.191,
3207
+ "step": 8200
3208
+ },
3209
+ {
3210
+ "epoch": 3.59,
3211
+ "grad_norm": 0.20232300460338593,
3212
+ "learning_rate": 1.0240174672489084e-05,
3213
+ "loss": 0.0253,
3214
+ "step": 8225
3215
+ },
3216
+ {
3217
+ "epoch": 3.6,
3218
+ "grad_norm": 0.19642572104930878,
3219
+ "learning_rate": 9.967248908296943e-06,
3220
+ "loss": 0.0309,
3221
+ "step": 8250
3222
+ },
3223
+ {
3224
+ "epoch": 3.61,
3225
+ "grad_norm": 0.12435351312160492,
3226
+ "learning_rate": 9.694323144104805e-06,
3227
+ "loss": 0.0275,
3228
+ "step": 8275
3229
+ },
3230
+ {
3231
+ "epoch": 3.62,
3232
+ "grad_norm": 0.3429054915904999,
3233
+ "learning_rate": 9.421397379912664e-06,
3234
+ "loss": 0.0182,
3235
+ "step": 8300
3236
+ },
3237
+ {
3238
+ "epoch": 3.62,
3239
+ "eval_loss": 0.29207319021224976,
3240
+ "eval_na_accuracy": 0.7934362888336182,
3241
+ "eval_ordinal_accuracy": 0.7104883193969727,
3242
+ "eval_ordinal_mae": 0.32371771335601807,
3243
+ "eval_runtime": 157.1157,
3244
+ "eval_samples_per_second": 25.325,
3245
+ "eval_steps_per_second": 3.17,
3246
+ "step": 8300
3247
+ },
+ {
+ "epoch": 3.64,
+ "grad_norm": 0.11734936386346817,
+ "learning_rate": 9.148471615720524e-06,
+ "loss": 0.0341,
+ "step": 8325
+ },
+ {
+ "epoch": 3.65,
+ "grad_norm": 0.18671230971813202,
+ "learning_rate": 8.875545851528385e-06,
+ "loss": 0.0257,
+ "step": 8350
+ },
+ {
+ "epoch": 3.66,
+ "grad_norm": 0.1754232794046402,
+ "learning_rate": 8.602620087336245e-06,
+ "loss": 0.0306,
+ "step": 8375
+ },
+ {
+ "epoch": 3.67,
+ "grad_norm": 0.2011016607284546,
+ "learning_rate": 8.329694323144106e-06,
+ "loss": 0.0304,
+ "step": 8400
+ },
+ {
+ "epoch": 3.67,
+ "eval_loss": 0.29112711548805237,
+ "eval_na_accuracy": 0.7857142686843872,
+ "eval_ordinal_accuracy": 0.7133776545524597,
+ "eval_ordinal_mae": 0.3231416344642639,
+ "eval_runtime": 155.6543,
+ "eval_samples_per_second": 25.563,
+ "eval_steps_per_second": 3.199,
+ "step": 8400
+ },
+ {
+ "epoch": 3.68,
+ "grad_norm": 1.1286529302597046,
+ "learning_rate": 8.056768558951966e-06,
+ "loss": 0.0285,
+ "step": 8425
+ },
+ {
+ "epoch": 3.69,
+ "grad_norm": 0.4878416359424591,
+ "learning_rate": 7.783842794759825e-06,
+ "loss": 0.0163,
+ "step": 8450
+ },
+ {
+ "epoch": 3.7,
+ "grad_norm": 0.5879720449447632,
+ "learning_rate": 7.510917030567686e-06,
+ "loss": 0.0433,
+ "step": 8475
+ },
+ {
+ "epoch": 3.71,
+ "grad_norm": 0.8792235255241394,
+ "learning_rate": 7.237991266375546e-06,
+ "loss": 0.0193,
+ "step": 8500
+ },
+ {
+ "epoch": 3.71,
+ "eval_loss": 0.2914559841156006,
+ "eval_na_accuracy": 0.7837837934494019,
+ "eval_ordinal_accuracy": 0.7165558934211731,
+ "eval_ordinal_mae": 0.32214629650115967,
+ "eval_runtime": 155.1881,
+ "eval_samples_per_second": 25.64,
+ "eval_steps_per_second": 3.209,
+ "step": 8500
+ },
+ {
+ "epoch": 3.72,
+ "grad_norm": 0.2179727405309677,
+ "learning_rate": 6.9650655021834055e-06,
+ "loss": 0.0237,
+ "step": 8525
+ },
+ {
+ "epoch": 3.73,
+ "grad_norm": 0.7128644585609436,
+ "learning_rate": 6.692139737991267e-06,
+ "loss": 0.0306,
+ "step": 8550
+ },
+ {
+ "epoch": 3.74,
+ "grad_norm": 0.2546403110027313,
+ "learning_rate": 6.4192139737991265e-06,
+ "loss": 0.0215,
+ "step": 8575
+ },
+ {
+ "epoch": 3.76,
+ "grad_norm": 0.39975956082344055,
+ "learning_rate": 6.146288209606987e-06,
+ "loss": 0.0223,
+ "step": 8600
+ },
+ {
+ "epoch": 3.76,
+ "eval_loss": 0.29310527443885803,
+ "eval_na_accuracy": 0.7895752787590027,
+ "eval_ordinal_accuracy": 0.7122219204902649,
+ "eval_ordinal_mae": 0.32349658012390137,
+ "eval_runtime": 154.5645,
+ "eval_samples_per_second": 25.743,
+ "eval_steps_per_second": 3.222,
+ "step": 8600
+ },
+ {
+ "epoch": 3.77,
+ "grad_norm": 0.34678587317466736,
+ "learning_rate": 5.884279475982533e-06,
+ "loss": 0.0203,
+ "step": 8625
+ },
+ {
+ "epoch": 3.78,
+ "grad_norm": 0.46029233932495117,
+ "learning_rate": 5.611353711790393e-06,
+ "loss": 0.0252,
+ "step": 8650
+ },
+ {
+ "epoch": 3.79,
+ "grad_norm": 0.3514462113380432,
+ "learning_rate": 5.338427947598254e-06,
+ "loss": 0.0182,
+ "step": 8675
+ },
+ {
+ "epoch": 3.8,
+ "grad_norm": 0.9231175184249878,
+ "learning_rate": 5.065502183406113e-06,
+ "loss": 0.0254,
+ "step": 8700
+ },
+ {
+ "epoch": 3.8,
+ "eval_loss": 0.2946593761444092,
+ "eval_na_accuracy": 0.7876448035240173,
+ "eval_ordinal_accuracy": 0.7174227237701416,
+ "eval_ordinal_mae": 0.32142174243927,
+ "eval_runtime": 155.717,
+ "eval_samples_per_second": 25.553,
+ "eval_steps_per_second": 3.198,
+ "step": 8700
+ },
+ {
+ "epoch": 3.81,
+ "grad_norm": 0.659959614276886,
+ "learning_rate": 4.792576419213974e-06,
+ "loss": 0.0277,
+ "step": 8725
+ },
+ {
+ "epoch": 3.82,
+ "grad_norm": 1.0919593572616577,
+ "learning_rate": 4.519650655021834e-06,
+ "loss": 0.0306,
+ "step": 8750
+ },
+ {
+ "epoch": 3.83,
+ "grad_norm": 0.9325423836708069,
+ "learning_rate": 4.246724890829695e-06,
+ "loss": 0.0331,
+ "step": 8775
+ },
+ {
+ "epoch": 3.84,
+ "grad_norm": 0.21544720232486725,
+ "learning_rate": 3.9737991266375545e-06,
+ "loss": 0.0215,
+ "step": 8800
+ },
+ {
+ "epoch": 3.84,
+ "eval_loss": 0.293580025434494,
+ "eval_na_accuracy": 0.7857142686843872,
+ "eval_ordinal_accuracy": 0.7127997875213623,
+ "eval_ordinal_mae": 0.3201707899570465,
+ "eval_runtime": 157.8828,
+ "eval_samples_per_second": 25.202,
+ "eval_steps_per_second": 3.154,
+ "step": 8800
+ },
+ {
+ "epoch": 3.85,
+ "grad_norm": 0.43472805619239807,
+ "learning_rate": 3.7008733624454154e-06,
+ "loss": 0.0345,
+ "step": 8825
+ },
+ {
+ "epoch": 3.86,
+ "grad_norm": 0.18415190279483795,
+ "learning_rate": 3.4279475982532755e-06,
+ "loss": 0.0268,
+ "step": 8850
+ },
+ {
+ "epoch": 3.88,
+ "grad_norm": 0.8721256852149963,
+ "learning_rate": 3.155021834061136e-06,
+ "loss": 0.0258,
+ "step": 8875
+ },
+ {
+ "epoch": 3.89,
+ "grad_norm": 0.1487792730331421,
+ "learning_rate": 2.8820960698689956e-06,
+ "loss": 0.0312,
+ "step": 8900
+ },
+ {
+ "epoch": 3.89,
+ "eval_loss": 0.295600026845932,
+ "eval_na_accuracy": 0.7857142686843872,
+ "eval_ordinal_accuracy": 0.7133776545524597,
+ "eval_ordinal_mae": 0.32104432582855225,
+ "eval_runtime": 158.1112,
+ "eval_samples_per_second": 25.166,
+ "eval_steps_per_second": 3.15,
+ "step": 8900
+ },
+ {
+ "epoch": 3.9,
+ "grad_norm": 0.6491901278495789,
+ "learning_rate": 2.609170305676856e-06,
+ "loss": 0.0526,
+ "step": 8925
+ },
+ {
+ "epoch": 3.91,
+ "grad_norm": 0.40534159541130066,
+ "learning_rate": 2.336244541484716e-06,
+ "loss": 0.0227,
+ "step": 8950
+ },
+ {
+ "epoch": 3.92,
+ "grad_norm": 0.06649890542030334,
+ "learning_rate": 2.0633187772925767e-06,
+ "loss": 0.0157,
+ "step": 8975
+ },
+ {
+ "epoch": 3.93,
+ "grad_norm": 0.2696411609649658,
+ "learning_rate": 1.7903930131004367e-06,
+ "loss": 0.0189,
+ "step": 9000
+ },
+ {
+ "epoch": 3.93,
+ "eval_loss": 0.2945779263973236,
+ "eval_na_accuracy": 0.7876448035240173,
+ "eval_ordinal_accuracy": 0.7125108242034912,
+ "eval_ordinal_mae": 0.3210395574569702,
+ "eval_runtime": 157.9016,
+ "eval_samples_per_second": 25.199,
+ "eval_steps_per_second": 3.154,
+ "step": 9000
+ },
+ {
+ "epoch": 3.94,
+ "grad_norm": 0.16493447124958038,
+ "learning_rate": 1.517467248908297e-06,
+ "loss": 0.0204,
+ "step": 9025
+ },
+ {
+ "epoch": 3.95,
+ "grad_norm": 0.5132766366004944,
+ "learning_rate": 1.2445414847161573e-06,
+ "loss": 0.0182,
+ "step": 9050
+ },
+ {
+ "epoch": 3.96,
+ "grad_norm": 0.06899993121623993,
+ "learning_rate": 9.716157205240176e-07,
+ "loss": 0.0194,
+ "step": 9075
+ },
+ {
+ "epoch": 3.97,
+ "grad_norm": 0.09663155674934387,
+ "learning_rate": 6.986899563318777e-07,
+ "loss": 0.021,
+ "step": 9100
+ },
+ {
+ "epoch": 3.97,
+ "eval_loss": 0.2948993146419525,
+ "eval_na_accuracy": 0.7876448035240173,
+ "eval_ordinal_accuracy": 0.7145333886146545,
+ "eval_ordinal_mae": 0.31944769620895386,
+ "eval_runtime": 157.4833,
+ "eval_samples_per_second": 25.266,
+ "eval_steps_per_second": 3.162,
+ "step": 9100
+ },
+ {
+ "epoch": 3.98,
+ "grad_norm": 0.4333021342754364,
+ "learning_rate": 4.2576419213973797e-07,
+ "loss": 0.0262,
+ "step": 9125
+ },
+ {
+ "epoch": 4.0,
+ "grad_norm": 0.39392733573913574,
+ "learning_rate": 1.5283842794759825e-07,
+ "loss": 0.0247,
+ "step": 9150
+ },
+ {
+ "epoch": 4.0,
+ "step": 9160,
+ "total_flos": 1.1353293455817277e+19,
+ "train_loss": 0.11642740544208273,
+ "train_runtime": 27830.4973,
+ "train_samples_per_second": 5.264,
+ "train_steps_per_second": 0.329
+ }
+ ],
+ "logging_steps": 25,
+ "max_steps": 9160,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 4,
+ "save_steps": 100,
+ "total_flos": 1.1353293455817277e+19,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }