Augusto777 committed on
Commit
7a7781c
·
verified ·
1 Parent(s): b1b67f4

End of training

README.md CHANGED
@@ -23,7 +23,7 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.21739130434782608
+ value: 0.3695652173913043
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -33,8 +33,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [microsoft/swinv2-tiny-patch4-window8-256](https://huggingface.co/microsoft/swinv2-tiny-patch4-window8-256) on the imagefolder dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.4532
- - Accuracy: 0.2174
+ - Loss: 1.3992
+ - Accuracy: 0.3696
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 117.33333333333333,
+     "eval_accuracy": 0.3695652173913043,
+     "eval_loss": 1.3992067575454712,
+     "eval_runtime": 1.0044,
+     "eval_samples_per_second": 45.799,
+     "eval_steps_per_second": 2.987,
+     "total_flos": 5.466852859010089e+18,
+     "train_loss": 1.1631801536588957,
+     "train_runtime": 5282.7516,
+     "train_samples_per_second": 32.529,
+     "train_steps_per_second": 0.5
+ }
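The evaluation figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the commit: the helper names are hypothetical, and it assumes the evaluation set has the same size for the previous README value as for this one. It estimates the number of evaluation samples from runtime and throughput, then converts each accuracy into a count of correct predictions.

```python
# Values copied from all_results.json and the README diff in this commit.
eval_runtime = 1.0044
eval_samples_per_second = 45.799
eval_accuracy = 0.3695652173913043
previous_accuracy = 0.21739130434782608  # README value removed by this commit


def estimate_sample_count(runtime_s, samples_per_s):
    """Hypothetical helper: throughput times runtime, rounded to a whole sample count."""
    return round(runtime_s * samples_per_s)


def correct_predictions(accuracy, n_samples):
    """Hypothetical helper: number of correct predictions implied by an accuracy."""
    return round(accuracy * n_samples)


n_eval = estimate_sample_count(eval_runtime, eval_samples_per_second)
n_correct = correct_predictions(eval_accuracy, n_eval)
# ASSUMPTION: the earlier accuracy was measured on the same number of samples.
n_correct_before = correct_predictions(previous_accuracy, n_eval)

print(f"eval samples: {n_eval}")
print(f"correct now: {n_correct}, correct before: {n_correct_before}")
```

With an evaluation set this small, each additional correct prediction moves accuracy by a bit over two percentage points, so epoch-to-epoch swings in `eval_accuracy` should be read with that granularity in mind.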
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 117.33333333333333,
+     "eval_accuracy": 0.3695652173913043,
+     "eval_loss": 1.3992067575454712,
+     "eval_runtime": 1.0044,
+     "eval_samples_per_second": 45.799,
+     "eval_steps_per_second": 2.987
+ }
runs/Dec05_23-48-52_8fb6626b3f6f/events.out.tfevents.1733447829.8fb6626b3f6f.752.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2922ef2a37c1f5d10aee2bbc62b7c71e977da9cb38dfdb9eb2bb41557343d2ea
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 117.33333333333333,
+     "total_flos": 5.466852859010089e+18,
+     "train_loss": 1.1631801536588957,
+     "train_runtime": 5282.7516,
+     "train_samples_per_second": 32.529,
+     "train_steps_per_second": 0.5
+ }
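The training summary can be reconciled with the step counter and learning rates logged in trainer_state.json. The sketch below is illustrative only: the function names are hypothetical, and the base learning rate of 1e-3 with a linear decay to zero and no warmup is an assumption inferred from the logged values, not something the commit states.

```python
import math

# Values copied from train_results.json and trainer_state.json in this commit.
global_step = 2640
final_epoch = 117.33333333333333
train_runtime = 5282.7516
train_steps_per_second = 0.5

# Two log_history entries as (step, epoch, learning_rate).
first_log = (10, 0.4444444444444444, 0.0009962121212121213)
later_log = (880, 39.111111111111114, 0.0006666666666666666)

# ASSUMPTION: base learning rate, inferred from the log rather than stated.
BASE_LR = 1e-3


def steps_per_epoch(step, epoch):
    """Hypothetical helper: optimizer steps per (fractional) epoch at a log entry."""
    return step / epoch


def linear_decay_lr(step, base_lr=BASE_LR, total_steps=global_step):
    """ASSUMPTION: linear decay from base_lr to zero over total_steps, no warmup."""
    return base_lr * (total_steps - step) / total_steps


rate = steps_per_epoch(first_log[0], first_log[1])
print(f"steps per epoch: {rate:.2f}")
print(f"implied final epoch: {global_step / rate:.4f}")
print(f"runtime x steps/s: {train_runtime * train_steps_per_second:.1f}")
print(f"lr at step 10: {linear_decay_lr(10):.10f}")
print(f"lr at step 880: {linear_decay_lr(880):.10f}")
```

The product of runtime and steps per second only approximates `global_step` because `train_steps_per_second` is rounded in the JSON.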
trainer_state.json ADDED
@@ -0,0 +1,2952 @@
+ {
+ "best_metric": 0.3695652173913043,
+ "best_model_checkpoint": "swinv2-tiny-patch4-window8-256-DMAE-da3-colab/checkpoint-1485",
+ "epoch": 117.33333333333333,
+ "eval_steps": 500,
+ "global_step": 2640,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
11
+ {
12
+ "epoch": 0.4444444444444444,
13
+ "grad_norm": 3.825187921524048,
14
+ "learning_rate": 0.0009962121212121213,
15
+ "loss": 1.5683,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.8888888888888888,
20
+ "grad_norm": 5.65834903717041,
21
+ "learning_rate": 0.0009924242424242424,
22
+ "loss": 1.3523,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.9777777777777777,
27
+ "eval_accuracy": 0.32608695652173914,
28
+ "eval_loss": 1.4024076461791992,
29
+ "eval_runtime": 1.0691,
30
+ "eval_samples_per_second": 43.028,
31
+ "eval_steps_per_second": 2.806,
32
+ "step": 22
33
+ },
34
+ {
35
+ "epoch": 1.3333333333333333,
36
+ "grad_norm": 10.152283668518066,
37
+ "learning_rate": 0.0009886363636363636,
38
+ "loss": 1.2778,
39
+ "step": 30
40
+ },
41
+ {
42
+ "epoch": 1.7777777777777777,
43
+ "grad_norm": 3.099470376968384,
44
+ "learning_rate": 0.000984848484848485,
45
+ "loss": 1.3805,
46
+ "step": 40
47
+ },
48
+ {
49
+ "epoch": 2.0,
50
+ "eval_accuracy": 0.2608695652173913,
51
+ "eval_loss": 1.3775243759155273,
52
+ "eval_runtime": 0.9313,
53
+ "eval_samples_per_second": 49.394,
54
+ "eval_steps_per_second": 3.221,
55
+ "step": 45
56
+ },
57
+ {
58
+ "epoch": 2.2222222222222223,
59
+ "grad_norm": 4.7406511306762695,
60
+ "learning_rate": 0.0009810606060606062,
61
+ "loss": 1.3708,
62
+ "step": 50
63
+ },
64
+ {
65
+ "epoch": 2.6666666666666665,
66
+ "grad_norm": 3.888740301132202,
67
+ "learning_rate": 0.0009772727272727272,
68
+ "loss": 1.3221,
69
+ "step": 60
70
+ },
71
+ {
72
+ "epoch": 2.977777777777778,
73
+ "eval_accuracy": 0.30434782608695654,
74
+ "eval_loss": 1.4418785572052002,
75
+ "eval_runtime": 0.8857,
76
+ "eval_samples_per_second": 51.938,
77
+ "eval_steps_per_second": 3.387,
78
+ "step": 67
79
+ },
80
+ {
81
+ "epoch": 3.111111111111111,
82
+ "grad_norm": 2.308217763900757,
83
+ "learning_rate": 0.0009734848484848485,
84
+ "loss": 1.2355,
85
+ "step": 70
86
+ },
87
+ {
88
+ "epoch": 3.5555555555555554,
89
+ "grad_norm": 5.7985053062438965,
90
+ "learning_rate": 0.0009696969696969698,
91
+ "loss": 1.2409,
92
+ "step": 80
93
+ },
94
+ {
95
+ "epoch": 4.0,
96
+ "grad_norm": 110.87601470947266,
97
+ "learning_rate": 0.000965909090909091,
98
+ "loss": 1.297,
99
+ "step": 90
100
+ },
101
+ {
102
+ "epoch": 4.0,
103
+ "eval_accuracy": 0.32608695652173914,
104
+ "eval_loss": 1.3581539392471313,
105
+ "eval_runtime": 0.8843,
106
+ "eval_samples_per_second": 52.019,
107
+ "eval_steps_per_second": 3.393,
108
+ "step": 90
109
+ },
110
+ {
111
+ "epoch": 4.444444444444445,
112
+ "grad_norm": 4.244718551635742,
113
+ "learning_rate": 0.0009621212121212122,
114
+ "loss": 1.3056,
115
+ "step": 100
116
+ },
117
+ {
118
+ "epoch": 4.888888888888889,
119
+ "grad_norm": 4.412840843200684,
120
+ "learning_rate": 0.0009583333333333334,
121
+ "loss": 1.353,
122
+ "step": 110
123
+ },
124
+ {
125
+ "epoch": 4.977777777777778,
126
+ "eval_accuracy": 0.34782608695652173,
127
+ "eval_loss": 1.3405784368515015,
128
+ "eval_runtime": 0.8733,
129
+ "eval_samples_per_second": 52.672,
130
+ "eval_steps_per_second": 3.435,
131
+ "step": 112
132
+ },
133
+ {
134
+ "epoch": 5.333333333333333,
135
+ "grad_norm": 2.0132997035980225,
136
+ "learning_rate": 0.0009545454545454546,
137
+ "loss": 1.3115,
138
+ "step": 120
139
+ },
140
+ {
141
+ "epoch": 5.777777777777778,
142
+ "grad_norm": 3.7579846382141113,
143
+ "learning_rate": 0.0009507575757575758,
144
+ "loss": 1.2627,
145
+ "step": 130
146
+ },
147
+ {
148
+ "epoch": 6.0,
149
+ "eval_accuracy": 0.15217391304347827,
150
+ "eval_loss": 1.3823662996292114,
151
+ "eval_runtime": 0.8973,
152
+ "eval_samples_per_second": 51.264,
153
+ "eval_steps_per_second": 3.343,
154
+ "step": 135
155
+ },
156
+ {
157
+ "epoch": 6.222222222222222,
158
+ "grad_norm": 3.731177806854248,
159
+ "learning_rate": 0.000946969696969697,
160
+ "loss": 1.2545,
161
+ "step": 140
162
+ },
163
+ {
164
+ "epoch": 6.666666666666667,
165
+ "grad_norm": 3.8037898540496826,
166
+ "learning_rate": 0.0009431818181818183,
167
+ "loss": 1.3006,
168
+ "step": 150
169
+ },
170
+ {
171
+ "epoch": 6.977777777777778,
172
+ "eval_accuracy": 0.15217391304347827,
173
+ "eval_loss": 1.4008022546768188,
174
+ "eval_runtime": 1.1693,
175
+ "eval_samples_per_second": 39.34,
176
+ "eval_steps_per_second": 2.566,
177
+ "step": 157
178
+ },
179
+ {
180
+ "epoch": 7.111111111111111,
181
+ "grad_norm": 2.627021074295044,
182
+ "learning_rate": 0.0009393939393939394,
183
+ "loss": 1.2558,
184
+ "step": 160
185
+ },
186
+ {
187
+ "epoch": 7.555555555555555,
188
+ "grad_norm": 2.487823247909546,
189
+ "learning_rate": 0.0009356060606060606,
190
+ "loss": 1.216,
191
+ "step": 170
192
+ },
193
+ {
194
+ "epoch": 8.0,
195
+ "grad_norm": 2.525559902191162,
196
+ "learning_rate": 0.0009318181818181818,
197
+ "loss": 1.2438,
198
+ "step": 180
199
+ },
200
+ {
201
+ "epoch": 8.0,
202
+ "eval_accuracy": 0.32608695652173914,
203
+ "eval_loss": 1.3769112825393677,
204
+ "eval_runtime": 0.9024,
205
+ "eval_samples_per_second": 50.973,
206
+ "eval_steps_per_second": 3.324,
207
+ "step": 180
208
+ },
209
+ {
210
+ "epoch": 8.444444444444445,
211
+ "grad_norm": 9.093722343444824,
212
+ "learning_rate": 0.000928030303030303,
213
+ "loss": 1.2023,
214
+ "step": 190
215
+ },
216
+ {
217
+ "epoch": 8.88888888888889,
218
+ "grad_norm": 7.364065170288086,
219
+ "learning_rate": 0.0009242424242424242,
220
+ "loss": 1.222,
221
+ "step": 200
222
+ },
223
+ {
224
+ "epoch": 8.977777777777778,
225
+ "eval_accuracy": 0.30434782608695654,
226
+ "eval_loss": 1.421162486076355,
227
+ "eval_runtime": 0.8938,
228
+ "eval_samples_per_second": 51.468,
229
+ "eval_steps_per_second": 3.357,
230
+ "step": 202
231
+ },
232
+ {
233
+ "epoch": 9.333333333333334,
234
+ "grad_norm": 3.667402744293213,
235
+ "learning_rate": 0.0009204545454545455,
236
+ "loss": 1.2186,
237
+ "step": 210
238
+ },
239
+ {
240
+ "epoch": 9.777777777777779,
241
+ "grad_norm": 10.466683387756348,
242
+ "learning_rate": 0.0009166666666666666,
243
+ "loss": 1.2221,
244
+ "step": 220
245
+ },
246
+ {
247
+ "epoch": 10.0,
248
+ "eval_accuracy": 0.2391304347826087,
249
+ "eval_loss": 1.4223273992538452,
250
+ "eval_runtime": 0.8601,
251
+ "eval_samples_per_second": 53.48,
252
+ "eval_steps_per_second": 3.488,
253
+ "step": 225
254
+ },
255
+ {
256
+ "epoch": 10.222222222222221,
257
+ "grad_norm": 13.964035987854004,
258
+ "learning_rate": 0.0009128787878787878,
259
+ "loss": 1.2394,
260
+ "step": 230
261
+ },
262
+ {
263
+ "epoch": 10.666666666666666,
264
+ "grad_norm": 4.716994762420654,
265
+ "learning_rate": 0.0009090909090909091,
266
+ "loss": 1.2262,
267
+ "step": 240
268
+ },
269
+ {
270
+ "epoch": 10.977777777777778,
271
+ "eval_accuracy": 0.2608695652173913,
272
+ "eval_loss": 1.4154268503189087,
273
+ "eval_runtime": 1.2346,
274
+ "eval_samples_per_second": 37.258,
275
+ "eval_steps_per_second": 2.43,
276
+ "step": 247
277
+ },
278
+ {
279
+ "epoch": 11.11111111111111,
280
+ "grad_norm": 11.025979995727539,
281
+ "learning_rate": 0.0009053030303030303,
282
+ "loss": 1.2196,
283
+ "step": 250
284
+ },
285
+ {
286
+ "epoch": 11.555555555555555,
287
+ "grad_norm": 22.928346633911133,
288
+ "learning_rate": 0.0009015151515151515,
289
+ "loss": 1.2131,
290
+ "step": 260
291
+ },
292
+ {
293
+ "epoch": 12.0,
294
+ "grad_norm": 7.0689263343811035,
295
+ "learning_rate": 0.0008977272727272727,
296
+ "loss": 1.2381,
297
+ "step": 270
298
+ },
299
+ {
300
+ "epoch": 12.0,
301
+ "eval_accuracy": 0.2391304347826087,
302
+ "eval_loss": 1.3327127695083618,
303
+ "eval_runtime": 0.884,
304
+ "eval_samples_per_second": 52.039,
305
+ "eval_steps_per_second": 3.394,
306
+ "step": 270
307
+ },
308
+ {
309
+ "epoch": 12.444444444444445,
310
+ "grad_norm": 3.511734962463379,
311
+ "learning_rate": 0.000893939393939394,
312
+ "loss": 1.1634,
313
+ "step": 280
314
+ },
315
+ {
316
+ "epoch": 12.88888888888889,
317
+ "grad_norm": 33.742347717285156,
318
+ "learning_rate": 0.0008901515151515151,
319
+ "loss": 1.227,
320
+ "step": 290
321
+ },
322
+ {
323
+ "epoch": 12.977777777777778,
324
+ "eval_accuracy": 0.2826086956521739,
325
+ "eval_loss": 1.288680076599121,
326
+ "eval_runtime": 0.8658,
327
+ "eval_samples_per_second": 53.132,
328
+ "eval_steps_per_second": 3.465,
329
+ "step": 292
330
+ },
331
+ {
332
+ "epoch": 13.333333333333334,
333
+ "grad_norm": 6.175626277923584,
334
+ "learning_rate": 0.0008863636363636364,
335
+ "loss": 1.2082,
336
+ "step": 300
337
+ },
338
+ {
339
+ "epoch": 13.777777777777779,
340
+ "grad_norm": 11.057406425476074,
341
+ "learning_rate": 0.0008825757575757576,
342
+ "loss": 1.2158,
343
+ "step": 310
344
+ },
345
+ {
346
+ "epoch": 14.0,
347
+ "eval_accuracy": 0.2608695652173913,
348
+ "eval_loss": 1.3465280532836914,
349
+ "eval_runtime": 0.872,
350
+ "eval_samples_per_second": 52.749,
351
+ "eval_steps_per_second": 3.44,
352
+ "step": 315
353
+ },
354
+ {
355
+ "epoch": 14.222222222222221,
356
+ "grad_norm": 37.48932647705078,
357
+ "learning_rate": 0.0008787878787878789,
358
+ "loss": 1.2026,
359
+ "step": 320
360
+ },
361
+ {
362
+ "epoch": 14.666666666666666,
363
+ "grad_norm": 4.4914960861206055,
364
+ "learning_rate": 0.000875,
365
+ "loss": 1.2174,
366
+ "step": 330
367
+ },
368
+ {
369
+ "epoch": 14.977777777777778,
370
+ "eval_accuracy": 0.30434782608695654,
371
+ "eval_loss": 1.34762704372406,
372
+ "eval_runtime": 1.1749,
373
+ "eval_samples_per_second": 39.152,
374
+ "eval_steps_per_second": 2.553,
375
+ "step": 337
376
+ },
377
+ {
378
+ "epoch": 15.11111111111111,
379
+ "grad_norm": 3.3884503841400146,
380
+ "learning_rate": 0.0008712121212121212,
381
+ "loss": 1.1947,
382
+ "step": 340
383
+ },
384
+ {
385
+ "epoch": 15.555555555555555,
386
+ "grad_norm": 3.722648859024048,
387
+ "learning_rate": 0.0008674242424242425,
388
+ "loss": 1.2054,
389
+ "step": 350
390
+ },
391
+ {
392
+ "epoch": 16.0,
393
+ "grad_norm": 15.185981750488281,
394
+ "learning_rate": 0.0008636363636363636,
395
+ "loss": 1.1767,
396
+ "step": 360
397
+ },
398
+ {
399
+ "epoch": 16.0,
400
+ "eval_accuracy": 0.1956521739130435,
401
+ "eval_loss": 1.4023534059524536,
402
+ "eval_runtime": 0.8914,
403
+ "eval_samples_per_second": 51.601,
404
+ "eval_steps_per_second": 3.365,
405
+ "step": 360
406
+ },
407
+ {
408
+ "epoch": 16.444444444444443,
409
+ "grad_norm": 4.847630500793457,
410
+ "learning_rate": 0.0008598484848484849,
411
+ "loss": 1.1721,
412
+ "step": 370
413
+ },
414
+ {
415
+ "epoch": 16.88888888888889,
416
+ "grad_norm": 3.731781482696533,
417
+ "learning_rate": 0.0008560606060606061,
418
+ "loss": 1.2067,
419
+ "step": 380
420
+ },
421
+ {
422
+ "epoch": 16.977777777777778,
423
+ "eval_accuracy": 0.17391304347826086,
424
+ "eval_loss": 1.3664109706878662,
425
+ "eval_runtime": 0.8715,
426
+ "eval_samples_per_second": 52.783,
427
+ "eval_steps_per_second": 3.442,
428
+ "step": 382
429
+ },
430
+ {
431
+ "epoch": 17.333333333333332,
432
+ "grad_norm": 3.2619216442108154,
433
+ "learning_rate": 0.0008522727272727273,
434
+ "loss": 1.1866,
435
+ "step": 390
436
+ },
437
+ {
438
+ "epoch": 17.77777777777778,
439
+ "grad_norm": 11.475706100463867,
440
+ "learning_rate": 0.0008484848484848485,
441
+ "loss": 1.2303,
442
+ "step": 400
443
+ },
444
+ {
445
+ "epoch": 18.0,
446
+ "eval_accuracy": 0.2826086956521739,
447
+ "eval_loss": 1.4259557723999023,
448
+ "eval_runtime": 0.8714,
449
+ "eval_samples_per_second": 52.786,
450
+ "eval_steps_per_second": 3.443,
451
+ "step": 405
452
+ },
453
+ {
454
+ "epoch": 18.22222222222222,
455
+ "grad_norm": 18.98031997680664,
456
+ "learning_rate": 0.0008446969696969698,
457
+ "loss": 1.1939,
458
+ "step": 410
459
+ },
460
+ {
461
+ "epoch": 18.666666666666668,
462
+ "grad_norm": 2.759476900100708,
463
+ "learning_rate": 0.000840909090909091,
464
+ "loss": 1.222,
465
+ "step": 420
466
+ },
467
+ {
468
+ "epoch": 18.977777777777778,
469
+ "eval_accuracy": 0.17391304347826086,
470
+ "eval_loss": 1.480705976486206,
471
+ "eval_runtime": 1.1937,
472
+ "eval_samples_per_second": 38.535,
473
+ "eval_steps_per_second": 2.513,
474
+ "step": 427
475
+ },
476
+ {
477
+ "epoch": 19.11111111111111,
478
+ "grad_norm": 4.408325672149658,
479
+ "learning_rate": 0.0008371212121212122,
480
+ "loss": 1.1725,
481
+ "step": 430
482
+ },
483
+ {
484
+ "epoch": 19.555555555555557,
485
+ "grad_norm": 2.795828104019165,
486
+ "learning_rate": 0.0008333333333333334,
487
+ "loss": 1.1577,
488
+ "step": 440
489
+ },
490
+ {
491
+ "epoch": 20.0,
492
+ "grad_norm": 8.652217864990234,
493
+ "learning_rate": 0.0008295454545454546,
494
+ "loss": 1.2026,
495
+ "step": 450
496
+ },
497
+ {
498
+ "epoch": 20.0,
499
+ "eval_accuracy": 0.17391304347826086,
500
+ "eval_loss": 1.3851475715637207,
501
+ "eval_runtime": 0.8768,
502
+ "eval_samples_per_second": 52.465,
503
+ "eval_steps_per_second": 3.422,
504
+ "step": 450
505
+ },
506
+ {
507
+ "epoch": 20.444444444444443,
508
+ "grad_norm": 4.27825927734375,
509
+ "learning_rate": 0.0008257575757575758,
510
+ "loss": 1.1934,
511
+ "step": 460
512
+ },
513
+ {
514
+ "epoch": 20.88888888888889,
515
+ "grad_norm": 3.146332263946533,
516
+ "learning_rate": 0.000821969696969697,
517
+ "loss": 1.2185,
518
+ "step": 470
519
+ },
520
+ {
521
+ "epoch": 20.977777777777778,
522
+ "eval_accuracy": 0.2608695652173913,
523
+ "eval_loss": 1.3213552236557007,
524
+ "eval_runtime": 0.889,
525
+ "eval_samples_per_second": 51.741,
526
+ "eval_steps_per_second": 3.374,
527
+ "step": 472
528
+ },
529
+ {
530
+ "epoch": 21.333333333333332,
531
+ "grad_norm": 6.067198753356934,
532
+ "learning_rate": 0.0008181818181818183,
533
+ "loss": 1.2416,
534
+ "step": 480
535
+ },
536
+ {
537
+ "epoch": 21.77777777777778,
538
+ "grad_norm": 1.9041274785995483,
539
+ "learning_rate": 0.0008143939393939394,
540
+ "loss": 1.2773,
541
+ "step": 490
542
+ },
543
+ {
544
+ "epoch": 22.0,
545
+ "eval_accuracy": 0.1956521739130435,
546
+ "eval_loss": 1.4403716325759888,
547
+ "eval_runtime": 0.9305,
548
+ "eval_samples_per_second": 49.436,
549
+ "eval_steps_per_second": 3.224,
550
+ "step": 495
551
+ },
552
+ {
553
+ "epoch": 22.22222222222222,
554
+ "grad_norm": 3.5751218795776367,
555
+ "learning_rate": 0.0008106060606060606,
556
+ "loss": 1.2544,
557
+ "step": 500
558
+ },
559
+ {
560
+ "epoch": 22.666666666666668,
561
+ "grad_norm": 6.662022590637207,
562
+ "learning_rate": 0.0008068181818181818,
563
+ "loss": 1.227,
564
+ "step": 510
565
+ },
566
+ {
567
+ "epoch": 22.977777777777778,
568
+ "eval_accuracy": 0.2391304347826087,
569
+ "eval_loss": 1.453503966331482,
570
+ "eval_runtime": 1.2139,
571
+ "eval_samples_per_second": 37.894,
572
+ "eval_steps_per_second": 2.471,
573
+ "step": 517
574
+ },
575
+ {
576
+ "epoch": 23.11111111111111,
577
+ "grad_norm": 2.344249725341797,
578
+ "learning_rate": 0.000803030303030303,
579
+ "loss": 1.1912,
580
+ "step": 520
581
+ },
582
+ {
583
+ "epoch": 23.555555555555557,
584
+ "grad_norm": 5.093134880065918,
585
+ "learning_rate": 0.0007992424242424242,
586
+ "loss": 1.211,
587
+ "step": 530
588
+ },
589
+ {
590
+ "epoch": 24.0,
591
+ "grad_norm": 3.9864282608032227,
592
+ "learning_rate": 0.0007954545454545455,
593
+ "loss": 1.2032,
594
+ "step": 540
595
+ },
596
+ {
597
+ "epoch": 24.0,
598
+ "eval_accuracy": 0.30434782608695654,
599
+ "eval_loss": 1.3966683149337769,
600
+ "eval_runtime": 0.8732,
601
+ "eval_samples_per_second": 52.682,
602
+ "eval_steps_per_second": 3.436,
603
+ "step": 540
604
+ },
605
+ {
606
+ "epoch": 24.444444444444443,
607
+ "grad_norm": 5.3367438316345215,
608
+ "learning_rate": 0.0007916666666666666,
609
+ "loss": 1.2051,
610
+ "step": 550
611
+ },
612
+ {
613
+ "epoch": 24.88888888888889,
614
+ "grad_norm": 3.129544973373413,
615
+ "learning_rate": 0.0007878787878787878,
616
+ "loss": 1.2223,
617
+ "step": 560
618
+ },
619
+ {
620
+ "epoch": 24.977777777777778,
621
+ "eval_accuracy": 0.32608695652173914,
622
+ "eval_loss": 1.408994436264038,
623
+ "eval_runtime": 0.8542,
624
+ "eval_samples_per_second": 53.849,
625
+ "eval_steps_per_second": 3.512,
626
+ "step": 562
627
+ },
628
+ {
629
+ "epoch": 25.333333333333332,
630
+ "grad_norm": 2.691143751144409,
631
+ "learning_rate": 0.0007840909090909091,
632
+ "loss": 1.2041,
633
+ "step": 570
634
+ },
635
+ {
636
+ "epoch": 25.77777777777778,
637
+ "grad_norm": 9.64910888671875,
638
+ "learning_rate": 0.0007803030303030303,
639
+ "loss": 1.2527,
640
+ "step": 580
641
+ },
642
+ {
643
+ "epoch": 26.0,
644
+ "eval_accuracy": 0.2608695652173913,
645
+ "eval_loss": 1.4858431816101074,
646
+ "eval_runtime": 0.9084,
647
+ "eval_samples_per_second": 50.639,
648
+ "eval_steps_per_second": 3.303,
649
+ "step": 585
650
+ },
651
+ {
652
+ "epoch": 26.22222222222222,
653
+ "grad_norm": 2.8862998485565186,
654
+ "learning_rate": 0.0007765151515151515,
655
+ "loss": 1.1968,
656
+ "step": 590
657
+ },
658
+ {
659
+ "epoch": 26.666666666666668,
660
+ "grad_norm": 2.620485544204712,
661
+ "learning_rate": 0.0007727272727272727,
662
+ "loss": 1.2203,
663
+ "step": 600
664
+ },
665
+ {
666
+ "epoch": 26.977777777777778,
667
+ "eval_accuracy": 0.17391304347826086,
668
+ "eval_loss": 1.4366178512573242,
669
+ "eval_runtime": 0.888,
670
+ "eval_samples_per_second": 51.8,
671
+ "eval_steps_per_second": 3.378,
672
+ "step": 607
673
+ },
674
+ {
675
+ "epoch": 27.11111111111111,
676
+ "grad_norm": 1.672736406326294,
677
+ "learning_rate": 0.000768939393939394,
678
+ "loss": 1.182,
679
+ "step": 610
680
+ },
681
+ {
682
+ "epoch": 27.555555555555557,
683
+ "grad_norm": 4.047964096069336,
684
+ "learning_rate": 0.0007651515151515151,
685
+ "loss": 1.1824,
686
+ "step": 620
687
+ },
688
+ {
689
+ "epoch": 28.0,
690
+ "grad_norm": 7.370471477508545,
691
+ "learning_rate": 0.0007613636363636364,
692
+ "loss": 1.1993,
693
+ "step": 630
694
+ },
695
+ {
696
+ "epoch": 28.0,
697
+ "eval_accuracy": 0.2608695652173913,
698
+ "eval_loss": 1.4055887460708618,
699
+ "eval_runtime": 0.8703,
700
+ "eval_samples_per_second": 52.856,
701
+ "eval_steps_per_second": 3.447,
702
+ "step": 630
703
+ },
704
+ {
705
+ "epoch": 28.444444444444443,
706
+ "grad_norm": 2.3139352798461914,
707
+ "learning_rate": 0.0007575757575757576,
708
+ "loss": 1.1908,
709
+ "step": 640
710
+ },
711
+ {
712
+ "epoch": 28.88888888888889,
713
+ "grad_norm": 2.4274730682373047,
714
+ "learning_rate": 0.0007537878787878788,
715
+ "loss": 1.2014,
716
+ "step": 650
717
+ },
718
+ {
719
+ "epoch": 28.977777777777778,
720
+ "eval_accuracy": 0.30434782608695654,
721
+ "eval_loss": 1.3755403757095337,
722
+ "eval_runtime": 0.8706,
723
+ "eval_samples_per_second": 52.837,
724
+ "eval_steps_per_second": 3.446,
725
+ "step": 652
726
+ },
727
+ {
728
+ "epoch": 29.333333333333332,
729
+ "grad_norm": 2.012920618057251,
730
+ "learning_rate": 0.00075,
731
+ "loss": 1.2134,
732
+ "step": 660
733
+ },
734
+ {
735
+ "epoch": 29.77777777777778,
736
+ "grad_norm": 3.179375171661377,
737
+ "learning_rate": 0.0007462121212121212,
738
+ "loss": 1.2027,
739
+ "step": 670
740
+ },
741
+ {
742
+ "epoch": 30.0,
743
+ "eval_accuracy": 0.2608695652173913,
744
+ "eval_loss": 1.457945466041565,
745
+ "eval_runtime": 1.1362,
746
+ "eval_samples_per_second": 40.485,
747
+ "eval_steps_per_second": 2.64,
748
+ "step": 675
749
+ },
750
+ {
751
+ "epoch": 30.22222222222222,
752
+ "grad_norm": 3.6860008239746094,
753
+ "learning_rate": 0.0007424242424242425,
754
+ "loss": 1.2086,
755
+ "step": 680
756
+ },
757
+ {
758
+ "epoch": 30.666666666666668,
759
+ "grad_norm": 1.5310014486312866,
760
+ "learning_rate": 0.0007386363636363636,
761
+ "loss": 1.1961,
762
+ "step": 690
763
+ },
764
+ {
765
+ "epoch": 30.977777777777778,
766
+ "eval_accuracy": 0.2608695652173913,
767
+ "eval_loss": 1.4524133205413818,
768
+ "eval_runtime": 0.9718,
769
+ "eval_samples_per_second": 47.335,
770
+ "eval_steps_per_second": 3.087,
771
+ "step": 697
772
+ },
773
+ {
774
+ "epoch": 31.11111111111111,
775
+ "grad_norm": 24.489850997924805,
776
+ "learning_rate": 0.0007348484848484849,
777
+ "loss": 1.2059,
778
+ "step": 700
779
+ },
780
+ {
781
+ "epoch": 31.555555555555557,
782
+ "grad_norm": 2.539658308029175,
783
+ "learning_rate": 0.0007310606060606061,
784
+ "loss": 1.1874,
785
+ "step": 710
786
+ },
787
+ {
788
+ "epoch": 32.0,
789
+ "grad_norm": 2.7552974224090576,
790
+ "learning_rate": 0.0007272727272727273,
791
+ "loss": 1.1939,
792
+ "step": 720
793
+ },
794
+ {
795
+ "epoch": 32.0,
796
+ "eval_accuracy": 0.2391304347826087,
797
+ "eval_loss": 1.448819875717163,
798
+ "eval_runtime": 0.8862,
799
+ "eval_samples_per_second": 51.905,
800
+ "eval_steps_per_second": 3.385,
801
+ "step": 720
802
+ },
803
+ {
804
+ "epoch": 32.44444444444444,
805
+ "grad_norm": 3.2986652851104736,
806
+ "learning_rate": 0.0007234848484848485,
807
+ "loss": 1.2196,
808
+ "step": 730
809
+ },
810
+ {
811
+ "epoch": 32.888888888888886,
812
+ "grad_norm": 2.5736711025238037,
813
+ "learning_rate": 0.0007196969696969698,
814
+ "loss": 1.1889,
815
+ "step": 740
816
+ },
817
+ {
818
+ "epoch": 32.977777777777774,
819
+ "eval_accuracy": 0.15217391304347827,
820
+ "eval_loss": 1.456831693649292,
821
+ "eval_runtime": 0.8853,
822
+ "eval_samples_per_second": 51.958,
823
+ "eval_steps_per_second": 3.389,
824
+ "step": 742
825
+ },
826
+ {
827
+ "epoch": 33.333333333333336,
828
+ "grad_norm": 5.722994327545166,
829
+ "learning_rate": 0.0007159090909090909,
830
+ "loss": 1.178,
831
+ "step": 750
832
+ },
833
+ {
834
+ "epoch": 33.77777777777778,
835
+ "grad_norm": 1.9777653217315674,
836
+ "learning_rate": 0.0007121212121212122,
837
+ "loss": 1.1871,
838
+ "step": 760
839
+ },
840
+ {
841
+ "epoch": 34.0,
842
+ "eval_accuracy": 0.32608695652173914,
843
+ "eval_loss": 1.3814184665679932,
844
+ "eval_runtime": 0.8797,
845
+ "eval_samples_per_second": 52.292,
846
+ "eval_steps_per_second": 3.41,
847
+ "step": 765
848
+ },
849
+ {
850
+ "epoch": 34.22222222222222,
851
+ "grad_norm": 2.5516679286956787,
852
+ "learning_rate": 0.0007083333333333334,
853
+ "loss": 1.2329,
854
+ "step": 770
855
+ },
856
+ {
857
+ "epoch": 34.666666666666664,
858
+ "grad_norm": 3.3046047687530518,
859
+ "learning_rate": 0.0007045454545454546,
860
+ "loss": 1.1778,
861
+ "step": 780
862
+ },
863
+ {
864
+ "epoch": 34.977777777777774,
865
+ "eval_accuracy": 0.13043478260869565,
866
+ "eval_loss": 1.44027578830719,
867
+ "eval_runtime": 1.196,
868
+ "eval_samples_per_second": 38.461,
869
+ "eval_steps_per_second": 2.508,
870
+ "step": 787
871
+ },
872
+ {
873
+ "epoch": 35.111111111111114,
874
+ "grad_norm": 1.8858623504638672,
875
+ "learning_rate": 0.0007007575757575758,
876
+ "loss": 1.2006,
877
+ "step": 790
878
+ },
879
+ {
880
+ "epoch": 35.55555555555556,
881
+ "grad_norm": 5.07556676864624,
882
+ "learning_rate": 0.000696969696969697,
883
+ "loss": 1.2925,
884
+ "step": 800
885
+ },
886
+ {
887
+ "epoch": 36.0,
888
+ "grad_norm": 5.501034259796143,
889
+ "learning_rate": 0.0006931818181818183,
890
+ "loss": 1.2404,
891
+ "step": 810
892
+ },
893
+ {
894
+ "epoch": 36.0,
895
+ "eval_accuracy": 0.1956521739130435,
896
+ "eval_loss": 1.4436728954315186,
897
+ "eval_runtime": 0.9177,
898
+ "eval_samples_per_second": 50.125,
899
+ "eval_steps_per_second": 3.269,
900
+ "step": 810
901
+ },
902
+ {
903
+ "epoch": 36.44444444444444,
904
+ "grad_norm": 2.563979387283325,
905
+ "learning_rate": 0.0006893939393939394,
906
+ "loss": 1.222,
907
+ "step": 820
908
+ },
909
+ {
910
+ "epoch": 36.888888888888886,
911
+ "grad_norm": 2.562971591949463,
912
+ "learning_rate": 0.0006856060606060606,
913
+ "loss": 1.197,
914
+ "step": 830
915
+ },
916
+ {
917
+ "epoch": 36.977777777777774,
918
+ "eval_accuracy": 0.21739130434782608,
919
+ "eval_loss": 1.476518988609314,
920
+ "eval_runtime": 0.8797,
921
+ "eval_samples_per_second": 52.292,
922
+ "eval_steps_per_second": 3.41,
923
+ "step": 832
924
+ },
925
+ {
926
+ "epoch": 37.333333333333336,
927
+ "grad_norm": 4.266129016876221,
928
+ "learning_rate": 0.0006818181818181818,
929
+ "loss": 1.1733,
930
+ "step": 840
931
+ },
932
+ {
933
+ "epoch": 37.77777777777778,
934
+ "grad_norm": 2.0012102127075195,
935
+ "learning_rate": 0.000678030303030303,
936
+ "loss": 1.2161,
937
+ "step": 850
938
+ },
939
+ {
940
+ "epoch": 38.0,
941
+ "eval_accuracy": 0.2391304347826087,
942
+ "eval_loss": 1.3720359802246094,
943
+ "eval_runtime": 0.8896,
944
+ "eval_samples_per_second": 51.708,
945
+ "eval_steps_per_second": 3.372,
946
+ "step": 855
947
+ },
948
+ {
949
+ "epoch": 38.22222222222222,
950
+ "grad_norm": 5.82732629776001,
951
+ "learning_rate": 0.0006742424242424242,
952
+ "loss": 1.194,
953
+ "step": 860
954
+ },
955
+ {
956
+ "epoch": 38.666666666666664,
957
+ "grad_norm": 3.146446466445923,
958
+ "learning_rate": 0.0006704545454545455,
959
+ "loss": 1.221,
960
+ "step": 870
961
+ },
962
+ {
963
+ "epoch": 38.977777777777774,
964
+ "eval_accuracy": 0.34782608695652173,
965
+ "eval_loss": 1.3749516010284424,
966
+ "eval_runtime": 1.1844,
967
+ "eval_samples_per_second": 38.839,
968
+ "eval_steps_per_second": 2.533,
969
+ "step": 877
970
+ },
971
+ {
972
+ "epoch": 39.111111111111114,
973
+ "grad_norm": 2.4704267978668213,
974
+ "learning_rate": 0.0006666666666666666,
975
+ "loss": 1.2082,
976
+ "step": 880
977
+ },
978
+ {
979
+ "epoch": 39.55555555555556,
980
+ "grad_norm": 3.377894878387451,
981
+ "learning_rate": 0.0006628787878787878,
982
+ "loss": 1.2203,
983
+ "step": 890
984
+ },
985
+ {
986
+ "epoch": 40.0,
987
+ "grad_norm": 3.2576770782470703,
988
+ "learning_rate": 0.0006590909090909091,
989
+ "loss": 1.229,
990
+ "step": 900
991
+ },
992
+ {
993
+ "epoch": 40.0,
994
+ "eval_accuracy": 0.2391304347826087,
995
+ "eval_loss": 1.3404773473739624,
996
+ "eval_runtime": 0.876,
997
+ "eval_samples_per_second": 52.511,
998
+ "eval_steps_per_second": 3.425,
999
+ "step": 900
1000
+ },
1001
+ {
1002
+ "epoch": 40.44444444444444,
1003
+ "grad_norm": 2.0540192127227783,
1004
+ "learning_rate": 0.0006553030303030303,
1005
+ "loss": 1.2222,
1006
+ "step": 910
1007
+ },
1008
+ {
1009
+ "epoch": 40.888888888888886,
1010
+ "grad_norm": 2.336094617843628,
1011
+ "learning_rate": 0.0006515151515151515,
1012
+ "loss": 1.2046,
1013
+ "step": 920
1014
+ },
1015
+ {
1016
+ "epoch": 40.977777777777774,
1017
+ "eval_accuracy": 0.2608695652173913,
1018
+ "eval_loss": 1.4231040477752686,
1019
+ "eval_runtime": 0.899,
1020
+ "eval_samples_per_second": 51.168,
1021
+ "eval_steps_per_second": 3.337,
1022
+ "step": 922
1023
+ },
1024
+ {
1025
+ "epoch": 41.333333333333336,
1026
+ "grad_norm": 3.401913642883301,
1027
+ "learning_rate": 0.0006477272727272727,
1028
+ "loss": 1.2028,
1029
+ "step": 930
1030
+ },
1031
+ {
1032
+ "epoch": 41.77777777777778,
1033
+ "grad_norm": 1.3715537786483765,
1034
+ "learning_rate": 0.000643939393939394,
1035
+ "loss": 1.2077,
1036
+ "step": 940
1037
+ },
1038
+ {
1039
+ "epoch": 42.0,
1040
+ "eval_accuracy": 0.2391304347826087,
1041
+ "eval_loss": 1.4383732080459595,
1042
+ "eval_runtime": 1.0424,
1043
+ "eval_samples_per_second": 44.128,
1044
+ "eval_steps_per_second": 2.878,
1045
+ "step": 945
1046
+ },
1047
+ {
1048
+ "epoch": 42.22222222222222,
1049
+ "grad_norm": 3.548417806625366,
1050
+ "learning_rate": 0.0006401515151515151,
1051
+ "loss": 1.2126,
1052
+ "step": 950
1053
+ },
1054
+ {
1055
+ "epoch": 42.666666666666664,
1056
+ "grad_norm": 15.91876220703125,
1057
+ "learning_rate": 0.0006363636363636364,
1058
+ "loss": 1.1865,
1059
+ "step": 960
1060
+ },
1061
+ {
1062
+ "epoch": 42.977777777777774,
1063
+ "eval_accuracy": 0.2608695652173913,
1064
+ "eval_loss": 1.4346174001693726,
1065
+ "eval_runtime": 0.9529,
1066
+ "eval_samples_per_second": 48.275,
1067
+ "eval_steps_per_second": 3.148,
1068
+ "step": 967
1069
+ },
1070
+ {
1071
+ "epoch": 43.111111111111114,
1072
+ "grad_norm": 3.9566547870635986,
1073
+ "learning_rate": 0.0006325757575757576,
1074
+ "loss": 1.175,
1075
+ "step": 970
1076
+ },
1077
+ {
1078
+ "epoch": 43.55555555555556,
1079
+ "grad_norm": 2.891098976135254,
1080
+ "learning_rate": 0.0006287878787878788,
1081
+ "loss": 1.2044,
1082
+ "step": 980
1083
+ },
1084
+ {
1085
+ "epoch": 44.0,
1086
+ "grad_norm": 1.5360569953918457,
1087
+ "learning_rate": 0.000625,
1088
+ "loss": 1.1882,
1089
+ "step": 990
1090
+ },
1091
+ {
1092
+ "epoch": 44.0,
1093
+ "eval_accuracy": 0.2826086956521739,
1094
+ "eval_loss": 1.367881417274475,
1095
+ "eval_runtime": 0.8863,
1096
+ "eval_samples_per_second": 51.903,
1097
+ "eval_steps_per_second": 3.385,
1098
+ "step": 990
1099
+ },
1100
+ {
1101
+ "epoch": 44.44444444444444,
1102
+ "grad_norm": 2.5285489559173584,
1103
+ "learning_rate": 0.0006212121212121212,
1104
+ "loss": 1.1957,
1105
+ "step": 1000
1106
+ },
1107
+ {
1108
+ "epoch": 44.888888888888886,
1109
+ "grad_norm": 4.244635105133057,
1110
+ "learning_rate": 0.0006174242424242425,
1111
+ "loss": 1.2528,
1112
+ "step": 1010
1113
+ },
1114
+ {
1115
+ "epoch": 44.977777777777774,
1116
+ "eval_accuracy": 0.21739130434782608,
1117
+ "eval_loss": 1.3451467752456665,
1118
+ "eval_runtime": 0.8705,
1119
+ "eval_samples_per_second": 52.845,
1120
+ "eval_steps_per_second": 3.446,
1121
+ "step": 1012
1122
+ },
1123
+ {
1124
+ "epoch": 45.333333333333336,
1125
+ "grad_norm": 1.46609365940094,
1126
+ "learning_rate": 0.0006136363636363636,
1127
+ "loss": 1.2534,
1128
+ "step": 1020
1129
+ },
1130
+ {
1131
+ "epoch": 45.77777777777778,
1132
+ "grad_norm": 9.328937530517578,
1133
+ "learning_rate": 0.0006098484848484849,
1134
+ "loss": 1.1836,
1135
+ "step": 1030
1136
+ },
1137
+ {
1138
+ "epoch": 46.0,
1139
+ "eval_accuracy": 0.2391304347826087,
1140
+ "eval_loss": 1.4912604093551636,
1141
+ "eval_runtime": 0.8907,
1142
+ "eval_samples_per_second": 51.643,
1143
+ "eval_steps_per_second": 3.368,
1144
+ "step": 1035
1145
+ },
1146
+ {
1147
+ "epoch": 46.22222222222222,
1148
+ "grad_norm": 2.901005744934082,
1149
+ "learning_rate": 0.0006060606060606061,
1150
+ "loss": 1.2226,
1151
+ "step": 1040
1152
+ },
1153
+ {
1154
+ "epoch": 46.666666666666664,
1155
+ "grad_norm": 4.646663188934326,
1156
+ "learning_rate": 0.0006022727272727273,
1157
+ "loss": 1.2009,
1158
+ "step": 1050
1159
+ },
1160
+ {
1161
+ "epoch": 46.977777777777774,
1162
+ "eval_accuracy": 0.32608695652173914,
1163
+ "eval_loss": 1.4841315746307373,
1164
+ "eval_runtime": 1.1901,
1165
+ "eval_samples_per_second": 38.652,
1166
+ "eval_steps_per_second": 2.521,
1167
+ "step": 1057
1168
+ },
1169
+ {
1170
+ "epoch": 47.111111111111114,
1171
+ "grad_norm": 2.45768141746521,
1172
+ "learning_rate": 0.0005984848484848485,
1173
+ "loss": 1.219,
1174
+ "step": 1060
1175
+ },
1176
+ {
1177
+ "epoch": 47.55555555555556,
1178
+ "grad_norm": 5.484715938568115,
1179
+ "learning_rate": 0.0005946969696969698,
1180
+ "loss": 1.2043,
1181
+ "step": 1070
1182
+ },
1183
+ {
1184
+ "epoch": 48.0,
1185
+ "grad_norm": 62.28512954711914,
1186
+ "learning_rate": 0.0005909090909090909,
1187
+ "loss": 1.203,
1188
+ "step": 1080
1189
+ },
1190
+ {
1191
+ "epoch": 48.0,
1192
+ "eval_accuracy": 0.30434782608695654,
1193
+ "eval_loss": 1.4326200485229492,
1194
+ "eval_runtime": 0.8718,
1195
+ "eval_samples_per_second": 52.767,
1196
+ "eval_steps_per_second": 3.441,
1197
+ "step": 1080
1198
+ },
1199
+ {
1200
+ "epoch": 48.44444444444444,
1201
+ "grad_norm": 2.838622808456421,
1202
+ "learning_rate": 0.0005871212121212122,
1203
+ "loss": 1.2071,
1204
+ "step": 1090
1205
+ },
1206
+ {
1207
+ "epoch": 48.888888888888886,
1208
+ "grad_norm": 4.207474708557129,
1209
+ "learning_rate": 0.0005833333333333334,
1210
+ "loss": 1.1679,
1211
+ "step": 1100
1212
+ },
1213
+ {
1214
+ "epoch": 48.977777777777774,
1215
+ "eval_accuracy": 0.30434782608695654,
1216
+ "eval_loss": 1.3934518098831177,
1217
+ "eval_runtime": 0.857,
1218
+ "eval_samples_per_second": 53.673,
1219
+ "eval_steps_per_second": 3.5,
1220
+ "step": 1102
1221
+ },
1222
+ {
1223
+ "epoch": 49.333333333333336,
1224
+ "grad_norm": 2.942615270614624,
1225
+ "learning_rate": 0.0005795454545454545,
1226
+ "loss": 1.1719,
1227
+ "step": 1110
1228
+ },
1229
+ {
1230
+ "epoch": 49.77777777777778,
1231
+ "grad_norm": 54.94511032104492,
1232
+ "learning_rate": 0.0005757575757575758,
1233
+ "loss": 1.179,
1234
+ "step": 1120
1235
+ },
1236
+ {
1237
+ "epoch": 50.0,
1238
+ "eval_accuracy": 0.1956521739130435,
1239
+ "eval_loss": 1.4185277223587036,
1240
+ "eval_runtime": 0.8854,
1241
+ "eval_samples_per_second": 51.953,
1242
+ "eval_steps_per_second": 3.388,
1243
+ "step": 1125
1244
+ },
1245
+ {
1246
+ "epoch": 50.22222222222222,
1247
+ "grad_norm": 11.695096969604492,
1248
+ "learning_rate": 0.000571969696969697,
1249
+ "loss": 1.1624,
1250
+ "step": 1130
1251
+ },
1252
+ {
1253
+ "epoch": 50.666666666666664,
1254
+ "grad_norm": 1.823878288269043,
1255
+ "learning_rate": 0.0005681818181818183,
1256
+ "loss": 1.1687,
1257
+ "step": 1140
1258
+ },
1259
+ {
1260
+ "epoch": 50.977777777777774,
1261
+ "eval_accuracy": 0.2826086956521739,
1262
+ "eval_loss": 1.3686347007751465,
1263
+ "eval_runtime": 1.2456,
1264
+ "eval_samples_per_second": 36.929,
1265
+ "eval_steps_per_second": 2.408,
1266
+ "step": 1147
1267
+ },
1268
+ {
1269
+ "epoch": 51.111111111111114,
1270
+ "grad_norm": 17.766942977905273,
1271
+ "learning_rate": 0.0005643939393939394,
1272
+ "loss": 1.17,
1273
+ "step": 1150
1274
+ },
1275
+ {
1276
+ "epoch": 51.55555555555556,
1277
+ "grad_norm": 4.572807788848877,
1278
+ "learning_rate": 0.0005606060606060606,
1279
+ "loss": 1.1363,
1280
+ "step": 1160
1281
+ },
1282
+ {
1283
+ "epoch": 52.0,
1284
+ "grad_norm": 7.582568168640137,
1285
+ "learning_rate": 0.0005568181818181818,
1286
+ "loss": 1.1779,
1287
+ "step": 1170
1288
+ },
1289
+ {
1290
+ "epoch": 52.0,
1291
+ "eval_accuracy": 0.1956521739130435,
1292
+ "eval_loss": 1.4319127798080444,
1293
+ "eval_runtime": 0.895,
1294
+ "eval_samples_per_second": 51.396,
1295
+ "eval_steps_per_second": 3.352,
1296
+ "step": 1170
1297
+ },
1298
+ {
1299
+ "epoch": 52.44444444444444,
1300
+ "grad_norm": 2.6896812915802,
1301
+ "learning_rate": 0.000553030303030303,
1302
+ "loss": 1.1698,
1303
+ "step": 1180
1304
+ },
1305
+ {
1306
+ "epoch": 52.888888888888886,
1307
+ "grad_norm": 3.666240692138672,
1308
+ "learning_rate": 0.0005492424242424242,
1309
+ "loss": 1.1566,
1310
+ "step": 1190
1311
+ },
1312
+ {
1313
+ "epoch": 52.977777777777774,
1314
+ "eval_accuracy": 0.1956521739130435,
1315
+ "eval_loss": 1.3800519704818726,
1316
+ "eval_runtime": 0.9305,
1317
+ "eval_samples_per_second": 49.434,
1318
+ "eval_steps_per_second": 3.224,
1319
+ "step": 1192
1320
+ },
1321
+ {
1322
+ "epoch": 53.333333333333336,
1323
+ "grad_norm": 4.159182071685791,
1324
+ "learning_rate": 0.0005454545454545455,
1325
+ "loss": 1.1785,
1326
+ "step": 1200
1327
+ },
1328
+ {
1329
+ "epoch": 53.77777777777778,
1330
+ "grad_norm": 5.972134590148926,
1331
+ "learning_rate": 0.0005416666666666666,
1332
+ "loss": 1.192,
1333
+ "step": 1210
1334
+ },
1335
+ {
1336
+ "epoch": 54.0,
1337
+ "eval_accuracy": 0.21739130434782608,
1338
+ "eval_loss": 1.3745651245117188,
1339
+ "eval_runtime": 1.1774,
1340
+ "eval_samples_per_second": 39.07,
1341
+ "eval_steps_per_second": 2.548,
1342
+ "step": 1215
1343
+ },
1344
+ {
1345
+ "epoch": 54.22222222222222,
1346
+ "grad_norm": 5.2233428955078125,
1347
+ "learning_rate": 0.0005378787878787878,
1348
+ "loss": 1.1768,
1349
+ "step": 1220
1350
+ },
1351
+ {
1352
+ "epoch": 54.666666666666664,
1353
+ "grad_norm": 5.244997501373291,
1354
+ "learning_rate": 0.0005340909090909091,
1355
+ "loss": 1.1803,
1356
+ "step": 1230
1357
+ },
1358
+ {
1359
+ "epoch": 54.977777777777774,
1360
+ "eval_accuracy": 0.1956521739130435,
1361
+ "eval_loss": 1.4016964435577393,
1362
+ "eval_runtime": 0.8875,
1363
+ "eval_samples_per_second": 51.83,
1364
+ "eval_steps_per_second": 3.38,
1365
+ "step": 1237
1366
+ },
1367
+ {
1368
+ "epoch": 55.111111111111114,
1369
+ "grad_norm": 4.16229772567749,
1370
+ "learning_rate": 0.0005303030303030302,
1371
+ "loss": 1.1548,
1372
+ "step": 1240
1373
+ },
1374
+ {
1375
+ "epoch": 55.55555555555556,
1376
+ "grad_norm": 3.8485047817230225,
1377
+ "learning_rate": 0.0005265151515151515,
1378
+ "loss": 1.1629,
1379
+ "step": 1250
1380
+ },
1381
+ {
1382
+ "epoch": 56.0,
1383
+ "grad_norm": 3.398857593536377,
1384
+ "learning_rate": 0.0005227272727272727,
1385
+ "loss": 1.194,
1386
+ "step": 1260
1387
+ },
1388
+ {
1389
+ "epoch": 56.0,
1390
+ "eval_accuracy": 0.1956521739130435,
1391
+ "eval_loss": 1.4288326501846313,
1392
+ "eval_runtime": 0.8835,
1393
+ "eval_samples_per_second": 52.063,
1394
+ "eval_steps_per_second": 3.395,
1395
+ "step": 1260
1396
+ },
1397
+ {
1398
+ "epoch": 56.44444444444444,
1399
+ "grad_norm": 2.9012982845306396,
1400
+ "learning_rate": 0.000518939393939394,
1401
+ "loss": 1.1283,
1402
+ "step": 1270
1403
+ },
1404
+ {
1405
+ "epoch": 56.888888888888886,
1406
+ "grad_norm": 2.652462959289551,
1407
+ "learning_rate": 0.0005151515151515151,
1408
+ "loss": 1.1486,
1409
+ "step": 1280
1410
+ },
1411
+ {
1412
+ "epoch": 56.977777777777774,
1413
+ "eval_accuracy": 0.30434782608695654,
1414
+ "eval_loss": 1.392043113708496,
1415
+ "eval_runtime": 1.1872,
1416
+ "eval_samples_per_second": 38.747,
1417
+ "eval_steps_per_second": 2.527,
1418
+ "step": 1282
1419
+ },
1420
+ {
1421
+ "epoch": 57.333333333333336,
1422
+ "grad_norm": 16.2806453704834,
1423
+ "learning_rate": 0.0005113636363636364,
1424
+ "loss": 1.154,
1425
+ "step": 1290
1426
+ },
1427
+ {
1428
+ "epoch": 57.77777777777778,
1429
+ "grad_norm": 2.9445173740386963,
1430
+ "learning_rate": 0.0005075757575757576,
1431
+ "loss": 1.1429,
1432
+ "step": 1300
1433
+ },
1434
+ {
1435
+ "epoch": 58.0,
1436
+ "eval_accuracy": 0.2391304347826087,
1437
+ "eval_loss": 1.461561918258667,
1438
+ "eval_runtime": 0.8733,
1439
+ "eval_samples_per_second": 52.675,
1440
+ "eval_steps_per_second": 3.435,
1441
+ "step": 1305
1442
+ },
1443
+ {
1444
+ "epoch": 58.22222222222222,
1445
+ "grad_norm": 4.916953086853027,
1446
+ "learning_rate": 0.0005037878787878788,
1447
+ "loss": 1.1694,
1448
+ "step": 1310
1449
+ },
1450
+ {
1451
+ "epoch": 58.666666666666664,
1452
+ "grad_norm": 5.632236957550049,
1453
+ "learning_rate": 0.0005,
1454
+ "loss": 1.1655,
1455
+ "step": 1320
1456
+ },
1457
+ {
1458
+ "epoch": 58.977777777777774,
1459
+ "eval_accuracy": 0.21739130434782608,
1460
+ "eval_loss": 1.4119428396224976,
1461
+ "eval_runtime": 0.8592,
1462
+ "eval_samples_per_second": 53.535,
1463
+ "eval_steps_per_second": 3.491,
1464
+ "step": 1327
1465
+ },
1466
+ {
1467
+ "epoch": 59.111111111111114,
1468
+ "grad_norm": 10.658681869506836,
1469
+ "learning_rate": 0.0004962121212121212,
1470
+ "loss": 1.148,
1471
+ "step": 1330
1472
+ },
1473
+ {
1474
+ "epoch": 59.55555555555556,
1475
+ "grad_norm": 4.212299823760986,
1476
+ "learning_rate": 0.0004924242424242425,
1477
+ "loss": 1.1508,
1478
+ "step": 1340
1479
+ },
1480
+ {
1481
+ "epoch": 60.0,
1482
+ "grad_norm": 4.59652853012085,
1483
+ "learning_rate": 0.0004886363636363636,
1484
+ "loss": 1.1697,
1485
+ "step": 1350
1486
+ },
1487
+ {
1488
+ "epoch": 60.0,
1489
+ "eval_accuracy": 0.2608695652173913,
1490
+ "eval_loss": 1.3811644315719604,
1491
+ "eval_runtime": 0.9127,
1492
+ "eval_samples_per_second": 50.4,
1493
+ "eval_steps_per_second": 3.287,
1494
+ "step": 1350
1495
+ },
1496
+ {
1497
+ "epoch": 60.44444444444444,
1498
+ "grad_norm": 3.656782865524292,
1499
+ "learning_rate": 0.0004848484848484849,
1500
+ "loss": 1.1312,
1501
+ "step": 1360
1502
+ },
1503
+ {
1504
+ "epoch": 60.888888888888886,
1505
+ "grad_norm": 9.660019874572754,
1506
+ "learning_rate": 0.0004810606060606061,
1507
+ "loss": 1.1898,
1508
+ "step": 1370
1509
+ },
1510
+ {
1511
+ "epoch": 60.977777777777774,
1512
+ "eval_accuracy": 0.2391304347826087,
1513
+ "eval_loss": 1.4008588790893555,
1514
+ "eval_runtime": 1.1831,
1515
+ "eval_samples_per_second": 38.883,
1516
+ "eval_steps_per_second": 2.536,
1517
+ "step": 1372
1518
+ },
1519
+ {
1520
+ "epoch": 61.333333333333336,
1521
+ "grad_norm": 5.424683570861816,
1522
+ "learning_rate": 0.0004772727272727273,
1523
+ "loss": 1.2188,
1524
+ "step": 1380
1525
+ },
1526
+ {
1527
+ "epoch": 61.77777777777778,
1528
+ "grad_norm": 3.420642375946045,
1529
+ "learning_rate": 0.0004734848484848485,
1530
+ "loss": 1.1882,
1531
+ "step": 1390
1532
+ },
1533
+ {
1534
+ "epoch": 62.0,
1535
+ "eval_accuracy": 0.2391304347826087,
1536
+ "eval_loss": 1.422127604484558,
1537
+ "eval_runtime": 1.0461,
1538
+ "eval_samples_per_second": 43.972,
1539
+ "eval_steps_per_second": 2.868,
1540
+ "step": 1395
1541
+ },
1542
+ {
1543
+ "epoch": 62.22222222222222,
1544
+ "grad_norm": 37.69420623779297,
1545
+ "learning_rate": 0.0004696969696969697,
1546
+ "loss": 1.1428,
1547
+ "step": 1400
1548
+ },
1549
+ {
1550
+ "epoch": 62.666666666666664,
1551
+ "grad_norm": 6.073638439178467,
1552
+ "learning_rate": 0.0004659090909090909,
1553
+ "loss": 1.134,
1554
+ "step": 1410
1555
+ },
1556
+ {
1557
+ "epoch": 62.977777777777774,
1558
+ "eval_accuracy": 0.2608695652173913,
1559
+ "eval_loss": 1.618972897529602,
1560
+ "eval_runtime": 0.888,
1561
+ "eval_samples_per_second": 51.801,
1562
+ "eval_steps_per_second": 3.378,
1563
+ "step": 1417
1564
+ },
1565
+ {
1566
+ "epoch": 63.111111111111114,
1567
+ "grad_norm": 14.145001411437988,
1568
+ "learning_rate": 0.0004621212121212121,
1569
+ "loss": 1.177,
1570
+ "step": 1420
1571
+ },
1572
+ {
1573
+ "epoch": 63.55555555555556,
1574
+ "grad_norm": 3.415473222732544,
1575
+ "learning_rate": 0.0004583333333333333,
1576
+ "loss": 1.1739,
1577
+ "step": 1430
1578
+ },
1579
+ {
1580
+ "epoch": 64.0,
1581
+ "grad_norm": 9.372400283813477,
1582
+ "learning_rate": 0.00045454545454545455,
1583
+ "loss": 1.1748,
1584
+ "step": 1440
1585
+ },
1586
+ {
1587
+ "epoch": 64.0,
1588
+ "eval_accuracy": 0.2391304347826087,
1589
+ "eval_loss": 1.4336298704147339,
1590
+ "eval_runtime": 0.9015,
1591
+ "eval_samples_per_second": 51.025,
1592
+ "eval_steps_per_second": 3.328,
1593
+ "step": 1440
1594
+ },
1595
+ {
1596
+ "epoch": 64.44444444444444,
1597
+ "grad_norm": 3.4770920276641846,
1598
+ "learning_rate": 0.00045075757575757577,
1599
+ "loss": 1.1419,
1600
+ "step": 1450
1601
+ },
1602
+ {
1603
+ "epoch": 64.88888888888889,
1604
+ "grad_norm": 17.83055877685547,
1605
+ "learning_rate": 0.000446969696969697,
1606
+ "loss": 1.1439,
1607
+ "step": 1460
1608
+ },
1609
+ {
1610
+ "epoch": 64.97777777777777,
1611
+ "eval_accuracy": 0.1956521739130435,
1612
+ "eval_loss": 1.3744150400161743,
1613
+ "eval_runtime": 1.1623,
1614
+ "eval_samples_per_second": 39.576,
1615
+ "eval_steps_per_second": 2.581,
1616
+ "step": 1462
1617
+ },
1618
+ {
1619
+ "epoch": 65.33333333333333,
1620
+ "grad_norm": 5.167716026306152,
1621
+ "learning_rate": 0.0004431818181818182,
1622
+ "loss": 1.1155,
1623
+ "step": 1470
1624
+ },
1625
+ {
1626
+ "epoch": 65.77777777777777,
1627
+ "grad_norm": 2.334927558898926,
1628
+ "learning_rate": 0.0004393939393939394,
1629
+ "loss": 1.1585,
1630
+ "step": 1480
1631
+ },
1632
+ {
1633
+ "epoch": 66.0,
1634
+ "eval_accuracy": 0.3695652173913043,
1635
+ "eval_loss": 1.3992067575454712,
1636
+ "eval_runtime": 0.8747,
1637
+ "eval_samples_per_second": 52.591,
1638
+ "eval_steps_per_second": 3.43,
1639
+ "step": 1485
1640
+ },
1641
+ {
1642
+ "epoch": 66.22222222222223,
1643
+ "grad_norm": 4.668221473693848,
1644
+ "learning_rate": 0.0004356060606060606,
1645
+ "loss": 1.136,
1646
+ "step": 1490
1647
+ },
1648
+ {
1649
+ "epoch": 66.66666666666667,
1650
+ "grad_norm": 2.9979135990142822,
1651
+ "learning_rate": 0.0004318181818181818,
1652
+ "loss": 1.1344,
1653
+ "step": 1500
1654
+ },
1655
+ {
1656
+ "epoch": 66.97777777777777,
1657
+ "eval_accuracy": 0.2391304347826087,
1658
+ "eval_loss": 1.3951774835586548,
1659
+ "eval_runtime": 0.8935,
1660
+ "eval_samples_per_second": 51.481,
1661
+ "eval_steps_per_second": 3.357,
1662
+ "step": 1507
1663
+ },
1664
+ {
1665
+ "epoch": 67.11111111111111,
1666
+ "grad_norm": 2.9879891872406006,
1667
+ "learning_rate": 0.00042803030303030303,
1668
+ "loss": 1.1615,
1669
+ "step": 1510
1670
+ },
1671
+ {
1672
+ "epoch": 67.55555555555556,
1673
+ "grad_norm": 3.322258710861206,
1674
+ "learning_rate": 0.00042424242424242425,
1675
+ "loss": 1.1635,
1676
+ "step": 1520
1677
+ },
1678
+ {
1679
+ "epoch": 68.0,
1680
+ "grad_norm": 4.408764362335205,
1681
+ "learning_rate": 0.0004204545454545455,
1682
+ "loss": 1.1374,
1683
+ "step": 1530
1684
+ },
1685
+ {
1686
+ "epoch": 68.0,
1687
+ "eval_accuracy": 0.21739130434782608,
1688
+ "eval_loss": 1.3666102886199951,
1689
+ "eval_runtime": 0.8532,
1690
+ "eval_samples_per_second": 53.917,
1691
+ "eval_steps_per_second": 3.516,
1692
+ "step": 1530
1693
+ },
1694
+ {
1695
+ "epoch": 68.44444444444444,
1696
+ "grad_norm": 18.494497299194336,
1697
+ "learning_rate": 0.0004166666666666667,
1698
+ "loss": 1.126,
1699
+ "step": 1540
1700
+ },
1701
+ {
1702
+ "epoch": 68.88888888888889,
1703
+ "grad_norm": 6.762816905975342,
1704
+ "learning_rate": 0.0004128787878787879,
1705
+ "loss": 1.1252,
1706
+ "step": 1550
1707
+ },
1708
+ {
1709
+ "epoch": 68.97777777777777,
1710
+ "eval_accuracy": 0.2826086956521739,
1711
+ "eval_loss": 1.3704602718353271,
1712
+ "eval_runtime": 1.2029,
1713
+ "eval_samples_per_second": 38.24,
1714
+ "eval_steps_per_second": 2.494,
1715
+ "step": 1552
1716
+ },
1717
+ {
1718
+ "epoch": 69.33333333333333,
1719
+ "grad_norm": 4.585610389709473,
1720
+ "learning_rate": 0.00040909090909090913,
1721
+ "loss": 1.1272,
1722
+ "step": 1560
1723
+ },
1724
+ {
1725
+ "epoch": 69.77777777777777,
1726
+ "grad_norm": 10.724448204040527,
1727
+ "learning_rate": 0.0004053030303030303,
1728
+ "loss": 1.1339,
1729
+ "step": 1570
1730
+ },
1731
+ {
1732
+ "epoch": 70.0,
1733
+ "eval_accuracy": 0.2826086956521739,
1734
+ "eval_loss": 1.3982820510864258,
1735
+ "eval_runtime": 1.0724,
1736
+ "eval_samples_per_second": 42.893,
1737
+ "eval_steps_per_second": 2.797,
1738
+ "step": 1575
1739
+ },
1740
+ {
1741
+ "epoch": 70.22222222222223,
1742
+ "grad_norm": 5.506129741668701,
1743
+ "learning_rate": 0.0004015151515151515,
1744
+ "loss": 1.1491,
1745
+ "step": 1580
1746
+ },
1747
+ {
1748
+ "epoch": 70.66666666666667,
1749
+ "grad_norm": 17.69223976135254,
1750
+ "learning_rate": 0.00039772727272727274,
1751
+ "loss": 1.1344,
1752
+ "step": 1590
1753
+ },
1754
+ {
1755
+ "epoch": 70.97777777777777,
1756
+ "eval_accuracy": 0.30434782608695654,
1757
+ "eval_loss": 1.3792437314987183,
1758
+ "eval_runtime": 0.8913,
1759
+ "eval_samples_per_second": 51.609,
1760
+ "eval_steps_per_second": 3.366,
1761
+ "step": 1597
1762
+ },
1763
+ {
1764
+ "epoch": 71.11111111111111,
1765
+ "grad_norm": 10.686148643493652,
1766
+ "learning_rate": 0.0003939393939393939,
1767
+ "loss": 1.1495,
1768
+ "step": 1600
1769
+ },
1770
+ {
1771
+ "epoch": 71.55555555555556,
1772
+ "grad_norm": 2.353846549987793,
1773
+ "learning_rate": 0.0003901515151515151,
1774
+ "loss": 1.1566,
1775
+ "step": 1610
1776
+ },
1777
+ {
1778
+ "epoch": 72.0,
1779
+ "grad_norm": 3.250394821166992,
1780
+ "learning_rate": 0.00038636363636363635,
1781
+ "loss": 1.1343,
1782
+ "step": 1620
1783
+ },
1784
+ {
1785
+ "epoch": 72.0,
1786
+ "eval_accuracy": 0.2826086956521739,
1787
+ "eval_loss": 1.4466689825057983,
1788
+ "eval_runtime": 0.891,
1789
+ "eval_samples_per_second": 51.629,
1790
+ "eval_steps_per_second": 3.367,
1791
+ "step": 1620
1792
+ },
1793
+ {
1794
+ "epoch": 72.44444444444444,
1795
+ "grad_norm": 6.152634143829346,
1796
+ "learning_rate": 0.00038257575757575757,
1797
+ "loss": 1.1417,
1798
+ "step": 1630
1799
+ },
1800
+ {
1801
+ "epoch": 72.88888888888889,
1802
+ "grad_norm": 93.5090560913086,
1803
+ "learning_rate": 0.0003787878787878788,
1804
+ "loss": 1.1555,
1805
+ "step": 1640
1806
+ },
1807
+ {
1808
+ "epoch": 72.97777777777777,
1809
+ "eval_accuracy": 0.21739130434782608,
1810
+ "eval_loss": 1.4822701215744019,
1811
+ "eval_runtime": 1.1509,
1812
+ "eval_samples_per_second": 39.968,
1813
+ "eval_steps_per_second": 2.607,
1814
+ "step": 1642
1815
+ },
1816
+ {
1817
+ "epoch": 73.33333333333333,
1818
+ "grad_norm": 90.8385238647461,
1819
+ "learning_rate": 0.000375,
1820
+ "loss": 1.1227,
1821
+ "step": 1650
1822
+ },
1823
+ {
1824
+ "epoch": 73.77777777777777,
1825
+ "grad_norm": 3.00873064994812,
1826
+ "learning_rate": 0.00037121212121212123,
1827
+ "loss": 1.1329,
1828
+ "step": 1660
1829
+ },
1830
+ {
1831
+ "epoch": 74.0,
1832
+ "eval_accuracy": 0.15217391304347827,
1833
+ "eval_loss": 1.5136324167251587,
1834
+ "eval_runtime": 1.0615,
1835
+ "eval_samples_per_second": 43.334,
1836
+ "eval_steps_per_second": 2.826,
1837
+ "step": 1665
1838
+ },
1839
+ {
1840
+ "epoch": 74.22222222222223,
1841
+ "grad_norm": 9.816000938415527,
1842
+ "learning_rate": 0.00036742424242424245,
1843
+ "loss": 1.1719,
1844
+ "step": 1670
1845
+ },
1846
+ {
1847
+ "epoch": 74.66666666666667,
1848
+ "grad_norm": 2.1428558826446533,
1849
+ "learning_rate": 0.00036363636363636367,
1850
+ "loss": 1.1513,
1851
+ "step": 1680
1852
+ },
1853
+ {
1854
+ "epoch": 74.97777777777777,
1855
+ "eval_accuracy": 0.2391304347826087,
1856
+ "eval_loss": 1.479099988937378,
1857
+ "eval_runtime": 0.8852,
1858
+ "eval_samples_per_second": 51.967,
1859
+ "eval_steps_per_second": 3.389,
1860
+ "step": 1687
1861
+ },
1862
+ {
1863
+ "epoch": 75.11111111111111,
1864
+ "grad_norm": 4.258810997009277,
1865
+ "learning_rate": 0.0003598484848484849,
1866
+ "loss": 1.1449,
1867
+ "step": 1690
1868
+ },
1869
+ {
1870
+ "epoch": 75.55555555555556,
1871
+ "grad_norm": 3.7126126289367676,
1872
+ "learning_rate": 0.0003560606060606061,
1873
+ "loss": 1.1289,
1874
+ "step": 1700
1875
+ },
1876
+ {
1877
+ "epoch": 76.0,
1878
+ "grad_norm": 4.913658142089844,
1879
+ "learning_rate": 0.0003522727272727273,
1880
+ "loss": 1.1278,
1881
+ "step": 1710
1882
+ },
1883
+ {
1884
+ "epoch": 76.0,
1885
+ "eval_accuracy": 0.2608695652173913,
1886
+ "eval_loss": 1.4527482986450195,
1887
+ "eval_runtime": 0.8695,
1888
+ "eval_samples_per_second": 52.906,
1889
+ "eval_steps_per_second": 3.45,
1890
+ "step": 1710
1891
+ },
1892
+ {
1893
+ "epoch": 76.44444444444444,
1894
+ "grad_norm": 5.169142723083496,
1895
+ "learning_rate": 0.0003484848484848485,
1896
+ "loss": 1.1212,
1897
+ "step": 1720
1898
+ },
1899
+ {
1900
+ "epoch": 76.88888888888889,
1901
+ "grad_norm": 6.800394058227539,
1902
+ "learning_rate": 0.0003446969696969697,
1903
+ "loss": 1.0956,
1904
+ "step": 1730
1905
+ },
1906
+ {
1907
+ "epoch": 76.97777777777777,
1908
+ "eval_accuracy": 0.2391304347826087,
1909
+ "eval_loss": 1.4839533567428589,
1910
+ "eval_runtime": 0.9059,
1911
+ "eval_samples_per_second": 50.779,
1912
+ "eval_steps_per_second": 3.312,
1913
+ "step": 1732
1914
+ },
1915
+ {
1916
+ "epoch": 77.33333333333333,
1917
+ "grad_norm": 4.102919578552246,
1918
+ "learning_rate": 0.0003409090909090909,
1919
+ "loss": 1.1466,
1920
+ "step": 1740
1921
+ },
1922
+ {
1923
+ "epoch": 77.77777777777777,
1924
+ "grad_norm": 4.560825824737549,
1925
+ "learning_rate": 0.0003371212121212121,
1926
+ "loss": 1.1131,
1927
+ "step": 1750
1928
+ },
1929
+ {
1930
+ "epoch": 78.0,
1931
+ "eval_accuracy": 0.21739130434782608,
1932
+ "eval_loss": 1.4900346994400024,
1933
+ "eval_runtime": 1.1932,
1934
+ "eval_samples_per_second": 38.552,
1935
+ "eval_steps_per_second": 2.514,
1936
+ "step": 1755
1937
+ },
1938
+ {
1939
+ "epoch": 78.22222222222223,
1940
+ "grad_norm": 1.923434853553772,
1941
+ "learning_rate": 0.0003333333333333333,
1942
+ "loss": 1.1285,
1943
+ "step": 1760
1944
+ },
1945
+ {
1946
+ "epoch": 78.66666666666667,
1947
+ "grad_norm": 4.646895885467529,
1948
+ "learning_rate": 0.00032954545454545454,
1949
+ "loss": 1.1376,
1950
+ "step": 1770
1951
+ },
1952
+ {
1953
+ "epoch": 78.97777777777777,
1954
+ "eval_accuracy": 0.21739130434782608,
1955
+ "eval_loss": 1.5395020246505737,
1956
+ "eval_runtime": 0.8752,
1957
+ "eval_samples_per_second": 52.557,
1958
+ "eval_steps_per_second": 3.428,
1959
+ "step": 1777
1960
+ },
1961
+ {
1962
+ "epoch": 79.11111111111111,
1963
+ "grad_norm": 8.433492660522461,
1964
+ "learning_rate": 0.00032575757575757576,
1965
+ "loss": 1.1072,
1966
+ "step": 1780
1967
+ },
1968
+ {
1969
+ "epoch": 79.55555555555556,
1970
+ "grad_norm": 5.669383525848389,
1971
+ "learning_rate": 0.000321969696969697,
1972
+ "loss": 1.1135,
1973
+ "step": 1790
1974
+ },
1975
+ {
1976
+ "epoch": 80.0,
1977
+ "grad_norm": 18.24361801147461,
1978
+ "learning_rate": 0.0003181818181818182,
1979
+ "loss": 1.0883,
1980
+ "step": 1800
1981
+ },
1982
+ {
1983
+ "epoch": 80.0,
1984
+ "eval_accuracy": 0.1956521739130435,
1985
+ "eval_loss": 1.5037870407104492,
1986
+ "eval_runtime": 0.8647,
1987
+ "eval_samples_per_second": 53.198,
1988
+ "eval_steps_per_second": 3.469,
1989
+ "step": 1800
1990
+ },
1991
+ {
1992
+ "epoch": 80.44444444444444,
1993
+ "grad_norm": 8.68807315826416,
1994
+ "learning_rate": 0.0003143939393939394,
1995
+ "loss": 1.0899,
1996
+ "step": 1810
1997
+ },
1998
+ {
1999
+ "epoch": 80.88888888888889,
2000
+ "grad_norm": 39.473899841308594,
2001
+ "learning_rate": 0.0003106060606060606,
2002
+ "loss": 1.1017,
2003
+ "step": 1820
2004
+ },
2005
+ {
2006
+ "epoch": 80.97777777777777,
2007
+ "eval_accuracy": 0.1956521739130435,
2008
+ "eval_loss": 1.5392367839813232,
2009
+ "eval_runtime": 0.8702,
2010
+ "eval_samples_per_second": 52.864,
2011
+ "eval_steps_per_second": 3.448,
2012
+ "step": 1822
2013
+ },
2014
+ {
2015
+ "epoch": 81.33333333333333,
2016
+ "grad_norm": 3.8075191974639893,
2017
+ "learning_rate": 0.0003068181818181818,
2018
+ "loss": 1.0607,
2019
+ "step": 1830
2020
+ },
2021
+ {
2022
+ "epoch": 81.77777777777777,
2023
+ "grad_norm": 4.356723308563232,
2024
+ "learning_rate": 0.00030303030303030303,
2025
+ "loss": 1.1608,
2026
+ "step": 1840
2027
+ },
2028
+ {
2029
+ "epoch": 82.0,
2030
+ "eval_accuracy": 0.21739130434782608,
2031
+ "eval_loss": 1.4875361919403076,
2032
+ "eval_runtime": 1.2071,
2033
+ "eval_samples_per_second": 38.106,
2034
+ "eval_steps_per_second": 2.485,
2035
+ "step": 1845
2036
+ },
+ {
+ "epoch": 82.22222222222223,
+ "grad_norm": 4.517791748046875,
+ "learning_rate": 0.00029924242424242425,
+ "loss": 1.1144,
+ "step": 1850
+ },
+ {
+ "epoch": 82.66666666666667,
+ "grad_norm": 4.995427131652832,
+ "learning_rate": 0.00029545454545454547,
+ "loss": 1.1308,
+ "step": 1860
+ },
+ {
+ "epoch": 82.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.5079646110534668,
+ "eval_runtime": 0.8614,
+ "eval_samples_per_second": 53.403,
+ "eval_steps_per_second": 3.483,
+ "step": 1867
+ },
+ {
+ "epoch": 83.11111111111111,
+ "grad_norm": 4.479354381561279,
+ "learning_rate": 0.0002916666666666667,
+ "loss": 1.0821,
+ "step": 1870
+ },
+ {
+ "epoch": 83.55555555555556,
+ "grad_norm": 4.680623531341553,
+ "learning_rate": 0.0002878787878787879,
+ "loss": 1.0904,
+ "step": 1880
+ },
+ {
+ "epoch": 84.0,
+ "grad_norm": 11.278671264648438,
+ "learning_rate": 0.00028409090909090913,
+ "loss": 1.1382,
+ "step": 1890
+ },
+ {
+ "epoch": 84.0,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4835433959960938,
+ "eval_runtime": 0.8759,
+ "eval_samples_per_second": 52.52,
+ "eval_steps_per_second": 3.425,
+ "step": 1890
+ },
+ {
+ "epoch": 84.44444444444444,
+ "grad_norm": 4.505593776702881,
+ "learning_rate": 0.0002803030303030303,
+ "loss": 1.0869,
+ "step": 1900
+ },
+ {
+ "epoch": 84.88888888888889,
+ "grad_norm": 3.064387083053589,
+ "learning_rate": 0.0002765151515151515,
+ "loss": 1.1195,
+ "step": 1910
+ },
+ {
+ "epoch": 84.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.4076049327850342,
+ "eval_runtime": 0.865,
+ "eval_samples_per_second": 53.179,
+ "eval_steps_per_second": 3.468,
+ "step": 1912
+ },
+ {
+ "epoch": 85.33333333333333,
+ "grad_norm": 4.745396137237549,
+ "learning_rate": 0.00027272727272727274,
+ "loss": 1.1153,
+ "step": 1920
+ },
+ {
+ "epoch": 85.77777777777777,
+ "grad_norm": 14.576072692871094,
+ "learning_rate": 0.0002689393939393939,
+ "loss": 1.1149,
+ "step": 1930
+ },
+ {
+ "epoch": 86.0,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4840431213378906,
+ "eval_runtime": 1.1314,
+ "eval_samples_per_second": 40.656,
+ "eval_steps_per_second": 2.651,
+ "step": 1935
+ },
+ {
+ "epoch": 86.22222222222223,
+ "grad_norm": 4.840237617492676,
+ "learning_rate": 0.0002651515151515151,
+ "loss": 1.1314,
+ "step": 1940
+ },
+ {
+ "epoch": 86.66666666666667,
+ "grad_norm": 7.4581756591796875,
+ "learning_rate": 0.00026136363636363634,
+ "loss": 1.1344,
+ "step": 1950
+ },
+ {
+ "epoch": 86.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.473250150680542,
+ "eval_runtime": 1.086,
+ "eval_samples_per_second": 42.357,
+ "eval_steps_per_second": 2.762,
+ "step": 1957
+ },
+ {
+ "epoch": 87.11111111111111,
+ "grad_norm": 4.788184642791748,
+ "learning_rate": 0.00025757575757575756,
+ "loss": 1.0985,
+ "step": 1960
+ },
+ {
+ "epoch": 87.55555555555556,
+ "grad_norm": 7.187996864318848,
+ "learning_rate": 0.0002537878787878788,
+ "loss": 1.1018,
+ "step": 1970
+ },
+ {
+ "epoch": 88.0,
+ "grad_norm": 6.393486976623535,
+ "learning_rate": 0.00025,
+ "loss": 1.1268,
+ "step": 1980
+ },
+ {
+ "epoch": 88.0,
+ "eval_accuracy": 0.2391304347826087,
+ "eval_loss": 1.4446380138397217,
+ "eval_runtime": 0.877,
+ "eval_samples_per_second": 52.451,
+ "eval_steps_per_second": 3.421,
+ "step": 1980
+ },
+ {
+ "epoch": 88.44444444444444,
+ "grad_norm": 19.996049880981445,
+ "learning_rate": 0.0002462121212121212,
+ "loss": 1.115,
+ "step": 1990
+ },
+ {
+ "epoch": 88.88888888888889,
+ "grad_norm": 11.436882019042969,
+ "learning_rate": 0.00024242424242424245,
+ "loss": 1.1267,
+ "step": 2000
+ },
+ {
+ "epoch": 88.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4359674453735352,
+ "eval_runtime": 0.8812,
+ "eval_samples_per_second": 52.199,
+ "eval_steps_per_second": 3.404,
+ "step": 2002
+ },
+ {
+ "epoch": 89.33333333333333,
+ "grad_norm": 7.150528430938721,
+ "learning_rate": 0.00023863636363636364,
+ "loss": 1.1335,
+ "step": 2010
+ },
+ {
+ "epoch": 89.77777777777777,
+ "grad_norm": 142.4888153076172,
+ "learning_rate": 0.00023484848484848486,
+ "loss": 1.1034,
+ "step": 2020
+ },
+ {
+ "epoch": 90.0,
+ "eval_accuracy": 0.15217391304347827,
+ "eval_loss": 1.4328769445419312,
+ "eval_runtime": 0.8938,
+ "eval_samples_per_second": 51.465,
+ "eval_steps_per_second": 3.356,
+ "step": 2025
+ },
+ {
+ "epoch": 90.22222222222223,
+ "grad_norm": 13.855992317199707,
+ "learning_rate": 0.00023106060606060605,
+ "loss": 1.0987,
+ "step": 2030
+ },
+ {
+ "epoch": 90.66666666666667,
+ "grad_norm": 4.523609638214111,
+ "learning_rate": 0.00022727272727272727,
+ "loss": 1.1113,
+ "step": 2040
+ },
+ {
+ "epoch": 90.97777777777777,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4670028686523438,
+ "eval_runtime": 1.1907,
+ "eval_samples_per_second": 38.631,
+ "eval_steps_per_second": 2.519,
+ "step": 2047
+ },
+ {
+ "epoch": 91.11111111111111,
+ "grad_norm": 8.413890838623047,
+ "learning_rate": 0.0002234848484848485,
+ "loss": 1.0848,
+ "step": 2050
+ },
+ {
+ "epoch": 91.55555555555556,
+ "grad_norm": 2.8552653789520264,
+ "learning_rate": 0.0002196969696969697,
+ "loss": 1.0788,
+ "step": 2060
+ },
+ {
+ "epoch": 92.0,
+ "grad_norm": 22.520193099975586,
+ "learning_rate": 0.0002159090909090909,
+ "loss": 1.0957,
+ "step": 2070
+ },
+ {
+ "epoch": 92.0,
+ "eval_accuracy": 0.2391304347826087,
+ "eval_loss": 1.4802157878875732,
+ "eval_runtime": 0.8728,
+ "eval_samples_per_second": 52.706,
+ "eval_steps_per_second": 3.437,
+ "step": 2070
+ },
+ {
+ "epoch": 92.44444444444444,
+ "grad_norm": 17.830493927001953,
+ "learning_rate": 0.00021212121212121213,
+ "loss": 1.122,
+ "step": 2080
+ },
+ {
+ "epoch": 92.88888888888889,
+ "grad_norm": 4.072429656982422,
+ "learning_rate": 0.00020833333333333335,
+ "loss": 1.1227,
+ "step": 2090
+ },
+ {
+ "epoch": 92.97777777777777,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4715131521224976,
+ "eval_runtime": 0.8786,
+ "eval_samples_per_second": 52.357,
+ "eval_steps_per_second": 3.415,
+ "step": 2092
+ },
+ {
+ "epoch": 93.33333333333333,
+ "grad_norm": 4.416510581970215,
+ "learning_rate": 0.00020454545454545457,
+ "loss": 1.0755,
+ "step": 2100
+ },
+ {
+ "epoch": 93.77777777777777,
+ "grad_norm": 19.229928970336914,
+ "learning_rate": 0.00020075757575757576,
+ "loss": 1.1083,
+ "step": 2110
+ },
+ {
+ "epoch": 94.0,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.4812626838684082,
+ "eval_runtime": 0.8656,
+ "eval_samples_per_second": 53.14,
+ "eval_steps_per_second": 3.466,
+ "step": 2115
+ },
+ {
+ "epoch": 94.22222222222223,
+ "grad_norm": 4.718553066253662,
+ "learning_rate": 0.00019696969696969695,
+ "loss": 1.0757,
+ "step": 2120
+ },
+ {
+ "epoch": 94.66666666666667,
+ "grad_norm": 5.6447601318359375,
+ "learning_rate": 0.00019318181818181817,
+ "loss": 1.0583,
+ "step": 2130
+ },
+ {
+ "epoch": 94.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.520257830619812,
+ "eval_runtime": 1.1597,
+ "eval_samples_per_second": 39.665,
+ "eval_steps_per_second": 2.587,
+ "step": 2137
+ },
+ {
+ "epoch": 95.11111111111111,
+ "grad_norm": 5.209888935089111,
+ "learning_rate": 0.0001893939393939394,
+ "loss": 1.098,
+ "step": 2140
+ },
+ {
+ "epoch": 95.55555555555556,
+ "grad_norm": 7.493114471435547,
+ "learning_rate": 0.00018560606060606061,
+ "loss": 1.0796,
+ "step": 2150
+ },
+ {
+ "epoch": 96.0,
+ "grad_norm": 11.232746124267578,
+ "learning_rate": 0.00018181818181818183,
+ "loss": 1.093,
+ "step": 2160
+ },
+ {
+ "epoch": 96.0,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.5394465923309326,
+ "eval_runtime": 0.8817,
+ "eval_samples_per_second": 52.171,
+ "eval_steps_per_second": 3.402,
+ "step": 2160
+ },
+ {
+ "epoch": 96.44444444444444,
+ "grad_norm": 9.968954086303711,
+ "learning_rate": 0.00017803030303030305,
+ "loss": 1.0958,
+ "step": 2170
+ },
+ {
+ "epoch": 96.88888888888889,
+ "grad_norm": 15.155268669128418,
+ "learning_rate": 0.00017424242424242425,
+ "loss": 1.0809,
+ "step": 2180
+ },
+ {
+ "epoch": 96.97777777777777,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4620193243026733,
+ "eval_runtime": 0.8656,
+ "eval_samples_per_second": 53.142,
+ "eval_steps_per_second": 3.466,
+ "step": 2182
+ },
+ {
+ "epoch": 97.33333333333333,
+ "grad_norm": 10.245019912719727,
+ "learning_rate": 0.00017045454545454544,
+ "loss": 1.0822,
+ "step": 2190
+ },
+ {
+ "epoch": 97.77777777777777,
+ "grad_norm": 56.624778747558594,
+ "learning_rate": 0.00016666666666666666,
+ "loss": 1.0888,
+ "step": 2200
+ },
+ {
+ "epoch": 98.0,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.4407347440719604,
+ "eval_runtime": 0.8737,
+ "eval_samples_per_second": 52.651,
+ "eval_steps_per_second": 3.434,
+ "step": 2205
+ },
+ {
+ "epoch": 98.22222222222223,
+ "grad_norm": 16.125377655029297,
+ "learning_rate": 0.00016287878787878788,
+ "loss": 1.0803,
+ "step": 2210
+ },
+ {
+ "epoch": 98.66666666666667,
+ "grad_norm": 4.7502546310424805,
+ "learning_rate": 0.0001590909090909091,
+ "loss": 1.1292,
+ "step": 2220
+ },
+ {
+ "epoch": 98.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.4577943086624146,
+ "eval_runtime": 1.2602,
+ "eval_samples_per_second": 36.503,
+ "eval_steps_per_second": 2.381,
+ "step": 2227
+ },
+ {
+ "epoch": 99.11111111111111,
+ "grad_norm": 12.011200904846191,
+ "learning_rate": 0.0001553030303030303,
+ "loss": 1.076,
+ "step": 2230
+ },
+ {
+ "epoch": 99.55555555555556,
+ "grad_norm": 10.061297416687012,
+ "learning_rate": 0.00015151515151515152,
+ "loss": 1.0789,
+ "step": 2240
+ },
+ {
+ "epoch": 100.0,
+ "grad_norm": 7.683816432952881,
+ "learning_rate": 0.00014772727272727274,
+ "loss": 1.0754,
+ "step": 2250
+ },
+ {
+ "epoch": 100.0,
+ "eval_accuracy": 0.17391304347826086,
+ "eval_loss": 1.5031030178070068,
+ "eval_runtime": 0.8791,
+ "eval_samples_per_second": 52.328,
+ "eval_steps_per_second": 3.413,
+ "step": 2250
+ },
+ {
+ "epoch": 100.44444444444444,
+ "grad_norm": 10.789087295532227,
+ "learning_rate": 0.00014393939393939396,
+ "loss": 1.0662,
+ "step": 2260
+ },
+ {
+ "epoch": 100.88888888888889,
+ "grad_norm": 14.93548583984375,
+ "learning_rate": 0.00014015151515151515,
+ "loss": 1.0817,
+ "step": 2270
+ },
+ {
+ "epoch": 100.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4460813999176025,
+ "eval_runtime": 0.8764,
+ "eval_samples_per_second": 52.489,
+ "eval_steps_per_second": 3.423,
+ "step": 2272
+ },
+ {
+ "epoch": 101.33333333333333,
+ "grad_norm": 7.292104244232178,
+ "learning_rate": 0.00013636363636363637,
+ "loss": 1.1087,
+ "step": 2280
+ },
+ {
+ "epoch": 101.77777777777777,
+ "grad_norm": 21.396413803100586,
+ "learning_rate": 0.00013257575757575756,
+ "loss": 1.0671,
+ "step": 2290
+ },
+ {
+ "epoch": 102.0,
+ "eval_accuracy": 0.2391304347826087,
+ "eval_loss": 1.4722661972045898,
+ "eval_runtime": 0.8883,
+ "eval_samples_per_second": 51.786,
+ "eval_steps_per_second": 3.377,
+ "step": 2295
+ },
+ {
+ "epoch": 102.22222222222223,
+ "grad_norm": 7.56445837020874,
+ "learning_rate": 0.00012878787878787878,
+ "loss": 1.0837,
+ "step": 2300
+ },
+ {
+ "epoch": 102.66666666666667,
+ "grad_norm": 12.246611595153809,
+ "learning_rate": 0.000125,
+ "loss": 1.0815,
+ "step": 2310
+ },
+ {
+ "epoch": 102.97777777777777,
+ "eval_accuracy": 0.1956521739130435,
+ "eval_loss": 1.4988662004470825,
+ "eval_runtime": 1.1771,
+ "eval_samples_per_second": 39.079,
+ "eval_steps_per_second": 2.549,
+ "step": 2317
+ },
+ {
+ "epoch": 103.11111111111111,
+ "grad_norm": 10.69598388671875,
+ "learning_rate": 0.00012121212121212122,
+ "loss": 1.0852,
+ "step": 2320
+ },
+ {
+ "epoch": 103.55555555555556,
+ "grad_norm": 9.647980690002441,
+ "learning_rate": 0.00011742424242424243,
+ "loss": 1.076,
+ "step": 2330
+ },
+ {
+ "epoch": 104.0,
+ "grad_norm": 38.92795944213867,
+ "learning_rate": 0.00011363636363636364,
+ "loss": 1.0967,
+ "step": 2340
+ },
+ {
+ "epoch": 104.0,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.465432047843933,
+ "eval_runtime": 0.8692,
+ "eval_samples_per_second": 52.922,
+ "eval_steps_per_second": 3.451,
+ "step": 2340
+ },
+ {
+ "epoch": 104.44444444444444,
+ "grad_norm": 9.53869915008545,
+ "learning_rate": 0.00010984848484848486,
+ "loss": 1.0838,
+ "step": 2350
+ },
+ {
+ "epoch": 104.88888888888889,
+ "grad_norm": 5.269750118255615,
+ "learning_rate": 0.00010606060606060606,
+ "loss": 1.091,
+ "step": 2360
+ },
+ {
+ "epoch": 104.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4559190273284912,
+ "eval_runtime": 0.8756,
+ "eval_samples_per_second": 52.535,
+ "eval_steps_per_second": 3.426,
+ "step": 2362
+ },
+ {
+ "epoch": 105.33333333333333,
+ "grad_norm": 12.788183212280273,
+ "learning_rate": 0.00010227272727272728,
+ "loss": 1.1085,
+ "step": 2370
+ },
+ {
+ "epoch": 105.77777777777777,
+ "grad_norm": 7.0092058181762695,
+ "learning_rate": 9.848484848484848e-05,
+ "loss": 1.0895,
+ "step": 2380
+ },
+ {
+ "epoch": 106.0,
+ "eval_accuracy": 0.2826086956521739,
+ "eval_loss": 1.4221450090408325,
+ "eval_runtime": 1.1631,
+ "eval_samples_per_second": 39.549,
+ "eval_steps_per_second": 2.579,
+ "step": 2385
+ },
+ {
+ "epoch": 106.22222222222223,
+ "grad_norm": 12.933279037475586,
+ "learning_rate": 9.46969696969697e-05,
+ "loss": 1.1548,
+ "step": 2390
+ },
+ {
+ "epoch": 106.66666666666667,
+ "grad_norm": 4.65310001373291,
+ "learning_rate": 9.090909090909092e-05,
+ "loss": 1.0847,
+ "step": 2400
+ },
+ {
+ "epoch": 106.97777777777777,
+ "eval_accuracy": 0.2826086956521739,
+ "eval_loss": 1.4292521476745605,
+ "eval_runtime": 0.8992,
+ "eval_samples_per_second": 51.156,
+ "eval_steps_per_second": 3.336,
+ "step": 2407
+ },
+ {
+ "epoch": 107.11111111111111,
+ "grad_norm": 7.354497909545898,
+ "learning_rate": 8.712121212121212e-05,
+ "loss": 1.0996,
+ "step": 2410
+ },
+ {
+ "epoch": 107.55555555555556,
+ "grad_norm": 5.702237129211426,
+ "learning_rate": 8.333333333333333e-05,
+ "loss": 1.0883,
+ "step": 2420
+ },
+ {
+ "epoch": 108.0,
+ "grad_norm": 4.873330116271973,
+ "learning_rate": 7.954545454545455e-05,
+ "loss": 1.102,
+ "step": 2430
+ },
+ {
+ "epoch": 108.0,
+ "eval_accuracy": 0.2391304347826087,
+ "eval_loss": 1.4582384824752808,
+ "eval_runtime": 0.9053,
+ "eval_samples_per_second": 50.81,
+ "eval_steps_per_second": 3.314,
+ "step": 2430
+ },
+ {
+ "epoch": 108.44444444444444,
+ "grad_norm": 78.408447265625,
+ "learning_rate": 7.575757575757576e-05,
+ "loss": 1.1048,
+ "step": 2440
+ },
+ {
+ "epoch": 108.88888888888889,
+ "grad_norm": 6.651626110076904,
+ "learning_rate": 7.196969696969698e-05,
+ "loss": 1.0404,
+ "step": 2450
+ },
+ {
+ "epoch": 108.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4655812978744507,
+ "eval_runtime": 0.8722,
+ "eval_samples_per_second": 52.741,
+ "eval_steps_per_second": 3.44,
+ "step": 2452
+ },
+ {
+ "epoch": 109.33333333333333,
+ "grad_norm": 10.666740417480469,
+ "learning_rate": 6.818181818181818e-05,
+ "loss": 1.0799,
+ "step": 2460
+ },
+ {
+ "epoch": 109.77777777777777,
+ "grad_norm": 6.093382358551025,
+ "learning_rate": 6.439393939393939e-05,
+ "loss": 1.0488,
+ "step": 2470
+ },
+ {
+ "epoch": 110.0,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.489004373550415,
+ "eval_runtime": 1.0926,
+ "eval_samples_per_second": 42.101,
+ "eval_steps_per_second": 2.746,
+ "step": 2475
+ },
+ {
+ "epoch": 110.22222222222223,
+ "grad_norm": 5.012928009033203,
+ "learning_rate": 6.060606060606061e-05,
+ "loss": 1.1006,
+ "step": 2480
+ },
+ {
+ "epoch": 110.66666666666667,
+ "grad_norm": 8.174079895019531,
+ "learning_rate": 5.681818181818182e-05,
+ "loss": 1.0966,
+ "step": 2490
+ },
+ {
+ "epoch": 110.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4631787538528442,
+ "eval_runtime": 0.8917,
+ "eval_samples_per_second": 51.586,
+ "eval_steps_per_second": 3.364,
+ "step": 2497
+ },
+ {
+ "epoch": 111.11111111111111,
+ "grad_norm": 4.650646686553955,
+ "learning_rate": 5.303030303030303e-05,
+ "loss": 1.0629,
+ "step": 2500
+ },
+ {
+ "epoch": 111.55555555555556,
+ "grad_norm": 5.132219314575195,
+ "learning_rate": 4.924242424242424e-05,
+ "loss": 1.0835,
+ "step": 2510
+ },
+ {
+ "epoch": 112.0,
+ "grad_norm": 8.792770385742188,
+ "learning_rate": 4.545454545454546e-05,
+ "loss": 1.0901,
+ "step": 2520
+ },
+ {
+ "epoch": 112.0,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4494850635528564,
+ "eval_runtime": 0.872,
+ "eval_samples_per_second": 52.75,
+ "eval_steps_per_second": 3.44,
+ "step": 2520
+ },
+ {
+ "epoch": 112.44444444444444,
+ "grad_norm": 6.029526710510254,
+ "learning_rate": 4.1666666666666665e-05,
+ "loss": 1.0764,
+ "step": 2530
+ },
+ {
+ "epoch": 112.88888888888889,
+ "grad_norm": 5.1245927810668945,
+ "learning_rate": 3.787878787878788e-05,
+ "loss": 1.1008,
+ "step": 2540
+ },
+ {
+ "epoch": 112.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4332908391952515,
+ "eval_runtime": 0.9004,
+ "eval_samples_per_second": 51.09,
+ "eval_steps_per_second": 3.332,
+ "step": 2542
+ },
+ {
+ "epoch": 113.33333333333333,
+ "grad_norm": 11.329251289367676,
+ "learning_rate": 3.409090909090909e-05,
+ "loss": 1.0763,
+ "step": 2550
+ },
+ {
+ "epoch": 113.77777777777777,
+ "grad_norm": 3.865112066268921,
+ "learning_rate": 3.0303030303030306e-05,
+ "loss": 1.0884,
+ "step": 2560
+ },
+ {
+ "epoch": 114.0,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4406064748764038,
+ "eval_runtime": 0.9189,
+ "eval_samples_per_second": 50.057,
+ "eval_steps_per_second": 3.265,
+ "step": 2565
+ },
+ {
+ "epoch": 114.22222222222223,
+ "grad_norm": 7.575568675994873,
+ "learning_rate": 2.6515151515151516e-05,
+ "loss": 1.088,
+ "step": 2570
+ },
+ {
+ "epoch": 114.66666666666667,
+ "grad_norm": 7.7806620597839355,
+ "learning_rate": 2.272727272727273e-05,
+ "loss": 1.0889,
+ "step": 2580
+ },
+ {
+ "epoch": 114.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.447421908378601,
+ "eval_runtime": 0.9635,
+ "eval_samples_per_second": 47.742,
+ "eval_steps_per_second": 3.114,
+ "step": 2587
+ },
+ {
+ "epoch": 115.11111111111111,
+ "grad_norm": 20.770198822021484,
+ "learning_rate": 1.893939393939394e-05,
+ "loss": 1.0754,
+ "step": 2590
+ },
+ {
+ "epoch": 115.55555555555556,
+ "grad_norm": 5.939628601074219,
+ "learning_rate": 1.5151515151515153e-05,
+ "loss": 1.07,
+ "step": 2600
+ },
+ {
+ "epoch": 116.0,
+ "grad_norm": 35.29814147949219,
+ "learning_rate": 1.1363636363636365e-05,
+ "loss": 1.0729,
+ "step": 2610
+ },
+ {
+ "epoch": 116.0,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4561296701431274,
+ "eval_runtime": 0.899,
+ "eval_samples_per_second": 51.169,
+ "eval_steps_per_second": 3.337,
+ "step": 2610
+ },
+ {
+ "epoch": 116.44444444444444,
+ "grad_norm": 24.017641067504883,
+ "learning_rate": 7.5757575757575764e-06,
+ "loss": 1.0589,
+ "step": 2620
+ },
+ {
+ "epoch": 116.88888888888889,
+ "grad_norm": 14.974956512451172,
+ "learning_rate": 3.7878787878787882e-06,
+ "loss": 1.0671,
+ "step": 2630
+ },
+ {
+ "epoch": 116.97777777777777,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.4538123607635498,
+ "eval_runtime": 0.8903,
+ "eval_samples_per_second": 51.667,
+ "eval_steps_per_second": 3.37,
+ "step": 2632
+ },
+ {
+ "epoch": 117.33333333333333,
+ "grad_norm": 8.030067443847656,
+ "learning_rate": 0.0,
+ "loss": 1.0937,
+ "step": 2640
+ },
+ {
+ "epoch": 117.33333333333333,
+ "eval_accuracy": 0.21739130434782608,
+ "eval_loss": 1.453188180923462,
+ "eval_runtime": 1.73,
+ "eval_samples_per_second": 26.589,
+ "eval_steps_per_second": 1.734,
+ "step": 2640
+ },
+ {
+ "epoch": 117.33333333333333,
+ "step": 2640,
+ "total_flos": 5.466852859010089e+18,
+ "train_loss": 1.1631801536588957,
+ "train_runtime": 5282.7516,
+ "train_samples_per_second": 32.529,
+ "train_steps_per_second": 0.5
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 2640,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 120,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 5.466852859010089e+18,
+ "train_batch_size": 16,
+ "trial_name": null,
+ "trial_params": null
+ }