ytcheng committed
Commit ac59fb7
1 Parent(s): 22ee540

End of training

README.md CHANGED
@@ -3,6 +3,7 @@ license: llama3
 library_name: peft
 tags:
 - llama-factory
+- lora
 - generated_from_trainer
 base_model: meta-llama/Meta-Llama-3-70B-Instruct
 model-index:
@@ -15,9 +16,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama3-70B-lora-pretrain_v2
 
-This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) on an unknown dataset.
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) on the sm_artile dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.9383
+- Loss: 1.9382
 
 ## Model description
 
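Since the card declares `library_name: peft` with `lora` and `llama-factory` tags, this repository holds a LoRA adapter rather than full model weights. Below is a minimal loading sketch; the adapter repo id is assumed from the card title and committer name, and the dtype/device settings are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-70B-Instruct"
adapter_id = "ytcheng/llama3-70B-lora-pretrain_v2"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # illustrative; choose what fits your hardware
    device_map="auto",
)
# Attach the LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
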
all_results.json ADDED
@@ -0,0 +1,13 @@
+{
+    "epoch": 2.998592210229939,
+    "eval_loss": 1.9382482767105103,
+    "eval_runtime": 935.7433,
+    "eval_samples_per_second": 1.012,
+    "eval_steps_per_second": 0.507,
+    "perplexity": 6.946571834848492,
+    "total_flos": 1.0917373877893988e+19,
+    "train_loss": 2.069366264641751,
+    "train_runtime": 97122.8592,
+    "train_samples_per_second": 0.263,
+    "train_steps_per_second": 0.033
+}
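The `perplexity` field is simply the exponential of the evaluation loss, which the numbers above bear out:

```python
import math

eval_loss = 1.9382482767105103
print(math.exp(eval_loss))  # ≈ 6.946572, matching the reported perplexity 6.946571834848492
```
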
eval_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.998592210229939,
+    "eval_loss": 1.9382482767105103,
+    "eval_runtime": 935.7433,
+    "eval_samples_per_second": 1.012,
+    "eval_steps_per_second": 0.507,
+    "perplexity": 6.946571834848492
+}
train_results.json ADDED
@@ -0,0 +1,8 @@
+{
+    "epoch": 2.998592210229939,
+    "total_flos": 1.0917373877893988e+19,
+    "train_loss": 2.069366264641751,
+    "train_runtime": 97122.8592,
+    "train_samples_per_second": 0.263,
+    "train_steps_per_second": 0.033
+}
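A quick consistency check on these throughput figures, assuming only the values above and the `global_step` of 3195 recorded in trainer_state.json below:

```python
train_runtime = 97122.8592   # seconds, roughly 27.0 hours of training
global_step = 3195           # optimizer steps, from trainer_state.json

print(global_step / train_runtime)  # ≈ 0.0329, reported (rounded) as train_steps_per_second = 0.033
```
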
trainer_state.json ADDED
@@ -0,0 +1,2511 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.998592210229939,
5
+ "eval_steps": 100,
6
+ "global_step": 3195,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.009385265133740028,
13
+ "grad_norm": 0.4665657877922058,
14
+ "learning_rate": 1.6000000000000001e-06,
15
+ "loss": 2.9241,
16
+ "step": 10
17
+ },
18
+ {
19
+ "epoch": 0.018770530267480056,
20
+ "grad_norm": 0.36958208680152893,
21
+ "learning_rate": 3.4000000000000005e-06,
22
+ "loss": 2.9288,
23
+ "step": 20
24
+ },
25
+ {
26
+ "epoch": 0.028155795401220086,
27
+ "grad_norm": 0.3911522626876831,
28
+ "learning_rate": 5.2e-06,
29
+ "loss": 2.8784,
30
+ "step": 30
31
+ },
32
+ {
33
+ "epoch": 0.03754106053496011,
34
+ "grad_norm": 0.6396021842956543,
35
+ "learning_rate": 7.2e-06,
36
+ "loss": 2.9314,
37
+ "step": 40
38
+ },
39
+ {
40
+ "epoch": 0.04692632566870014,
41
+ "grad_norm": 0.5952326059341431,
42
+ "learning_rate": 9.2e-06,
43
+ "loss": 2.8987,
44
+ "step": 50
45
+ },
46
+ {
47
+ "epoch": 0.05631159080244017,
48
+ "grad_norm": 0.570318341255188,
49
+ "learning_rate": 1.1200000000000001e-05,
50
+ "loss": 2.7701,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.0656968559361802,
55
+ "grad_norm": 0.5945647358894348,
56
+ "learning_rate": 1.32e-05,
57
+ "loss": 2.7628,
58
+ "step": 70
59
+ },
60
+ {
61
+ "epoch": 0.07508212106992022,
62
+ "grad_norm": 0.5424015522003174,
63
+ "learning_rate": 1.52e-05,
64
+ "loss": 2.8025,
65
+ "step": 80
66
+ },
67
+ {
68
+ "epoch": 0.08446738620366025,
69
+ "grad_norm": 0.5131893157958984,
70
+ "learning_rate": 1.7199999999999998e-05,
71
+ "loss": 2.6489,
72
+ "step": 90
73
+ },
74
+ {
75
+ "epoch": 0.09385265133740028,
76
+ "grad_norm": 0.550221860408783,
77
+ "learning_rate": 1.9200000000000003e-05,
78
+ "loss": 2.6995,
79
+ "step": 100
80
+ },
81
+ {
82
+ "epoch": 0.09385265133740028,
83
+ "eval_loss": 2.63047194480896,
84
+ "eval_runtime": 937.5041,
85
+ "eval_samples_per_second": 1.01,
86
+ "eval_steps_per_second": 0.506,
87
+ "step": 100
88
+ },
89
+ {
90
+ "epoch": 0.1032379164711403,
91
+ "grad_norm": 0.42704904079437256,
92
+ "learning_rate": 2.12e-05,
93
+ "loss": 2.6395,
94
+ "step": 110
95
+ },
96
+ {
97
+ "epoch": 0.11262318160488034,
98
+ "grad_norm": 0.44245535135269165,
99
+ "learning_rate": 2.32e-05,
100
+ "loss": 2.6048,
101
+ "step": 120
102
+ },
103
+ {
104
+ "epoch": 0.12200844673862037,
105
+ "grad_norm": 0.3578210473060608,
106
+ "learning_rate": 2.5200000000000003e-05,
107
+ "loss": 2.5695,
108
+ "step": 130
109
+ },
110
+ {
111
+ "epoch": 0.1313937118723604,
112
+ "grad_norm": 0.5043906569480896,
113
+ "learning_rate": 2.7200000000000004e-05,
114
+ "loss": 2.5239,
115
+ "step": 140
116
+ },
117
+ {
118
+ "epoch": 0.14077897700610043,
119
+ "grad_norm": 0.4712482988834381,
120
+ "learning_rate": 2.9199999999999998e-05,
121
+ "loss": 2.4199,
122
+ "step": 150
123
+ },
124
+ {
125
+ "epoch": 0.15016424213984045,
126
+ "grad_norm": 0.5192570090293884,
127
+ "learning_rate": 3.12e-05,
128
+ "loss": 2.5526,
129
+ "step": 160
130
+ },
131
+ {
132
+ "epoch": 0.15954950727358047,
133
+ "grad_norm": 0.48030856251716614,
134
+ "learning_rate": 3.32e-05,
135
+ "loss": 2.4876,
136
+ "step": 170
137
+ },
138
+ {
139
+ "epoch": 0.1689347724073205,
140
+ "grad_norm": 0.9590848088264465,
141
+ "learning_rate": 3.52e-05,
142
+ "loss": 2.439,
143
+ "step": 180
144
+ },
145
+ {
146
+ "epoch": 0.17832003754106054,
147
+ "grad_norm": 0.5423749685287476,
148
+ "learning_rate": 3.72e-05,
149
+ "loss": 2.3547,
150
+ "step": 190
151
+ },
152
+ {
153
+ "epoch": 0.18770530267480057,
154
+ "grad_norm": 0.4835156798362732,
155
+ "learning_rate": 3.9200000000000004e-05,
156
+ "loss": 2.4199,
157
+ "step": 200
158
+ },
159
+ {
160
+ "epoch": 0.18770530267480057,
161
+ "eval_loss": 2.3978819847106934,
162
+ "eval_runtime": 937.0546,
163
+ "eval_samples_per_second": 1.011,
164
+ "eval_steps_per_second": 0.506,
165
+ "step": 200
166
+ },
167
+ {
168
+ "epoch": 0.1970905678085406,
169
+ "grad_norm": 0.7263341546058655,
170
+ "learning_rate": 4.12e-05,
171
+ "loss": 2.4937,
172
+ "step": 210
173
+ },
174
+ {
175
+ "epoch": 0.2064758329422806,
176
+ "grad_norm": 0.6794207096099854,
177
+ "learning_rate": 4.32e-05,
178
+ "loss": 2.3535,
179
+ "step": 220
180
+ },
181
+ {
182
+ "epoch": 0.21586109807602064,
183
+ "grad_norm": 0.6615617275238037,
184
+ "learning_rate": 4.52e-05,
185
+ "loss": 2.3753,
186
+ "step": 230
187
+ },
188
+ {
189
+ "epoch": 0.22524636320976069,
190
+ "grad_norm": 0.7630943059921265,
191
+ "learning_rate": 4.72e-05,
192
+ "loss": 2.3971,
193
+ "step": 240
194
+ },
195
+ {
196
+ "epoch": 0.2346316283435007,
197
+ "grad_norm": 0.7536003589630127,
198
+ "learning_rate": 4.92e-05,
199
+ "loss": 2.3282,
200
+ "step": 250
201
+ },
202
+ {
203
+ "epoch": 0.24401689347724073,
204
+ "grad_norm": 1.0338555574417114,
205
+ "learning_rate": 5.1200000000000004e-05,
206
+ "loss": 2.333,
207
+ "step": 260
208
+ },
209
+ {
210
+ "epoch": 0.25340215861098075,
211
+ "grad_norm": 0.7328284382820129,
212
+ "learning_rate": 5.3200000000000006e-05,
213
+ "loss": 2.3966,
214
+ "step": 270
215
+ },
216
+ {
217
+ "epoch": 0.2627874237447208,
218
+ "grad_norm": 0.8608214855194092,
219
+ "learning_rate": 5.520000000000001e-05,
220
+ "loss": 2.2628,
221
+ "step": 280
222
+ },
223
+ {
224
+ "epoch": 0.2721726888784608,
225
+ "grad_norm": 0.8262203931808472,
226
+ "learning_rate": 5.72e-05,
227
+ "loss": 2.3092,
228
+ "step": 290
229
+ },
230
+ {
231
+ "epoch": 0.28155795401220085,
232
+ "grad_norm": 0.8401615619659424,
233
+ "learning_rate": 5.92e-05,
234
+ "loss": 2.2722,
235
+ "step": 300
236
+ },
237
+ {
238
+ "epoch": 0.28155795401220085,
239
+ "eval_loss": 2.217956066131592,
240
+ "eval_runtime": 936.7717,
241
+ "eval_samples_per_second": 1.011,
242
+ "eval_steps_per_second": 0.506,
243
+ "step": 300
244
+ },
245
+ {
246
+ "epoch": 0.29094321914594085,
247
+ "grad_norm": 0.7364845275878906,
248
+ "learning_rate": 6.12e-05,
249
+ "loss": 2.2179,
250
+ "step": 310
251
+ },
252
+ {
253
+ "epoch": 0.3003284842796809,
254
+ "grad_norm": 0.8354003429412842,
255
+ "learning_rate": 6.3e-05,
256
+ "loss": 2.3654,
257
+ "step": 320
258
+ },
259
+ {
260
+ "epoch": 0.30971374941342095,
261
+ "grad_norm": 1.0776605606079102,
262
+ "learning_rate": 6.500000000000001e-05,
263
+ "loss": 2.2143,
264
+ "step": 330
265
+ },
266
+ {
267
+ "epoch": 0.31909901454716094,
268
+ "grad_norm": 0.8842081427574158,
269
+ "learning_rate": 6.7e-05,
270
+ "loss": 2.2705,
271
+ "step": 340
272
+ },
273
+ {
274
+ "epoch": 0.328484279680901,
275
+ "grad_norm": 1.1752427816390991,
276
+ "learning_rate": 6.9e-05,
277
+ "loss": 2.2334,
278
+ "step": 350
279
+ },
280
+ {
281
+ "epoch": 0.337869544814641,
282
+ "grad_norm": 0.7227160334587097,
283
+ "learning_rate": 7.1e-05,
284
+ "loss": 2.228,
285
+ "step": 360
286
+ },
287
+ {
288
+ "epoch": 0.34725480994838104,
289
+ "grad_norm": 1.1053409576416016,
290
+ "learning_rate": 7.3e-05,
291
+ "loss": 2.2071,
292
+ "step": 370
293
+ },
294
+ {
295
+ "epoch": 0.3566400750821211,
296
+ "grad_norm": 0.9307537078857422,
297
+ "learning_rate": 7.500000000000001e-05,
298
+ "loss": 2.2334,
299
+ "step": 380
300
+ },
301
+ {
302
+ "epoch": 0.3660253402158611,
303
+ "grad_norm": 0.9264342188835144,
304
+ "learning_rate": 7.7e-05,
305
+ "loss": 2.2815,
306
+ "step": 390
307
+ },
308
+ {
309
+ "epoch": 0.37541060534960113,
310
+ "grad_norm": 1.0281509160995483,
311
+ "learning_rate": 7.900000000000001e-05,
312
+ "loss": 2.0762,
313
+ "step": 400
314
+ },
315
+ {
316
+ "epoch": 0.37541060534960113,
317
+ "eval_loss": 2.12506103515625,
318
+ "eval_runtime": 936.7185,
319
+ "eval_samples_per_second": 1.011,
320
+ "eval_steps_per_second": 0.506,
321
+ "step": 400
322
+ },
323
+ {
324
+ "epoch": 0.38479587048334113,
325
+ "grad_norm": 0.979566216468811,
326
+ "learning_rate": 8.1e-05,
327
+ "loss": 2.1801,
328
+ "step": 410
329
+ },
330
+ {
331
+ "epoch": 0.3941811356170812,
332
+ "grad_norm": 1.4208861589431763,
333
+ "learning_rate": 8.3e-05,
334
+ "loss": 2.263,
335
+ "step": 420
336
+ },
337
+ {
338
+ "epoch": 0.40356640075082123,
339
+ "grad_norm": 1.0267932415008545,
340
+ "learning_rate": 8.5e-05,
341
+ "loss": 2.1086,
342
+ "step": 430
343
+ },
344
+ {
345
+ "epoch": 0.4129516658845612,
346
+ "grad_norm": 1.010489583015442,
347
+ "learning_rate": 8.7e-05,
348
+ "loss": 2.1647,
349
+ "step": 440
350
+ },
351
+ {
352
+ "epoch": 0.4223369310183013,
353
+ "grad_norm": 0.784968912601471,
354
+ "learning_rate": 8.900000000000001e-05,
355
+ "loss": 2.2024,
356
+ "step": 450
357
+ },
358
+ {
359
+ "epoch": 0.43172219615204127,
360
+ "grad_norm": 1.1498106718063354,
361
+ "learning_rate": 9.1e-05,
362
+ "loss": 2.0905,
363
+ "step": 460
364
+ },
365
+ {
366
+ "epoch": 0.4411074612857813,
367
+ "grad_norm": 0.7570444345474243,
368
+ "learning_rate": 9.300000000000001e-05,
369
+ "loss": 2.0965,
370
+ "step": 470
371
+ },
372
+ {
373
+ "epoch": 0.45049272641952137,
374
+ "grad_norm": 1.162133812904358,
375
+ "learning_rate": 9.5e-05,
376
+ "loss": 2.1354,
377
+ "step": 480
378
+ },
379
+ {
380
+ "epoch": 0.45987799155326137,
381
+ "grad_norm": 0.963750422000885,
382
+ "learning_rate": 9.7e-05,
383
+ "loss": 2.1286,
384
+ "step": 490
385
+ },
386
+ {
387
+ "epoch": 0.4692632566870014,
388
+ "grad_norm": 0.9272373914718628,
389
+ "learning_rate": 9.900000000000001e-05,
390
+ "loss": 1.9652,
391
+ "step": 500
392
+ },
393
+ {
394
+ "epoch": 0.4692632566870014,
395
+ "eval_loss": 2.085766077041626,
396
+ "eval_runtime": 937.6717,
397
+ "eval_samples_per_second": 1.01,
398
+ "eval_steps_per_second": 0.506,
399
+ "step": 500
400
+ },
401
+ {
402
+ "epoch": 0.4786485218207414,
403
+ "grad_norm": 0.7409124374389648,
404
+ "learning_rate": 9.999915070025401e-05,
405
+ "loss": 2.099,
406
+ "step": 510
407
+ },
408
+ {
409
+ "epoch": 0.48803378695448146,
410
+ "grad_norm": 0.8032079339027405,
411
+ "learning_rate": 9.999235647539953e-05,
412
+ "loss": 2.1874,
413
+ "step": 520
414
+ },
415
+ {
416
+ "epoch": 0.4974190520882215,
417
+ "grad_norm": 1.00968599319458,
418
+ "learning_rate": 9.997876894893606e-05,
419
+ "loss": 2.2589,
420
+ "step": 530
421
+ },
422
+ {
423
+ "epoch": 0.5068043172219615,
424
+ "grad_norm": 0.7752834558486938,
425
+ "learning_rate": 9.995838996722914e-05,
426
+ "loss": 2.1808,
427
+ "step": 540
428
+ },
429
+ {
430
+ "epoch": 0.5161895823557016,
431
+ "grad_norm": 0.8999218940734863,
432
+ "learning_rate": 9.993122229951354e-05,
433
+ "loss": 2.2034,
434
+ "step": 550
435
+ },
436
+ {
437
+ "epoch": 0.5255748474894416,
438
+ "grad_norm": 0.8991349339485168,
439
+ "learning_rate": 9.989726963751682e-05,
440
+ "loss": 2.1284,
441
+ "step": 560
442
+ },
443
+ {
444
+ "epoch": 0.5349601126231815,
445
+ "grad_norm": 0.9258519411087036,
446
+ "learning_rate": 9.985653659495773e-05,
447
+ "loss": 2.0642,
448
+ "step": 570
449
+ },
450
+ {
451
+ "epoch": 0.5443453777569216,
452
+ "grad_norm": 0.7985184788703918,
453
+ "learning_rate": 9.980902870691931e-05,
454
+ "loss": 1.9404,
455
+ "step": 580
456
+ },
457
+ {
458
+ "epoch": 0.5537306428906617,
459
+ "grad_norm": 0.9277619123458862,
460
+ "learning_rate": 9.975475242909667e-05,
461
+ "loss": 1.9928,
462
+ "step": 590
463
+ },
464
+ {
465
+ "epoch": 0.5631159080244017,
466
+ "grad_norm": 0.8406923413276672,
467
+ "learning_rate": 9.969371513691982e-05,
468
+ "loss": 2.1893,
469
+ "step": 600
470
+ },
471
+ {
472
+ "epoch": 0.5631159080244017,
473
+ "eval_loss": 2.0628976821899414,
474
+ "eval_runtime": 937.447,
475
+ "eval_samples_per_second": 1.01,
476
+ "eval_steps_per_second": 0.506,
477
+ "step": 600
478
+ },
479
+ {
480
+ "epoch": 0.5725011731581418,
481
+ "grad_norm": 0.678254246711731,
482
+ "learning_rate": 9.962592512455138e-05,
483
+ "loss": 2.1181,
484
+ "step": 610
485
+ },
486
+ {
487
+ "epoch": 0.5818864382918817,
488
+ "grad_norm": 0.7161825895309448,
489
+ "learning_rate": 9.955139160375959e-05,
490
+ "loss": 2.1328,
491
+ "step": 620
492
+ },
493
+ {
494
+ "epoch": 0.5912717034256217,
495
+ "grad_norm": 0.7554607391357422,
496
+ "learning_rate": 9.947012470266645e-05,
497
+ "loss": 2.0865,
498
+ "step": 630
499
+ },
500
+ {
501
+ "epoch": 0.6006569685593618,
502
+ "grad_norm": 0.9590722322463989,
503
+ "learning_rate": 9.938213546437154e-05,
504
+ "loss": 2.1012,
505
+ "step": 640
506
+ },
507
+ {
508
+ "epoch": 0.6100422336931018,
509
+ "grad_norm": 0.8377043008804321,
510
+ "learning_rate": 9.928743584545132e-05,
511
+ "loss": 2.1155,
512
+ "step": 650
513
+ },
514
+ {
515
+ "epoch": 0.6194274988268419,
516
+ "grad_norm": 0.7528682947158813,
517
+ "learning_rate": 9.91860387143345e-05,
518
+ "loss": 2.2455,
519
+ "step": 660
520
+ },
521
+ {
522
+ "epoch": 0.6288127639605818,
523
+ "grad_norm": 0.7164850234985352,
524
+ "learning_rate": 9.907795784955327e-05,
525
+ "loss": 2.155,
526
+ "step": 670
527
+ },
528
+ {
529
+ "epoch": 0.6381980290943219,
530
+ "grad_norm": 0.7665808200836182,
531
+ "learning_rate": 9.896320793787106e-05,
532
+ "loss": 2.0722,
533
+ "step": 680
534
+ },
535
+ {
536
+ "epoch": 0.6475832942280619,
537
+ "grad_norm": 0.8012193441390991,
538
+ "learning_rate": 9.884180457228678e-05,
539
+ "loss": 2.1045,
540
+ "step": 690
541
+ },
542
+ {
543
+ "epoch": 0.656968559361802,
544
+ "grad_norm": 0.8847366571426392,
545
+ "learning_rate": 9.871376424991589e-05,
546
+ "loss": 2.0153,
547
+ "step": 700
548
+ },
549
+ {
550
+ "epoch": 0.656968559361802,
551
+ "eval_loss": 2.047290802001953,
552
+ "eval_runtime": 937.2712,
553
+ "eval_samples_per_second": 1.01,
554
+ "eval_steps_per_second": 0.506,
555
+ "step": 700
556
+ },
557
+ {
558
+ "epoch": 0.666353824495542,
559
+ "grad_norm": 0.7482985854148865,
560
+ "learning_rate": 9.85791043697488e-05,
561
+ "loss": 1.981,
562
+ "step": 710
563
+ },
564
+ {
565
+ "epoch": 0.675739089629282,
566
+ "grad_norm": 0.776764988899231,
567
+ "learning_rate": 9.843784323028638e-05,
568
+ "loss": 2.062,
569
+ "step": 720
570
+ },
571
+ {
572
+ "epoch": 0.685124354763022,
573
+ "grad_norm": 0.8061379790306091,
574
+ "learning_rate": 9.82900000270536e-05,
575
+ "loss": 1.9919,
576
+ "step": 730
577
+ },
578
+ {
579
+ "epoch": 0.6945096198967621,
580
+ "grad_norm": 0.7650445699691772,
581
+ "learning_rate": 9.813559484999102e-05,
582
+ "loss": 2.0104,
583
+ "step": 740
584
+ },
585
+ {
586
+ "epoch": 0.7038948850305021,
587
+ "grad_norm": 1.2102171182632446,
588
+ "learning_rate": 9.797464868072488e-05,
589
+ "loss": 2.0407,
590
+ "step": 750
591
+ },
592
+ {
593
+ "epoch": 0.7132801501642422,
594
+ "grad_norm": 0.7476488351821899,
595
+ "learning_rate": 9.780718338971591e-05,
596
+ "loss": 1.8859,
597
+ "step": 760
598
+ },
599
+ {
600
+ "epoch": 0.7226654152979821,
601
+ "grad_norm": 0.7931784391403198,
602
+ "learning_rate": 9.763322173328753e-05,
603
+ "loss": 2.1804,
604
+ "step": 770
605
+ },
606
+ {
607
+ "epoch": 0.7320506804317222,
608
+ "grad_norm": 1.1580551862716675,
609
+ "learning_rate": 9.745278735053343e-05,
610
+ "loss": 2.1195,
611
+ "step": 780
612
+ },
613
+ {
614
+ "epoch": 0.7414359455654622,
615
+ "grad_norm": 0.7719026803970337,
616
+ "learning_rate": 9.726590476010548e-05,
617
+ "loss": 1.97,
618
+ "step": 790
619
+ },
620
+ {
621
+ "epoch": 0.7508212106992023,
622
+ "grad_norm": 0.9482008218765259,
623
+ "learning_rate": 9.707259935688187e-05,
624
+ "loss": 1.9911,
625
+ "step": 800
626
+ },
627
+ {
628
+ "epoch": 0.7508212106992023,
629
+ "eval_loss": 2.0318124294281006,
630
+ "eval_runtime": 937.1438,
631
+ "eval_samples_per_second": 1.011,
632
+ "eval_steps_per_second": 0.506,
633
+ "step": 800
634
+ },
635
+ {
636
+ "epoch": 0.7602064758329423,
637
+ "grad_norm": 0.7328064441680908,
638
+ "learning_rate": 9.687289740851622e-05,
639
+ "loss": 2.1643,
640
+ "step": 810
641
+ },
642
+ {
643
+ "epoch": 0.7695917409666823,
644
+ "grad_norm": 0.7916940450668335,
645
+ "learning_rate": 9.666682605186835e-05,
646
+ "loss": 2.166,
647
+ "step": 820
648
+ },
649
+ {
650
+ "epoch": 0.7789770061004223,
651
+ "grad_norm": 0.9480199217796326,
652
+ "learning_rate": 9.645441328931654e-05,
653
+ "loss": 2.108,
654
+ "step": 830
655
+ },
656
+ {
657
+ "epoch": 0.7883622712341624,
658
+ "grad_norm": 0.8467483520507812,
659
+ "learning_rate": 9.62356879849525e-05,
660
+ "loss": 2.0794,
661
+ "step": 840
662
+ },
663
+ {
664
+ "epoch": 0.7977475363679024,
665
+ "grad_norm": 0.6742168664932251,
666
+ "learning_rate": 9.601067986065909e-05,
667
+ "loss": 2.1227,
668
+ "step": 850
669
+ },
670
+ {
671
+ "epoch": 0.8071328015016425,
672
+ "grad_norm": 0.7015961408615112,
673
+ "learning_rate": 9.577941949207146e-05,
674
+ "loss": 2.0288,
675
+ "step": 860
676
+ },
677
+ {
678
+ "epoch": 0.8165180666353824,
679
+ "grad_norm": 0.8289031386375427,
680
+ "learning_rate": 9.556596544693951e-05,
681
+ "loss": 2.037,
682
+ "step": 870
683
+ },
684
+ {
685
+ "epoch": 0.8259033317691225,
686
+ "grad_norm": 0.8550999760627747,
687
+ "learning_rate": 9.53229130894619e-05,
688
+ "loss": 2.1908,
689
+ "step": 880
690
+ },
691
+ {
692
+ "epoch": 0.8352885969028625,
693
+ "grad_norm": 0.7619972229003906,
694
+ "learning_rate": 9.50737019461194e-05,
695
+ "loss": 2.1759,
696
+ "step": 890
697
+ },
698
+ {
699
+ "epoch": 0.8446738620366026,
700
+ "grad_norm": 0.7303412556648254,
701
+ "learning_rate": 9.481836588141808e-05,
702
+ "loss": 2.1041,
703
+ "step": 900
704
+ },
705
+ {
706
+ "epoch": 0.8446738620366026,
707
+ "eval_loss": 2.019794225692749,
708
+ "eval_runtime": 938.1446,
709
+ "eval_samples_per_second": 1.009,
710
+ "eval_steps_per_second": 0.505,
711
+ "step": 900
712
+ },
713
+ {
714
+ "epoch": 0.8540591271703426,
715
+ "grad_norm": 0.6646521687507629,
716
+ "learning_rate": 9.455693959216005e-05,
717
+ "loss": 2.0648,
718
+ "step": 910
719
+ },
720
+ {
721
+ "epoch": 0.8634443923040825,
722
+ "grad_norm": 0.910926878452301,
723
+ "learning_rate": 9.428945860272858e-05,
724
+ "loss": 2.0945,
725
+ "step": 920
726
+ },
727
+ {
728
+ "epoch": 0.8728296574378226,
729
+ "grad_norm": 0.7175999283790588,
730
+ "learning_rate": 9.401595926026077e-05,
731
+ "loss": 2.0488,
732
+ "step": 930
733
+ },
734
+ {
735
+ "epoch": 0.8822149225715626,
736
+ "grad_norm": 1.090782642364502,
737
+ "learning_rate": 9.373647872970852e-05,
738
+ "loss": 1.9902,
739
+ "step": 940
740
+ },
741
+ {
742
+ "epoch": 0.8916001877053027,
743
+ "grad_norm": 1.0074050426483154,
744
+ "learning_rate": 9.345105498878826e-05,
745
+ "loss": 1.9974,
746
+ "step": 950
747
+ },
748
+ {
749
+ "epoch": 0.9009854528390427,
750
+ "grad_norm": 0.6742076277732849,
751
+ "learning_rate": 9.315972682282031e-05,
752
+ "loss": 2.0359,
753
+ "step": 960
754
+ },
755
+ {
756
+ "epoch": 0.9103707179727827,
757
+ "grad_norm": 0.9036829471588135,
758
+ "learning_rate": 9.286253381945837e-05,
759
+ "loss": 2.1047,
760
+ "step": 970
761
+ },
762
+ {
763
+ "epoch": 0.9197559831065227,
764
+ "grad_norm": 1.02723228931427,
765
+ "learning_rate": 9.255951636331028e-05,
766
+ "loss": 1.9049,
767
+ "step": 980
768
+ },
769
+ {
770
+ "epoch": 0.9291412482402628,
771
+ "grad_norm": 0.9634993076324463,
772
+ "learning_rate": 9.225071563045007e-05,
773
+ "loss": 2.045,
774
+ "step": 990
775
+ },
776
+ {
777
+ "epoch": 0.9385265133740028,
778
+ "grad_norm": 0.8380990624427795,
779
+ "learning_rate": 9.193617358282277e-05,
780
+ "loss": 2.0488,
781
+ "step": 1000
782
+ },
783
+ {
784
+ "epoch": 0.9385265133740028,
785
+ "eval_loss": 2.0117270946502686,
786
+ "eval_runtime": 939.385,
787
+ "eval_samples_per_second": 1.008,
788
+ "eval_steps_per_second": 0.505,
789
+ "step": 1000
790
+ },
791
+ {
792
+ "epoch": 0.9479117785077429,
793
+ "grad_norm": 0.8317145705223083,
794
+ "learning_rate": 9.161593296254235e-05,
795
+ "loss": 2.0196,
796
+ "step": 1010
797
+ },
798
+ {
799
+ "epoch": 0.9572970436414828,
800
+ "grad_norm": 0.9297407865524292,
801
+ "learning_rate": 9.129003728608367e-05,
802
+ "loss": 2.0798,
803
+ "step": 1020
804
+ },
805
+ {
806
+ "epoch": 0.9666823087752229,
807
+ "grad_norm": 0.6938600540161133,
808
+ "learning_rate": 9.095853083836902e-05,
809
+ "loss": 2.1225,
810
+ "step": 1030
811
+ },
812
+ {
813
+ "epoch": 0.9760675739089629,
814
+ "grad_norm": 0.7786221504211426,
815
+ "learning_rate": 9.062145866675048e-05,
816
+ "loss": 2.0098,
817
+ "step": 1040
818
+ },
819
+ {
820
+ "epoch": 0.985452839042703,
821
+ "grad_norm": 0.7615213394165039,
822
+ "learning_rate": 9.027886657488862e-05,
823
+ "loss": 2.1385,
824
+ "step": 1050
825
+ },
826
+ {
827
+ "epoch": 0.994838104176443,
828
+ "grad_norm": 0.9867497086524963,
829
+ "learning_rate": 8.993080111652831e-05,
830
+ "loss": 2.1039,
831
+ "step": 1060
832
+ },
833
+ {
834
+ "epoch": 1.004223369310183,
835
+ "grad_norm": 0.747900664806366,
836
+ "learning_rate": 8.95773095891727e-05,
837
+ "loss": 2.104,
838
+ "step": 1070
839
+ },
840
+ {
841
+ "epoch": 1.013608634443923,
842
+ "grad_norm": 0.6495920419692993,
843
+ "learning_rate": 8.921844002765613e-05,
844
+ "loss": 2.0333,
845
+ "step": 1080
846
+ },
847
+ {
848
+ "epoch": 1.022993899577663,
849
+ "grad_norm": 0.8173234462738037,
850
+ "learning_rate": 8.885424119761684e-05,
851
+ "loss": 2.0524,
852
+ "step": 1090
853
+ },
854
+ {
855
+ "epoch": 1.0323791647114031,
856
+ "grad_norm": 0.8507694602012634,
857
+ "learning_rate": 8.848476258887031e-05,
858
+ "loss": 1.897,
859
+ "step": 1100
860
+ },
861
+ {
862
+ "epoch": 1.0323791647114031,
863
+ "eval_loss": 2.001809597015381,
864
+ "eval_runtime": 940.385,
865
+ "eval_samples_per_second": 1.007,
866
+ "eval_steps_per_second": 0.504,
867
+ "step": 1100
868
+ },
869
+ {
870
+ "epoch": 1.0417644298451432,
871
+ "grad_norm": 1.4220669269561768,
872
+ "learning_rate": 8.814775911179585e-05,
873
+ "loss": 1.9925,
874
+ "step": 1110
875
+ },
876
+ {
877
+ "epoch": 1.0511496949788832,
878
+ "grad_norm": 0.7997964024543762,
879
+ "learning_rate": 8.776838783200623e-05,
880
+ "loss": 1.9378,
881
+ "step": 1120
882
+ },
883
+ {
884
+ "epoch": 1.0605349601126233,
885
+ "grad_norm": 2.4897735118865967,
886
+ "learning_rate": 8.738388432665424e-05,
887
+ "loss": 1.855,
888
+ "step": 1130
889
+ },
890
+ {
891
+ "epoch": 1.069920225246363,
892
+ "grad_norm": 0.8852785229682922,
893
+ "learning_rate": 8.699430084469276e-05,
894
+ "loss": 2.0958,
895
+ "step": 1140
896
+ },
897
+ {
898
+ "epoch": 1.0793054903801031,
899
+ "grad_norm": 0.698052167892456,
900
+ "learning_rate": 8.65996903253766e-05,
901
+ "loss": 2.1623,
902
+ "step": 1150
903
+ },
904
+ {
905
+ "epoch": 1.0886907555138432,
906
+ "grad_norm": 0.8183425068855286,
907
+ "learning_rate": 8.620010639106853e-05,
908
+ "loss": 2.0938,
909
+ "step": 1160
910
+ },
911
+ {
912
+ "epoch": 1.0980760206475833,
913
+ "grad_norm": 0.8702631592750549,
914
+ "learning_rate": 8.57956033399528e-05,
915
+ "loss": 1.842,
916
+ "step": 1170
917
+ },
918
+ {
919
+ "epoch": 1.1074612857813233,
920
+ "grad_norm": 0.7535381317138672,
921
+ "learning_rate": 8.538623613865678e-05,
922
+ "loss": 2.069,
923
+ "step": 1180
924
+ },
925
+ {
926
+ "epoch": 1.1168465509150634,
927
+ "grad_norm": 0.7525476813316345,
928
+ "learning_rate": 8.497206041478162e-05,
929
+ "loss": 2.0564,
930
+ "step": 1190
931
+ },
932
+ {
933
+ "epoch": 1.1262318160488034,
934
+ "grad_norm": 0.80061274766922,
935
+ "learning_rate": 8.455313244934324e-05,
936
+ "loss": 2.0298,
937
+ "step": 1200
938
+ },
939
+ {
940
+ "epoch": 1.1262318160488034,
941
+ "eval_loss": 1.9951938390731812,
942
+ "eval_runtime": 939.3487,
943
+ "eval_samples_per_second": 1.008,
944
+ "eval_steps_per_second": 0.505,
945
+ "step": 1200
946
+ },
947
+ {
948
+ "epoch": 1.1356170811825435,
949
+ "grad_norm": 0.7788612246513367,
950
+ "learning_rate": 8.412950916912451e-05,
951
+ "loss": 2.1235,
952
+ "step": 1210
953
+ },
954
+ {
955
+ "epoch": 1.1450023463162835,
956
+ "grad_norm": 0.7296183705329895,
957
+ "learning_rate": 8.370124813893962e-05,
958
+ "loss": 2.1001,
959
+ "step": 1220
960
+ },
961
+ {
962
+ "epoch": 1.1543876114500236,
963
+ "grad_norm": 0.9630563855171204,
964
+ "learning_rate": 8.326840755381176e-05,
965
+ "loss": 2.1847,
966
+ "step": 1230
967
+ },
968
+ {
969
+ "epoch": 1.1637728765837636,
970
+ "grad_norm": 0.8061946630477905,
971
+ "learning_rate": 8.283104623106525e-05,
972
+ "loss": 2.0888,
973
+ "step": 1240
974
+ },
975
+ {
976
+ "epoch": 1.1731581417175034,
977
+ "grad_norm": 0.8088260889053345,
978
+ "learning_rate": 8.238922360233297e-05,
979
+ "loss": 1.9784,
980
+ "step": 1250
981
+ },
982
+ {
983
+ "epoch": 1.1825434068512435,
984
+ "grad_norm": 0.8710635900497437,
985
+ "learning_rate": 8.194299970548045e-05,
986
+ "loss": 1.915,
987
+ "step": 1260
988
+ },
989
+ {
990
+ "epoch": 1.1919286719849835,
991
+ "grad_norm": 0.6845631003379822,
992
+ "learning_rate": 8.149243517644745e-05,
993
+ "loss": 2.2073,
994
+ "step": 1270
995
+ },
996
+ {
997
+ "epoch": 1.2013139371187236,
998
+ "grad_norm": 0.697823166847229,
999
+ "learning_rate": 8.103759124100839e-05,
1000
+ "loss": 2.0622,
1001
+ "step": 1280
1002
+ },
1003
+ {
1004
+ "epoch": 1.2106992022524636,
1005
+ "grad_norm": 0.7662308216094971,
1006
+ "learning_rate": 8.057852970645254e-05,
1007
+ "loss": 2.0764,
1008
+ "step": 1290
1009
+ },
1010
+ {
1011
+ "epoch": 1.2200844673862037,
1012
+ "grad_norm": 0.7569838166236877,
1013
+ "learning_rate": 8.011531295318526e-05,
1014
+ "loss": 2.0989,
1015
+ "step": 1300
1016
+ },
1017
+ {
1018
+ "epoch": 1.2200844673862037,
1019
+ "eval_loss": 1.9889544248580933,
1020
+ "eval_runtime": 940.2035,
1021
+ "eval_samples_per_second": 1.007,
1022
+ "eval_steps_per_second": 0.504,
1023
+ "step": 1300
1024
+ },
1025
+ {
1026
+ "epoch": 1.2294697325199437,
1027
+ "grad_norm": 0.7303356528282166,
1028
+ "learning_rate": 7.964800392625129e-05,
1029
+ "loss": 1.9281,
1030
+ "step": 1310
1031
+ },
1032
+ {
1033
+ "epoch": 1.2388549976536838,
1034
+ "grad_norm": 0.8145589232444763,
1035
+ "learning_rate": 7.917666612678138e-05,
1036
+ "loss": 2.0838,
1037
+ "step": 1320
1038
+ },
1039
+ {
1040
+ "epoch": 1.2482402627874238,
1041
+ "grad_norm": 0.9209080934524536,
1042
+ "learning_rate": 7.870136360336328e-05,
1043
+ "loss": 2.0761,
1044
+ "step": 1330
1045
+ },
1046
+ {
1047
+ "epoch": 1.2576255279211637,
1048
+ "grad_norm": 0.8099146485328674,
1049
+ "learning_rate": 7.822216094333847e-05,
1050
+ "loss": 2.098,
1051
+ "step": 1340
1052
+ },
1053
+ {
1054
+ "epoch": 1.267010793054904,
1055
+ "grad_norm": 0.7984501719474792,
1056
+ "learning_rate": 7.773912326402543e-05,
1057
+ "loss": 1.8043,
1058
+ "step": 1350
1059
+ },
1060
+ {
1061
+ "epoch": 1.2763960581886438,
1062
+ "grad_norm": 0.713470995426178,
1063
+ "learning_rate": 7.72523162038713e-05,
1064
+ "loss": 1.9197,
1065
+ "step": 1360
1066
+ },
1067
+ {
1068
+ "epoch": 1.2857813233223838,
1069
+ "grad_norm": 0.8801192045211792,
1070
+ "learning_rate": 7.676180591353219e-05,
1071
+ "loss": 2.1053,
1072
+ "step": 1370
1073
+ },
1074
+ {
1075
+ "epoch": 1.2951665884561239,
1076
+ "grad_norm": 0.7982367873191833,
1077
+ "learning_rate": 7.626765904688447e-05,
1078
+ "loss": 2.2708,
1079
+ "step": 1380
1080
+ },
1081
+ {
1082
+ "epoch": 1.304551853589864,
1083
+ "grad_norm": 0.840467095375061,
1084
+ "learning_rate": 7.576994275196712e-05,
1085
+ "loss": 2.0068,
1086
+ "step": 1390
1087
+ },
1088
+ {
1089
+ "epoch": 1.313937118723604,
1090
+ "grad_norm": 0.8295313715934753,
1091
+ "learning_rate": 7.526872466185742e-05,
1092
+ "loss": 1.8695,
1093
+ "step": 1400
1094
+ },
1095
+ {
1096
+ "epoch": 1.313937118723604,
1097
+ "eval_loss": 1.9837820529937744,
1098
+ "eval_runtime": 938.2236,
1099
+ "eval_samples_per_second": 1.009,
1100
+ "eval_steps_per_second": 0.505,
1101
+ "step": 1400
1102
+ },
1103
+ {
1104
+ "epoch": 1.323322383857344,
1105
+ "grad_norm": 0.7913984060287476,
1106
+ "learning_rate": 7.476407288548036e-05,
1107
+ "loss": 1.9639,
1108
+ "step": 1410
1109
+ },
1110
+ {
1111
+ "epoch": 1.332707648991084,
1112
+ "grad_norm": 0.819983720779419,
1113
+ "learning_rate": 7.425605599835361e-05,
1114
+ "loss": 2.0229,
1115
+ "step": 1420
1116
+ },
1117
+ {
1118
+ "epoch": 1.342092914124824,
1119
+ "grad_norm": 0.8383524417877197,
1120
+ "learning_rate": 7.374474303326896e-05,
1121
+ "loss": 1.9001,
1122
+ "step": 1430
1123
+ },
1124
+ {
1125
+ "epoch": 1.3514781792585642,
1126
+ "grad_norm": 1.1012581586837769,
1127
+ "learning_rate": 7.323020347091177e-05,
1128
+ "loss": 1.9938,
1129
+ "step": 1440
1130
+ },
1131
+ {
1132
+ "epoch": 1.360863444392304,
1133
+ "grad_norm": 0.8445969223976135,
1134
+ "learning_rate": 7.271250723041932e-05,
1135
+ "loss": 2.0726,
1136
+ "step": 1450
1137
+ },
1138
+ {
1139
+ "epoch": 1.370248709526044,
1140
+ "grad_norm": 0.8882044553756714,
1141
+ "learning_rate": 7.21917246598798e-05,
1142
+ "loss": 2.0695,
1143
+ "step": 1460
1144
+ },
1145
+ {
1146
+ "epoch": 1.379633974659784,
1147
+ "grad_norm": 0.7796063423156738,
1148
+ "learning_rate": 7.1667926526773e-05,
1149
+ "loss": 2.0607,
1150
+ "step": 1470
1151
+ },
1152
+ {
1153
+ "epoch": 1.3890192397935242,
1154
+ "grad_norm": 0.7294184565544128,
1155
+ "learning_rate": 7.114118400835382e-05,
1156
+ "loss": 2.1345,
1157
+ "step": 1480
1158
+ },
1159
+ {
1160
+ "epoch": 1.3984045049272642,
1161
+ "grad_norm": 0.7649526000022888,
1162
+ "learning_rate": 7.061156868198028e-05,
1163
+ "loss": 1.999,
1164
+ "step": 1490
1165
+ },
1166
+ {
1167
+ "epoch": 1.4077897700610043,
1168
+ "grad_norm": 0.8375232219696045,
1169
+ "learning_rate": 7.007915251538708e-05,
1170
+ "loss": 2.1573,
1171
+ "step": 1500
1172
+ },
1173
+ {
1174
+ "epoch": 1.4077897700610043,
1175
+ "eval_loss": 1.976365566253662,
1176
+ "eval_runtime": 937.4612,
1177
+ "eval_samples_per_second": 1.01,
1178
+ "eval_steps_per_second": 0.506,
1179
+ "step": 1500
1180
+ },
1181
+ {
1182
+ "epoch": 1.4171750351947443,
1183
+ "grad_norm": 0.7321649193763733,
1184
+ "learning_rate": 6.954400785690622e-05,
1185
+ "loss": 2.0845,
1186
+ "step": 1510
1187
+ },
1188
+ {
1189
+ "epoch": 1.4265603003284844,
1190
+ "grad_norm": 0.778896152973175,
1191
+ "learning_rate": 6.900620742563562e-05,
1192
+ "loss": 1.9401,
1193
+ "step": 1520
1194
+ },
1195
+ {
1196
+ "epoch": 1.4359455654622244,
1197
+ "grad_norm": 0.7842182517051697,
1198
+ "learning_rate": 6.846582430155783e-05,
1199
+ "loss": 1.8992,
1200
+ "step": 1530
1201
+ },
1202
+ {
1203
+ "epoch": 1.4453308305959642,
1204
+ "grad_norm": 0.6991093754768372,
1205
+ "learning_rate": 6.792293191560914e-05,
1206
+ "loss": 2.0625,
1207
+ "step": 1540
1208
+ },
1209
+ {
1210
+ "epoch": 1.4547160957297043,
1211
+ "grad_norm": 0.9950138330459595,
1212
+ "learning_rate": 6.737760403970152e-05,
1213
+ "loss": 2.0905,
1214
+ "step": 1550
1215
+ },
1216
+ {
1217
+ "epoch": 1.4641013608634443,
1218
+ "grad_norm": 0.6939354538917542,
1219
+ "learning_rate": 6.682991477669781e-05,
1220
+ "loss": 2.2633,
1221
+ "step": 1560
1222
+ },
1223
+ {
1224
+ "epoch": 1.4734866259971844,
1225
+ "grad_norm": 0.842707633972168,
1226
+ "learning_rate": 6.627993855034228e-05,
1227
+ "loss": 1.8811,
1228
+ "step": 1570
1229
+ },
1230
+ {
1231
+ "epoch": 1.4828718911309244,
1232
+ "grad_norm": 0.8008860945701599,
1233
+ "learning_rate": 6.572775009514725e-05,
1234
+ "loss": 1.8528,
1235
+ "step": 1580
1236
+ },
1237
+ {
1238
+ "epoch": 1.4922571562646645,
1239
+ "grad_norm": 0.7409046292304993,
1240
+ "learning_rate": 6.517342444623784e-05,
1241
+ "loss": 1.9773,
1242
+ "step": 1590
1243
+ },
1244
+ {
1245
+ "epoch": 1.5016424213984045,
1246
+ "grad_norm": 0.7854930758476257,
1247
+ "learning_rate": 6.461703692915553e-05,
1248
+ "loss": 2.0183,
1249
+ "step": 1600
1250
+ },
1251
+ {
1252
+ "epoch": 1.5016424213984045,
1253
+ "eval_loss": 1.9713027477264404,
1254
+ "eval_runtime": 938.8142,
1255
+ "eval_samples_per_second": 1.009,
1256
+ "eval_steps_per_second": 0.505,
1257
+ "step": 1600
1258
+ },
1259
+ {
1260
+ "epoch": 1.5110276865321446,
1261
+ "grad_norm": 0.8054217100143433,
1262
+ "learning_rate": 6.405866314962252e-05,
1263
+ "loss": 2.1303,
1264
+ "step": 1610
1265
+ },
1266
+ {
1267
+ "epoch": 1.5204129516658846,
1268
+ "grad_norm": 0.7017131447792053,
1269
+ "learning_rate": 6.349837898326784e-05,
1270
+ "loss": 2.0846,
1271
+ "step": 1620
1272
+ },
1273
+ {
1274
+ "epoch": 1.5297982167996245,
1275
+ "grad_norm": 0.8393527865409851,
1276
+ "learning_rate": 6.293626056531693e-05,
1277
+ "loss": 1.8327,
1278
+ "step": 1630
1279
+ },
1280
+ {
1281
+ "epoch": 1.5391834819333647,
1282
+ "grad_norm": 0.8798466920852661,
1283
+ "learning_rate": 6.237238428024572e-05,
1284
+ "loss": 1.8657,
1285
+ "step": 1640
1286
+ },
1287
+ {
1288
+ "epoch": 1.5485687470671046,
1289
+ "grad_norm": 0.7530277371406555,
1290
+ "learning_rate": 6.180682675140121e-05,
1291
+ "loss": 2.245,
1292
+ "step": 1650
1293
+ },
1294
+ {
1295
+ "epoch": 1.5579540122008448,
1296
+ "grad_norm": 0.7642443776130676,
1297
+ "learning_rate": 6.123966483058916e-05,
1298
+ "loss": 1.9058,
1299
+ "step": 1660
1300
+ },
1301
+ {
1302
+ "epoch": 1.5673392773345847,
1303
+ "grad_norm": 0.7459161281585693,
1304
+ "learning_rate": 6.067097558763106e-05,
1305
+ "loss": 1.9482,
1306
+ "step": 1670
1307
+ },
1308
+ {
1309
+ "epoch": 1.5767245424683247,
1310
+ "grad_norm": 0.7460825443267822,
1311
+ "learning_rate": 6.0100836299891314e-05,
1312
+ "loss": 2.127,
1313
+ "step": 1680
1314
+ },
1315
+ {
1316
+ "epoch": 1.5861098076020648,
1317
+ "grad_norm": 0.710259735584259,
1318
+ "learning_rate": 5.9529324441776314e-05,
1319
+ "loss": 2.1407,
1320
+ "step": 1690
1321
+ },
1322
+ {
1323
+ "epoch": 1.5954950727358048,
1324
+ "grad_norm": 0.7227075695991516,
1325
+ "learning_rate": 5.8956517674206605e-05,
1326
+ "loss": 1.9229,
1327
+ "step": 1700
1328
+ },
1329
+ {
1330
+ "epoch": 1.5954950727358048,
1331
+ "eval_loss": 1.967227816581726,
1332
+ "eval_runtime": 937.7597,
1333
+ "eval_samples_per_second": 1.01,
1334
+ "eval_steps_per_second": 0.505,
1335
+ "step": 1700
1336
+ },
1337
+ {
1338
+ "epoch": 1.6048803378695449,
1339
+ "grad_norm": 1.0246224403381348,
1340
+ "learning_rate": 5.838249383406387e-05,
1341
+ "loss": 2.0563,
1342
+ "step": 1710
1343
+ },
1344
+ {
1345
+ "epoch": 1.6142656030032847,
1346
+ "grad_norm": 0.8386335968971252,
1347
+ "learning_rate": 5.780733092361388e-05,
1348
+ "loss": 1.8553,
1349
+ "step": 1720
1350
+ },
1351
+ {
1352
+ "epoch": 1.623650868137025,
1353
+ "grad_norm": 0.7936443090438843,
1354
+ "learning_rate": 5.723110709990707e-05,
1355
+ "loss": 2.1631,
1356
+ "step": 1730
1357
+ },
1358
+ {
1359
+ "epoch": 1.6330361332707648,
1360
+ "grad_norm": 0.7047923803329468,
1361
+ "learning_rate": 5.6653900664157934e-05,
1362
+ "loss": 1.9989,
1363
+ "step": 1740
1364
+ },
1365
+ {
1366
+ "epoch": 1.642421398404505,
1367
+ "grad_norm": 0.8624520897865295,
1368
+ "learning_rate": 5.6075790051105023e-05,
1369
+ "loss": 2.0198,
1370
+ "step": 1750
1371
+ },
1372
+ {
1373
+ "epoch": 1.651806663538245,
1374
+ "grad_norm": 0.8698344826698303,
1375
+ "learning_rate": 5.5496853818352614e-05,
1376
+ "loss": 2.1045,
1377
+ "step": 1760
1378
+ },
1379
+ {
1380
+ "epoch": 1.661191928671985,
1381
+ "grad_norm": 0.78273606300354,
1382
+ "learning_rate": 5.491717063569582e-05,
1383
+ "loss": 1.9399,
1384
+ "step": 1770
1385
+ },
1386
+ {
1387
+ "epoch": 1.670577193805725,
1388
+ "grad_norm": 0.7217704057693481,
1389
+ "learning_rate": 5.433681927443043e-05,
1390
+ "loss": 2.0161,
1391
+ "step": 1780
1392
+ },
1393
+ {
1394
+ "epoch": 1.679962458939465,
1395
+ "grad_norm": 0.7136641144752502,
1396
+ "learning_rate": 5.375587859664885e-05,
1397
+ "loss": 2.1437,
1398
+ "step": 1790
1399
+ },
1400
+ {
1401
+ "epoch": 1.689347724073205,
1402
+ "grad_norm": 0.7752694487571716,
1403
+ "learning_rate": 5.317442754452379e-05,
1404
+ "loss": 1.9732,
1405
+ "step": 1800
1406
+ },
1407
+ {
1408
+ "epoch": 1.689347724073205,
1409
+ "eval_loss": 1.9616819620132446,
1410
+ "eval_runtime": 938.0777,
1411
+ "eval_samples_per_second": 1.01,
1412
+ "eval_steps_per_second": 0.505,
1413
+ "step": 1800
1414
+ },
1415
+ {
1416
+ "epoch": 1.698732989206945,
1417
+ "grad_norm": 0.7387381196022034,
1418
+ "learning_rate": 5.2592545129581185e-05,
1419
+ "loss": 1.8547,
1420
+ "step": 1810
1421
+ },
1422
+ {
1423
+ "epoch": 1.7081182543406852,
1424
+ "grad_norm": 0.9277381300926208,
1425
+ "learning_rate": 5.2010310421963415e-05,
1426
+ "loss": 1.8679,
1427
+ "step": 1820
1428
+ },
1429
+ {
1430
+ "epoch": 1.717503519474425,
1431
+ "grad_norm": 0.7474396824836731,
1432
+ "learning_rate": 5.142780253968481e-05,
1433
+ "loss": 2.1186,
1434
+ "step": 1830
1435
+ },
1436
+ {
1437
+ "epoch": 1.7268887846081653,
1438
+ "grad_norm": 0.8091953992843628,
1439
+ "learning_rate": 5.084510063788056e-05,
1440
+ "loss": 2.0762,
1441
+ "step": 1840
1442
+ },
1443
+ {
1444
+ "epoch": 1.7362740497419051,
1445
+ "grad_norm": 0.7326928973197937,
1446
+ "learning_rate": 5.02622838980505e-05,
1447
+ "loss": 1.9626,
1448
+ "step": 1850
1449
+ },
1450
+ {
1451
+ "epoch": 1.7456593148756452,
1452
+ "grad_norm": 0.6803217530250549,
1453
+ "learning_rate": 4.967943151729945e-05,
1454
+ "loss": 1.8606,
1455
+ "step": 1860
1456
+ },
1457
+ {
1458
+ "epoch": 1.7550445800093852,
1459
+ "grad_norm": 0.7370252013206482,
1460
+ "learning_rate": 4.9096622697575394e-05,
1461
+ "loss": 1.9649,
1462
+ "step": 1870
1463
+ },
1464
+ {
1465
+ "epoch": 1.7644298451431253,
1466
+ "grad_norm": 0.7405309677124023,
1467
+ "learning_rate": 4.851393663490689e-05,
1468
+ "loss": 2.0119,
1469
+ "step": 1880
1470
+ },
1471
+ {
1472
+ "epoch": 1.7738151102768653,
1473
+ "grad_norm": 0.8256239891052246,
1474
+ "learning_rate": 4.793145250864151e-05,
1475
+ "loss": 1.9313,
1476
+ "step": 1890
1477
+ },
1478
+ {
1479
+ "epoch": 1.7832003754106054,
1480
+ "grad_norm": 0.8547274470329285,
1481
+ "learning_rate": 4.7349249470686266e-05,
1482
+ "loss": 1.6835,
1483
+ "step": 1900
1484
+ },
1485
+ {
1486
+ "epoch": 1.7832003754106054,
1487
+ "eval_loss": 1.9573733806610107,
1488
+ "eval_runtime": 937.9887,
1489
+ "eval_samples_per_second": 1.01,
1490
+ "eval_steps_per_second": 0.505,
1491
+ "step": 1900
1492
+ },
1493
+ {
1494
+ "epoch": 1.7925856405443454,
1495
+ "grad_norm": 0.9643909335136414,
1496
+ "learning_rate": 4.676740663475198e-05,
1497
+ "loss": 1.8968,
1498
+ "step": 1910
1499
+ },
1500
+ {
1501
+ "epoch": 1.8019709056780853,
1502
+ "grad_norm": 0.8356963992118835,
1503
+ "learning_rate": 4.6186003065602827e-05,
1504
+ "loss": 1.9651,
1505
+ "step": 1920
1506
+ },
1507
+ {
1508
+ "epoch": 1.8113561708118255,
1509
+ "grad_norm": 0.7793695330619812,
1510
+ "learning_rate": 4.560511776831235e-05,
1511
+ "loss": 2.0038,
1512
+ "step": 1930
1513
+ },
1514
+ {
1515
+ "epoch": 1.8207414359455654,
1516
+ "grad_norm": 0.7343499660491943,
1517
+ "learning_rate": 4.502482967752786e-05,
1518
+ "loss": 1.7593,
1519
+ "step": 1940
1520
+ },
1521
+ {
1522
+ "epoch": 1.8301267010793056,
1523
+ "grad_norm": 0.7215515971183777,
1524
+ "learning_rate": 4.444521764674411e-05,
1525
+ "loss": 2.0668,
1526
+ "step": 1950
1527
+ },
1528
+ {
1529
+ "epoch": 1.8395119662130455,
1530
+ "grad_norm": 0.8646712303161621,
1531
+ "learning_rate": 4.3866360437588294e-05,
1532
+ "loss": 1.9422,
1533
+ "step": 1960
1534
+ },
1535
+ {
1536
+ "epoch": 1.8488972313467855,
1537
+ "grad_norm": 0.8201174736022949,
1538
+ "learning_rate": 4.328833670911724e-05,
1539
+ "loss": 2.2159,
1540
+ "step": 1970
1541
+ },
1542
+ {
1543
+ "epoch": 1.8582824964805256,
1544
+ "grad_norm": 0.8474441766738892,
1545
+ "learning_rate": 4.2711225007128765e-05,
1546
+ "loss": 2.0485,
1547
+ "step": 1980
1548
+ },
1549
+ {
1550
+ "epoch": 1.8676677616142656,
1551
+ "grad_norm": 0.679102897644043,
1552
+ "learning_rate": 4.213510375348837e-05,
1553
+ "loss": 1.9853,
1554
+ "step": 1990
1555
+ },
1556
+ {
1557
+ "epoch": 1.8770530267480057,
1558
+ "grad_norm": 0.7910708785057068,
1559
+ "learning_rate": 4.15600512354726e-05,
1560
+ "loss": 1.9874,
1561
+ "step": 2000
1562
+ },
1563
+ {
1564
+ "epoch": 1.8770530267480057,
1565
+ "eval_loss": 1.9538992643356323,
1566
+ "eval_runtime": 937.8684,
1567
+ "eval_samples_per_second": 1.01,
1568
+ "eval_steps_per_second": 0.505,
1569
+ "step": 2000
1570
+ },
1571
+ {
1572
+ "epoch": 1.8864382918817455,
1573
+ "grad_norm": 0.6856018900871277,
1574
+ "learning_rate": 4.0986145595131055e-05,
1575
+ "loss": 1.9927,
1576
+ "step": 2010
1577
+ },
1578
+ {
1579
+ "epoch": 1.8958235570154858,
1580
+ "grad_norm": 0.7743241786956787,
1581
+ "learning_rate": 4.041346481866768e-05,
1582
+ "loss": 1.9437,
1583
+ "step": 2020
1584
+ },
1585
+ {
1586
+ "epoch": 1.9052088221492256,
1587
+ "grad_norm": 0.7838016748428345,
1588
+ "learning_rate": 3.9842086725843625e-05,
1589
+ "loss": 1.953,
1590
+ "step": 2030
1591
+ },
1592
+ {
1593
+ "epoch": 1.9145940872829659,
1594
+ "grad_norm": 0.7300184965133667,
1595
+ "learning_rate": 3.9272088959402534e-05,
1596
+ "loss": 2.0461,
1597
+ "step": 2040
1598
+ },
1599
+ {
1600
+ "epoch": 1.9239793524167057,
1601
+ "grad_norm": 0.8169785141944885,
1602
+ "learning_rate": 3.8703548974519874e-05,
1603
+ "loss": 2.1075,
1604
+ "step": 2050
1605
+ },
1606
+ {
1607
+ "epoch": 1.9333646175504458,
1608
+ "grad_norm": 0.724233090877533,
1609
+ "learning_rate": 3.8136544028277894e-05,
1610
+ "loss": 1.897,
1611
+ "step": 2060
1612
+ },
1613
+ {
1614
+ "epoch": 1.9427498826841858,
1615
+ "grad_norm": 0.7821764945983887,
1616
+ "learning_rate": 3.757115116916727e-05,
1617
+ "loss": 2.1728,
1618
+ "step": 2070
1619
+ },
1620
+ {
1621
+ "epoch": 1.9521351478179259,
1622
+ "grad_norm": 0.7325600981712341,
1623
+ "learning_rate": 3.7007447226617366e-05,
1624
+ "loss": 1.9865,
1625
+ "step": 2080
1626
+ },
1627
+ {
1628
+ "epoch": 1.961520412951666,
1629
+ "grad_norm": 0.7352250814437866,
1630
+ "learning_rate": 3.6445508800556036e-05,
1631
+ "loss": 2.0352,
1632
+ "step": 2090
1633
+ },
1634
+ {
1635
+ "epoch": 1.970905678085406,
1636
+ "grad_norm": 0.6944911479949951,
1637
+ "learning_rate": 3.5885412251000745e-05,
1638
+ "loss": 1.7607,
1639
+ "step": 2100
1640
+ },
1641
+ {
1642
+ "epoch": 1.970905678085406,
1643
+ "eval_loss": 1.9512391090393066,
1644
+ "eval_runtime": 935.8673,
1645
+ "eval_samples_per_second": 1.012,
1646
+ "eval_steps_per_second": 0.506,
1647
+ "step": 2100
1648
+ },
1649
+ {
1650
+ "epoch": 1.980290943219146,
1651
+ "grad_norm": 1.345037579536438,
1652
+ "learning_rate": 3.532723368768228e-05,
1653
+ "loss": 1.8189,
1654
+ "step": 2110
1655
+ },
1656
+ {
1657
+ "epoch": 1.9896762083528858,
1658
+ "grad_norm": 0.8876499533653259,
1659
+ "learning_rate": 3.477104895970234e-05,
1660
+ "loss": 2.0414,
1661
+ "step": 2120
1662
+ },
1663
+ {
1664
+ "epoch": 1.999061473486626,
1665
+ "grad_norm": 0.832295835018158,
1666
+ "learning_rate": 3.4216933645226776e-05,
1667
+ "loss": 1.9307,
1668
+ "step": 2130
1669
+ },
1670
+ {
1671
+ "epoch": 2.008446738620366,
1672
+ "grad_norm": 0.7709248661994934,
1673
+ "learning_rate": 3.3664963041215406e-05,
1674
+ "loss": 2.0266,
1675
+ "step": 2140
1676
+ },
1677
+ {
1678
+ "epoch": 2.017832003754106,
1679
+ "grad_norm": 0.8151468634605408,
1680
+ "learning_rate": 3.311521215319021e-05,
1681
+ "loss": 1.9638,
1682
+ "step": 2150
1683
+ },
1684
+ {
1685
+ "epoch": 2.027217268887846,
1686
+ "grad_norm": 0.731529951095581,
1687
+ "learning_rate": 3.256775568504305e-05,
1688
+ "loss": 1.9209,
1689
+ "step": 2160
1690
+ },
1691
+ {
1692
+ "epoch": 2.0366025340215863,
1693
+ "grad_norm": 0.7643239498138428,
1694
+ "learning_rate": 3.202266802888439e-05,
1695
+ "loss": 2.0578,
1696
+ "step": 2170
1697
+ },
1698
+ {
1699
+ "epoch": 2.045987799155326,
1700
+ "grad_norm": 0.7743598222732544,
1701
+ "learning_rate": 3.148002325493445e-05,
1702
+ "loss": 1.9511,
1703
+ "step": 2180
1704
+ },
1705
+ {
1706
+ "epoch": 2.055373064289066,
1707
+ "grad_norm": 0.7745042443275452,
1708
+ "learning_rate": 3.0939895101457916e-05,
1709
+ "loss": 1.9773,
1710
+ "step": 2190
1711
+ },
1712
+ {
1713
+ "epoch": 2.0647583294228062,
1714
+ "grad_norm": 0.8307595252990723,
1715
+ "learning_rate": 3.0402356964744027e-05,
1716
+ "loss": 1.9459,
1717
+ "step": 2200
1718
+ },
1719
+ {
1720
+ "epoch": 2.0647583294228062,
1721
+ "eval_loss": 1.947997808456421,
1722
+ "eval_runtime": 937.6638,
1723
+ "eval_samples_per_second": 1.01,
1724
+ "eval_steps_per_second": 0.506,
1725
+ "step": 2200
1726
+ },
1727
+ {
1728
+ "epoch": 2.074143594556546,
1729
+ "grad_norm": 0.7969174981117249,
1730
+ "learning_rate": 2.986748188913287e-05,
1731
+ "loss": 1.8188,
1732
+ "step": 2210
1733
+ },
1734
+ {
1735
+ "epoch": 2.0835288596902863,
1736
+ "grad_norm": 2.2084882259368896,
1737
+ "learning_rate": 2.9335342557089668e-05,
1738
+ "loss": 1.8969,
1739
+ "step": 2220
1740
+ },
1741
+ {
1742
+ "epoch": 2.092914124824026,
1743
+ "grad_norm": 0.7004146575927734,
1744
+ "learning_rate": 2.8806011279328256e-05,
1745
+ "loss": 1.7638,
1746
+ "step": 2230
1747
+ },
1748
+ {
1749
+ "epoch": 2.1022993899577664,
1750
+ "grad_norm": 0.6949226260185242,
1751
+ "learning_rate": 2.827955998498482e-05,
1752
+ "loss": 2.1413,
1753
+ "step": 2240
1754
+ },
1755
+ {
1756
+ "epoch": 2.1116846550915063,
1757
+ "grad_norm": 0.733250081539154,
1758
+ "learning_rate": 2.775606021184396e-05,
1759
+ "loss": 2.0401,
1760
+ "step": 2250
1761
+ },
1762
+ {
1763
+ "epoch": 2.1210699202252465,
1764
+ "grad_norm": 0.845836341381073,
1765
+ "learning_rate": 2.7235583096617346e-05,
1766
+ "loss": 1.9184,
1767
+ "step": 2260
1768
+ },
1769
+ {
1770
+ "epoch": 2.1304551853589864,
1771
+ "grad_norm": 0.7574362754821777,
1772
+ "learning_rate": 2.6718199365277397e-05,
1773
+ "loss": 2.0152,
1774
+ "step": 2270
1775
+ },
1776
+ {
1777
+ "epoch": 2.139840450492726,
1778
+ "grad_norm": 0.9965910911560059,
1779
+ "learning_rate": 2.6203979323446454e-05,
1780
+ "loss": 1.8746,
1781
+ "step": 2280
1782
+ },
1783
+ {
1784
+ "epoch": 2.1492257156264665,
1785
+ "grad_norm": 0.8181989192962646,
1786
+ "learning_rate": 2.5692992846843206e-05,
1787
+ "loss": 1.8114,
1788
+ "step": 2290
1789
+ },
1790
+ {
1791
+ "epoch": 2.1586109807602063,
1792
+ "grad_norm": 0.9191744327545166,
1793
+ "learning_rate": 2.5185309371787513e-05,
1794
+ "loss": 1.7611,
1795
+ "step": 2300
1796
+ },
1797
+ {
1798
+ "epoch": 2.1586109807602063,
1799
+ "eval_loss": 1.9463104009628296,
1800
+ "eval_runtime": 936.8197,
1801
+ "eval_samples_per_second": 1.011,
1802
+ "eval_steps_per_second": 0.506,
1803
+ "step": 2300
1804
+ },
1805
+ {
1806
+ "epoch": 2.1679962458939466,
1807
+ "grad_norm": 0.7393763065338135,
1808
+ "learning_rate": 2.468099788576482e-05,
1809
+ "loss": 1.9138,
1810
+ "step": 2310
1811
+ },
1812
+ {
1813
+ "epoch": 2.1773815110276864,
1814
+ "grad_norm": 0.8604278564453125,
1815
+ "learning_rate": 2.418012691805191e-05,
1816
+ "loss": 1.9187,
1817
+ "step": 2320
1818
+ },
1819
+ {
1820
+ "epoch": 2.1867667761614267,
1821
+ "grad_norm": 0.8256701827049255,
1822
+ "learning_rate": 2.3682764530404365e-05,
1823
+ "loss": 1.9313,
1824
+ "step": 2330
1825
+ },
1826
+ {
1827
+ "epoch": 2.1961520412951665,
1828
+ "grad_norm": 0.8674435019493103,
1829
+ "learning_rate": 2.3188978307808125e-05,
1830
+ "loss": 2.1127,
1831
+ "step": 2340
1832
+ },
1833
+ {
1834
+ "epoch": 2.2055373064289068,
1835
+ "grad_norm": 0.8510717153549194,
1836
+ "learning_rate": 2.2698835349295472e-05,
1837
+ "loss": 1.9931,
1838
+ "step": 2350
1839
+ },
1840
+ {
1841
+ "epoch": 2.2149225715626466,
1842
+ "grad_norm": 0.8951923847198486,
1843
+ "learning_rate": 2.2212402258827115e-05,
1844
+ "loss": 1.8811,
1845
+ "step": 2360
1846
+ },
1847
+ {
1848
+ "epoch": 2.224307836696387,
1849
+ "grad_norm": 0.7499418258666992,
1850
+ "learning_rate": 2.172974513624176e-05,
1851
+ "loss": 1.9194,
1852
+ "step": 2370
1853
+ },
1854
+ {
1855
+ "epoch": 2.2336931018301267,
1856
+ "grad_norm": 0.7999364137649536,
1857
+ "learning_rate": 2.1250929568273774e-05,
1858
+ "loss": 1.9925,
1859
+ "step": 2380
1860
+ },
1861
+ {
1862
+ "epoch": 2.2430783669638665,
1863
+ "grad_norm": 0.8435518145561218,
1864
+ "learning_rate": 2.0776020619641024e-05,
1865
+ "loss": 1.9746,
1866
+ "step": 2390
1867
+ },
1868
+ {
1869
+ "epoch": 2.252463632097607,
1870
+ "grad_norm": 0.8621445298194885,
1871
+ "learning_rate": 2.0305082824203343e-05,
1872
+ "loss": 1.8491,
1873
+ "step": 2400
1874
+ },
1875
+ {
1876
+ "epoch": 2.252463632097607,
1877
+ "eval_loss": 1.9441428184509277,
1878
+ "eval_runtime": 935.9838,
1879
+ "eval_samples_per_second": 1.012,
1880
+ "eval_steps_per_second": 0.506,
1881
+ "step": 2400
1882
+ },
1883
+ {
1884
+ "epoch": 2.2618488972313466,
1885
+ "grad_norm": 0.8402264714241028,
1886
+ "learning_rate": 1.9838180176193178e-05,
1887
+ "loss": 1.9554,
1888
+ "step": 2410
1889
+ },
1890
+ {
1891
+ "epoch": 2.271234162365087,
1892
+ "grad_norm": 0.8322455883026123,
1893
+ "learning_rate": 1.9375376121519807e-05,
1894
+ "loss": 1.9463,
1895
+ "step": 2420
1896
+ },
1897
+ {
1898
+ "epoch": 2.2806194274988267,
1899
+ "grad_norm": 0.8249323964118958,
1900
+ "learning_rate": 1.891673354914761e-05,
1901
+ "loss": 1.8215,
1902
+ "step": 2430
1903
+ },
1904
+ {
1905
+ "epoch": 2.290004692632567,
1906
+ "grad_norm": 0.772278904914856,
1907
+ "learning_rate": 1.8462314782550578e-05,
1908
+ "loss": 1.9064,
1909
+ "step": 2440
1910
+ },
1911
+ {
1912
+ "epoch": 2.299389957766307,
1913
+ "grad_norm": 0.7047111392021179,
1914
+ "learning_rate": 1.8012181571243097e-05,
1915
+ "loss": 2.0491,
1916
+ "step": 2450
1917
+ },
1918
+ {
1919
+ "epoch": 2.308775222900047,
1920
+ "grad_norm": 0.9021138548851013,
1921
+ "learning_rate": 1.756639508238922e-05,
1922
+ "loss": 1.9212,
1923
+ "step": 2460
1924
+ },
1925
+ {
1926
+ "epoch": 2.318160488033787,
1927
+ "grad_norm": 0.9528132677078247,
1928
+ "learning_rate": 1.7125015892490753e-05,
1929
+ "loss": 2.0436,
1930
+ "step": 2470
1931
+ },
1932
+ {
1933
+ "epoch": 2.327545753167527,
1934
+ "grad_norm": 0.826156735420227,
1935
+ "learning_rate": 1.668810397915568e-05,
1936
+ "loss": 1.9951,
1937
+ "step": 2480
1938
+ },
1939
+ {
1940
+ "epoch": 2.336931018301267,
1941
+ "grad_norm": 0.7792008519172668,
1942
+ "learning_rate": 1.6255718712948143e-05,
1943
+ "loss": 1.9846,
1944
+ "step": 2490
1945
+ },
1946
+ {
1947
+ "epoch": 2.346316283435007,
1948
+ "grad_norm": 0.8387865424156189,
1949
+ "learning_rate": 1.5827918849320567e-05,
1950
+ "loss": 1.9121,
1951
+ "step": 2500
1952
+ },
1953
+ {
1954
+ "epoch": 2.346316283435007,
1955
+ "eval_loss": 1.9427493810653687,
1956
+ "eval_runtime": 936.6706,
1957
+ "eval_samples_per_second": 1.011,
1958
+ "eval_steps_per_second": 0.506,
1959
+ "step": 2500
1960
+ },
1961
+ {
1962
+ "epoch": 2.355701548568747,
1963
+ "grad_norm": 0.7230735421180725,
1964
+ "learning_rate": 1.5404762520629724e-05,
1965
+ "loss": 1.8782,
1966
+ "step": 2510
1967
+ },
1968
+ {
1969
+ "epoch": 2.365086813702487,
1970
+ "grad_norm": 0.7646508812904358,
1971
+ "learning_rate": 1.4986307228237268e-05,
1972
+ "loss": 1.8842,
1973
+ "step": 2520
1974
+ },
1975
+ {
1976
+ "epoch": 2.3744720788362272,
1977
+ "grad_norm": 0.7921754121780396,
1978
+ "learning_rate": 1.4572609834695971e-05,
1979
+ "loss": 2.1265,
1980
+ "step": 2530
1981
+ },
1982
+ {
1983
+ "epoch": 2.383857343969967,
1984
+ "grad_norm": 0.8904575109481812,
1985
+ "learning_rate": 1.4163726556023054e-05,
1986
+ "loss": 1.8978,
1987
+ "step": 2540
1988
+ },
1989
+ {
1990
+ "epoch": 2.3932426091037073,
1991
+ "grad_norm": 0.9155052900314331,
1992
+ "learning_rate": 1.3759712954060921e-05,
1993
+ "loss": 1.8854,
1994
+ "step": 2550
1995
+ },
1996
+ {
1997
+ "epoch": 2.402627874237447,
1998
+ "grad_norm": 0.8808565139770508,
1999
+ "learning_rate": 1.3360623928927291e-05,
2000
+ "loss": 1.8698,
2001
+ "step": 2560
2002
+ },
2003
+ {
2004
+ "epoch": 2.4120131393711874,
2005
+ "grad_norm": 0.9437252879142761,
2006
+ "learning_rate": 1.2966513711554744e-05,
2007
+ "loss": 1.7782,
2008
+ "step": 2570
2009
+ },
2010
+ {
2011
+ "epoch": 2.4213984045049273,
2012
+ "grad_norm": 0.853032112121582,
2013
+ "learning_rate": 1.2577435856321668e-05,
2014
+ "loss": 1.953,
2015
+ "step": 2580
2016
+ },
2017
+ {
2018
+ "epoch": 2.430783669638667,
2019
+ "grad_norm": 0.8684459328651428,
2020
+ "learning_rate": 1.219344323377482e-05,
2021
+ "loss": 2.2737,
2022
+ "step": 2590
2023
+ },
2024
+ {
2025
+ "epoch": 2.4401689347724074,
2026
+ "grad_norm": 0.7905233502388,
2027
+ "learning_rate": 1.1814588023444878e-05,
2028
+ "loss": 1.8849,
2029
+ "step": 2600
2030
+ },
2031
+ {
2032
+ "epoch": 2.4401689347724074,
2033
+ "eval_loss": 1.941327452659607,
2034
+ "eval_runtime": 937.1681,
2035
+ "eval_samples_per_second": 1.01,
2036
+ "eval_steps_per_second": 0.506,
2037
+ "step": 2600
2038
+ },
2039
+ {
2040
+ "epoch": 2.449554199906147,
2041
+ "grad_norm": 0.769902765750885,
2042
+ "learning_rate": 1.1440921706756092e-05,
2043
+ "loss": 2.057,
2044
+ "step": 2610
2045
+ },
2046
+ {
2047
+ "epoch": 2.4589394650398875,
2048
+ "grad_norm": 0.8708339333534241,
2049
+ "learning_rate": 1.1072495060030418e-05,
2050
+ "loss": 1.8389,
2051
+ "step": 2620
2052
+ },
2053
+ {
2054
+ "epoch": 2.4683247301736273,
2055
+ "grad_norm": 0.7585152387619019,
2056
+ "learning_rate": 1.0709358147587884e-05,
2057
+ "loss": 1.9889,
2058
+ "step": 2630
2059
+ },
2060
+ {
2061
+ "epoch": 2.4777099953073676,
2062
+ "grad_norm": 0.8954083919525146,
2063
+ "learning_rate": 1.0351560314943392e-05,
2064
+ "loss": 2.0466,
2065
+ "step": 2640
2066
+ },
2067
+ {
2068
+ "epoch": 2.4870952604411074,
2069
+ "grad_norm": 0.8597133755683899,
2070
+ "learning_rate": 9.999150182101319e-06,
2071
+ "loss": 1.8554,
2072
+ "step": 2650
2073
+ },
2074
+ {
2075
+ "epoch": 2.4964805255748477,
2076
+ "grad_norm": 0.7284257411956787,
2077
+ "learning_rate": 9.652175636948807e-06,
2078
+ "loss": 1.9854,
2079
+ "step": 2660
2080
+ },
2081
+ {
2082
+ "epoch": 2.5058657907085875,
2083
+ "grad_norm": 0.8639576435089111,
2084
+ "learning_rate": 9.310683828748251e-06,
2085
+ "loss": 1.924,
2086
+ "step": 2670
2087
+ },
2088
+ {
2089
+ "epoch": 2.5152510558423273,
2090
+ "grad_norm": 0.8123131990432739,
2091
+ "learning_rate": 8.974721161730553e-06,
2092
+ "loss": 1.9737,
2093
+ "step": 2680
2094
+ },
2095
+ {
2096
+ "epoch": 2.5246363209760676,
2097
+ "grad_norm": 0.8097216486930847,
2098
+ "learning_rate": 8.64433328878917e-06,
2099
+ "loss": 2.0566,
2100
+ "step": 2690
2101
+ },
2102
+ {
2103
+ "epoch": 2.534021586109808,
2104
+ "grad_norm": 0.8345467448234558,
2105
+ "learning_rate": 8.319565105276678e-06,
2106
+ "loss": 2.0679,
2107
+ "step": 2700
2108
+ },
2109
+ {
2110
+ "epoch": 2.534021586109808,
2111
+ "eval_loss": 1.9400410652160645,
2112
+ "eval_runtime": 937.8634,
2113
+ "eval_samples_per_second": 1.01,
2114
+ "eval_steps_per_second": 0.505,
2115
+ "step": 2700
2116
+ },
2117
+ {
2118
+ "epoch": 2.5434068512435477,
2119
+ "grad_norm": 0.8246074318885803,
2120
+ "learning_rate": 8.000460742903987e-06,
2121
+ "loss": 2.0611,
2122
+ "step": 2710
2123
+ },
2124
+ {
2125
+ "epoch": 2.5527921163772875,
2126
+ "grad_norm": 0.8071450591087341,
2127
+ "learning_rate": 7.687063563743413e-06,
2128
+ "loss": 1.8266,
2129
+ "step": 2720
2130
+ },
2131
+ {
2132
+ "epoch": 2.562177381511028,
2133
+ "grad_norm": 0.8657225966453552,
2134
+ "learning_rate": 7.379416154336455e-06,
2135
+ "loss": 2.0888,
2136
+ "step": 2730
2137
+ },
2138
+ {
2139
+ "epoch": 2.5715626466447676,
2140
+ "grad_norm": 0.914188027381897,
2141
+ "learning_rate": 7.077560319906695e-06,
2142
+ "loss": 1.8913,
2143
+ "step": 2740
2144
+ },
2145
+ {
2146
+ "epoch": 2.580947911778508,
2147
+ "grad_norm": 1.0282979011535645,
2148
+ "learning_rate": 6.781537078679134e-06,
2149
+ "loss": 2.0157,
2150
+ "step": 2750
2151
+ },
2152
+ {
2153
+ "epoch": 2.5903331769122477,
2154
+ "grad_norm": 0.7493969202041626,
2155
+ "learning_rate": 6.491386656306319e-06,
2156
+ "loss": 2.0123,
2157
+ "step": 2760
2158
+ },
2159
+ {
2160
+ "epoch": 2.5997184420459876,
2161
+ "grad_norm": 0.8242475986480713,
2162
+ "learning_rate": 6.2071484804021475e-06,
2163
+ "loss": 1.8812,
2164
+ "step": 2770
2165
+ },
2166
+ {
2167
+ "epoch": 2.609103707179728,
2168
+ "grad_norm": 0.9322003722190857,
2169
+ "learning_rate": 5.928861175184336e-06,
2170
+ "loss": 1.9338,
2171
+ "step": 2780
2172
+ },
2173
+ {
2174
+ "epoch": 2.618488972313468,
2175
+ "grad_norm": 0.7862038016319275,
2176
+ "learning_rate": 5.656562556225692e-06,
2177
+ "loss": 2.0133,
2178
+ "step": 2790
2179
+ },
2180
+ {
2181
+ "epoch": 2.627874237447208,
2182
+ "grad_norm": 0.7466335892677307,
2183
+ "learning_rate": 5.3902896253156365e-06,
2184
+ "loss": 1.9908,
2185
+ "step": 2800
2186
+ },
2187
+ {
2188
+ "epoch": 2.627874237447208,
2189
+ "eval_loss": 1.939355492591858,
2190
+ "eval_runtime": 937.8925,
2191
+ "eval_samples_per_second": 1.01,
2192
+ "eval_steps_per_second": 0.505,
2193
+ "step": 2800
2194
+ },
2195
+ {
2196
+ "epoch": 2.6372595025809478,
2197
+ "grad_norm": 0.7764114737510681,
2198
+ "learning_rate": 5.13007856543209e-06,
2199
+ "loss": 1.9348,
2200
+ "step": 2810
2201
+ },
2202
+ {
2203
+ "epoch": 2.646644767714688,
2204
+ "grad_norm": 0.8519911766052246,
2205
+ "learning_rate": 4.875964735824645e-06,
2206
+ "loss": 1.9125,
2207
+ "step": 2820
2208
+ },
2209
+ {
2210
+ "epoch": 2.656030032848428,
2211
+ "grad_norm": 0.8778414726257324,
2212
+ "learning_rate": 4.627982667209818e-06,
2213
+ "loss": 2.0829,
2214
+ "step": 2830
2215
+ },
2216
+ {
2217
+ "epoch": 2.665415297982168,
2218
+ "grad_norm": 0.7963272929191589,
2219
+ "learning_rate": 4.386166057078639e-06,
2220
+ "loss": 1.989,
2221
+ "step": 2840
2222
+ },
2223
+ {
2224
+ "epoch": 2.674800563115908,
2225
+ "grad_norm": 0.790169894695282,
2226
+ "learning_rate": 4.150547765117746e-06,
2227
+ "loss": 1.9568,
2228
+ "step": 2850
2229
+ },
2230
+ {
2231
+ "epoch": 2.684185828249648,
2232
+ "grad_norm": 0.8890179395675659,
2233
+ "learning_rate": 3.921159808744085e-06,
2234
+ "loss": 2.104,
2235
+ "step": 2860
2236
+ },
2237
+ {
2238
+ "epoch": 2.693571093383388,
2239
+ "grad_norm": 0.8634796738624573,
2240
+ "learning_rate": 3.698033358754205e-06,
2241
+ "loss": 2.0033,
2242
+ "step": 2870
2243
+ },
2244
+ {
2245
+ "epoch": 2.7029563585171283,
2246
+ "grad_norm": 0.7101777791976929,
2247
+ "learning_rate": 3.481198735088581e-06,
2248
+ "loss": 1.9882,
2249
+ "step": 2880
2250
+ },
2251
+ {
2252
+ "epoch": 2.712341623650868,
2253
+ "grad_norm": 0.8689360022544861,
2254
+ "learning_rate": 3.270685402711471e-06,
2255
+ "loss": 1.9517,
2256
+ "step": 2890
2257
+ },
2258
+ {
2259
+ "epoch": 2.721726888784608,
2260
+ "grad_norm": 0.7733302712440491,
2261
+ "learning_rate": 3.0665219676071057e-06,
2262
+ "loss": 1.9557,
2263
+ "step": 2900
2264
+ },
2265
+ {
2266
+ "epoch": 2.721726888784608,
2267
+ "eval_loss": 1.9387511014938354,
2268
+ "eval_runtime": 937.6722,
2269
+ "eval_samples_per_second": 1.01,
2270
+ "eval_steps_per_second": 0.506,
2271
+ "step": 2900
2272
+ },
2273
+ {
2274
+ "epoch": 2.7311121539183483,
2275
+ "grad_norm": 0.8051707148551941,
2276
+ "learning_rate": 2.8687361728924056e-06,
2277
+ "loss": 1.9718,
2278
+ "step": 2910
2279
+ },
2280
+ {
2281
+ "epoch": 2.740497419052088,
2282
+ "grad_norm": 0.8936708569526672,
2283
+ "learning_rate": 2.6773548950471572e-06,
2284
+ "loss": 2.0474,
2285
+ "step": 2920
2286
+ },
2287
+ {
2288
+ "epoch": 2.7498826841858284,
2289
+ "grad_norm": 0.7301138639450073,
2290
+ "learning_rate": 2.492404140261795e-06,
2291
+ "loss": 1.9602,
2292
+ "step": 2930
2293
+ },
2294
+ {
2295
+ "epoch": 2.759267949319568,
2296
+ "grad_norm": 0.7741363644599915,
2297
+ "learning_rate": 2.3139090409034946e-06,
2298
+ "loss": 2.0386,
2299
+ "step": 2940
2300
+ },
2301
+ {
2302
+ "epoch": 2.768653214453308,
2303
+ "grad_norm": 0.771488606929779,
2304
+ "learning_rate": 2.1418938521010954e-06,
2305
+ "loss": 1.9822,
2306
+ "step": 2950
2307
+ },
2308
+ {
2309
+ "epoch": 2.7780384795870483,
2310
+ "grad_norm": 0.7883902788162231,
2311
+ "learning_rate": 1.9763819484490355e-06,
2312
+ "loss": 2.0061,
2313
+ "step": 2960
2314
+ },
2315
+ {
2316
+ "epoch": 2.7874237447207886,
2317
+ "grad_norm": 6.553975582122803,
2318
+ "learning_rate": 1.8173958208311526e-06,
2319
+ "loss": 1.9747,
2320
+ "step": 2970
2321
+ },
2322
+ {
2323
+ "epoch": 2.7968090098545284,
2324
+ "grad_norm": 0.8313648104667664,
2325
+ "learning_rate": 1.6649570733643982e-06,
2326
+ "loss": 2.0416,
2327
+ "step": 2980
2328
+ },
2329
+ {
2330
+ "epoch": 2.8061942749882682,
2331
+ "grad_norm": 0.867647647857666,
2332
+ "learning_rate": 1.5190864204631672e-06,
2333
+ "loss": 1.8462,
2334
+ "step": 2990
2335
+ },
2336
+ {
2337
+ "epoch": 2.8155795401220085,
2338
+ "grad_norm": 0.740106999874115,
2339
+ "learning_rate": 1.3798036840244667e-06,
2340
+ "loss": 1.9627,
2341
+ "step": 3000
2342
+ },
2343
+ {
2344
+ "epoch": 2.8155795401220085,
2345
+ "eval_loss": 1.9384320974349976,
2346
+ "eval_runtime": 936.8374,
2347
+ "eval_samples_per_second": 1.011,
2348
+ "eval_steps_per_second": 0.506,
2349
+ "step": 3000
2350
+ },
2351
+ {
2352
+ "epoch": 2.8249648052557483,
2353
+ "grad_norm": 0.7124555706977844,
2354
+ "learning_rate": 1.2471277907343703e-06,
2355
+ "loss": 2.1284,
2356
+ "step": 3010
2357
+ },
2358
+ {
2359
+ "epoch": 2.8343500703894886,
2360
+ "grad_norm": 0.809935986995697,
2361
+ "learning_rate": 1.1210767694961655e-06,
2362
+ "loss": 2.0541,
2363
+ "step": 3020
2364
+ },
2365
+ {
2366
+ "epoch": 2.8437353355232284,
2367
+ "grad_norm": 0.801823079586029,
2368
+ "learning_rate": 1.0016677489804171e-06,
2369
+ "loss": 1.9914,
2370
+ "step": 3030
2371
+ },
2372
+ {
2373
+ "epoch": 2.8531206006569687,
2374
+ "grad_norm": 0.7602815628051758,
2375
+ "learning_rate": 8.88916955297453e-07,
2376
+ "loss": 1.9802,
2377
+ "step": 3040
2378
+ },
2379
+ {
2380
+ "epoch": 2.8625058657907085,
2381
+ "grad_norm": 0.7440112829208374,
2382
+ "learning_rate": 7.8283970979241e-07,
2383
+ "loss": 1.9312,
2384
+ "step": 3050
2385
+ },
2386
+ {
2387
+ "epoch": 2.871891130924449,
2388
+ "grad_norm": 2.742288112640381,
2389
+ "learning_rate": 6.834504269632835e-07,
2390
+ "loss": 2.03,
2391
+ "step": 3060
2392
+ },
2393
+ {
2394
+ "epoch": 2.8812763960581886,
2395
+ "grad_norm": 0.815190315246582,
2396
+ "learning_rate": 5.907626125022159e-07,
2397
+ "loss": 1.9215,
2398
+ "step": 3070
2399
+ },
2400
+ {
2401
+ "epoch": 2.8906616611919285,
2402
+ "grad_norm": 0.8376407027244568,
2403
+ "learning_rate": 5.04788861460187e-07,
2404
+ "loss": 1.8776,
2405
+ "step": 3080
2406
+ },
2407
+ {
2408
+ "epoch": 2.9000469263256687,
2409
+ "grad_norm": 0.9095739126205444,
2410
+ "learning_rate": 4.255408565355612e-07,
2411
+ "loss": 1.9567,
2412
+ "step": 3090
2413
+ },
2414
+ {
2415
+ "epoch": 2.9094321914594086,
2416
+ "grad_norm": 0.858132541179657,
2417
+ "learning_rate": 3.530293664865514e-07,
2418
+ "loss": 1.8339,
2419
+ "step": 3100
2420
+ },
2421
+ {
2422
+ "epoch": 2.9094321914594086,
2423
+ "eval_loss": 1.9382846355438232,
2424
+ "eval_runtime": 937.0394,
2425
+ "eval_samples_per_second": 1.011,
2426
+ "eval_steps_per_second": 0.506,
2427
+ "step": 3100
2428
+ },
2429
+ {
2430
+ "epoch": 2.918817456593149,
2431
+ "grad_norm": 0.851668119430542,
2432
+ "learning_rate": 2.872642446678897e-07,
2433
+ "loss": 2.1549,
2434
+ "step": 3110
2435
+ },
2436
+ {
2437
+ "epoch": 2.9282027217268887,
2438
+ "grad_norm": 1.059181571006775,
2439
+ "learning_rate": 2.2825442769188188e-07,
2440
+ "loss": 2.0087,
2441
+ "step": 3120
2442
+ },
2443
+ {
2444
+ "epoch": 2.937587986860629,
2445
+ "grad_norm": 1.1176784038543701,
2446
+ "learning_rate": 1.7600793421402307e-07,
2447
+ "loss": 1.9762,
2448
+ "step": 3130
2449
+ },
2450
+ {
2451
+ "epoch": 2.9469732519943688,
2452
+ "grad_norm": 0.7648433446884155,
2453
+ "learning_rate": 1.305318638434083e-07,
2454
+ "loss": 1.9598,
2455
+ "step": 3140
2456
+ },
2457
+ {
2458
+ "epoch": 2.956358517128109,
2459
+ "grad_norm": 0.8515041470527649,
2460
+ "learning_rate": 9.183239617795436e-08,
2461
+ "loss": 2.0792,
2462
+ "step": 3150
2463
+ },
2464
+ {
2465
+ "epoch": 2.965743782261849,
2466
+ "grad_norm": 3.4551055431365967,
2467
+ "learning_rate": 5.991478996468236e-08,
2468
+ "loss": 1.9946,
2469
+ "step": 3160
2470
+ },
2471
+ {
2472
+ "epoch": 2.9751290473955887,
2473
+ "grad_norm": 0.7828194499015808,
2474
+ "learning_rate": 3.4783382385139565e-08,
2475
+ "loss": 1.9363,
2476
+ "step": 3170
2477
+ },
2478
+ {
2479
+ "epoch": 2.984514312529329,
2480
+ "grad_norm": 0.9214401245117188,
2481
+ "learning_rate": 1.644158846600963e-08,
2482
+ "loss": 1.9451,
2483
+ "step": 3180
2484
+ },
2485
+ {
2486
+ "epoch": 2.9938995776630692,
2487
+ "grad_norm": 0.8548945188522339,
2488
+ "learning_rate": 4.8919006150727195e-09,
2489
+ "loss": 1.8411,
2490
+ "step": 3190
2491
+ },
2492
+ {
2493
+ "epoch": 2.998592210229939,
2494
+ "step": 3195,
2495
+ "total_flos": 1.0917373877893988e+19,
2496
+ "train_loss": 2.069366264641751,
2497
+ "train_runtime": 97122.8592,
2498
+ "train_samples_per_second": 0.263,
2499
+ "train_steps_per_second": 0.033
2500
+ }
2501
+ ],
2502
+ "logging_steps": 10,
2503
+ "max_steps": 3195,
2504
+ "num_input_tokens_seen": 0,
2505
+ "num_train_epochs": 3,
2506
+ "save_steps": 500,
2507
+ "total_flos": 1.0917373877893988e+19,
2508
+ "train_batch_size": 2,
2509
+ "trial_name": null,
2510
+ "trial_params": null
2511
+ }
training_eval_loss.png ADDED
training_loss.png ADDED