sanchit-gandhi (HF staff) committed
Commit ae4b074
1 Parent(s): 7865806

Model save
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ base_model: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+ tags:
+ - trl
+ - sft
+ - generated_from_trainer
+ datasets:
+ - generator
+ model-index:
+ - name: sanchit-gandhi/Mistral-7B-v0.1-6-layer
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # sanchit-gandhi/Mistral-7B-v0.1-6-layer
+
+ This model is a fine-tuned version of [sanchit-gandhi/Mistral-7B-v0.1-6-layer](https://huggingface.co/sanchit-gandhi/Mistral-7B-v0.1-6-layer) on the generator dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 2.1183
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0003
+ - train_batch_size: 64
+ - eval_batch_size: 32
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - total_train_batch_size: 512
+ - total_eval_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 5
+
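A quick check on the schedule: the learning rates logged in trainer_state.json match linear warmup followed by cosine decay to zero, if the warmup length is assumed to be rounded up to ceil(0.1 × 1365) = 137 steps (the Hugging Face Trainer convention). A minimal sketch using only the constants above:

```python
import math

MAX_STEPS = 1365     # global_step at the end of training
PEAK_LR = 3e-4       # learning_rate
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio
# Assumption: warmup steps are rounded up, as in the HF Trainer's get_warmup_steps.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * MAX_STEPS)  # 137

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Evaluated at steps 1, 270, and 1360, this reproduces the `learning_rate` values recorded in trainer_state.json to high precision.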
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 4.8342 | 1.0 | 273 | 4.7379 |
+ | 3.3301 | 2.0 | 546 | 3.2846 |
+ | 2.4158 | 3.0 | 819 | 2.4134 |
+ | 2.1322 | 4.0 | 1092 | 2.1637 |
+ | 2.0369 | 5.0 | 1365 | 2.1183 |
+
+
+ ### Framework versions
+
+ - Transformers 4.36.2
+ - Pytorch 2.1.2
+ - Datasets 2.14.6
+ - Tokenizers 0.15.0
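The derived quantities in the hyperparameter list are internally consistent; a short sanity check (assuming gradient_accumulation_steps = 1, which the card does not list):

```python
# Per-device and derived batch/step counts from the model card above.
train_batch_size = 64   # per device
num_devices = 8
num_epochs = 5
steps_per_epoch = 273   # from the training-results table

total_train_batch_size = train_batch_size * num_devices  # should be 512
total_steps = steps_per_epoch * num_epochs               # should be 1365, the final step
```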
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+     "epoch": 5.0,
+     "eval_loss": 2.118281126022339,
+     "eval_runtime": 30.3009,
+     "eval_samples": 23110,
+     "eval_samples_per_second": 509.26,
+     "eval_steps_per_second": 2.013,
+     "train_loss": 3.477488596011431,
+     "train_runtime": 5141.5129,
+     "train_samples": 207865,
+     "train_samples_per_second": 135.588,
+     "train_steps_per_second": 0.265
+ }
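The eval throughput figures imply the number of examples actually fed to evaluation: eval_samples_per_second × eval_runtime ≈ 15431, which matches the `Num examples = 15431` line in the run's output.log, while `eval_samples` (23110) is presumably the pre-packing example count. A quick check using the figures above:

```python
# Figures copied from all_results.json above.
eval_runtime = 30.3009
eval_samples_per_second = 509.26

# Examples actually evaluated, implied by the throughput figures.
examples_evaluated = round(eval_samples_per_second * eval_runtime)
```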
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 5.0,
+     "eval_loss": 2.118281126022339,
+     "eval_runtime": 30.3009,
+     "eval_samples": 23110,
+     "eval_samples_per_second": 509.26,
+     "eval_steps_per_second": 2.013
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.36.2"
+ }
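The token ids here follow the Mistral/Llama tokenizer convention (`<s>` = 1, `</s>` = 2). The file is plain JSON and can be inspected without loading the model; a minimal sketch with the contents inlined:

```python
import json

# generation_config.json contents, inlined for illustration.
config_text = """
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.36.2"
}
"""
config = json.loads(config_text)
```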
runs/Feb01_17-58-13_ip-26-0-165-24/events.out.tfevents.1706815561.ip-26-0-165-24.239318.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4132739882d1f8d3f4b8d46f83c2519fdce3f5de5dab0de02ecd3939cc3721a5
+ size 359
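This file is stored as a Git LFS pointer rather than as its contents: the pointer records only the spec version, the SHA-256 object id, and the byte size of the real file. A sketch of a parser for this key/value format (the helper name is illustrative, not from the repo):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields

# The pointer shown above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4132739882d1f8d3f4b8d46f83c2519fdce3f5de5dab0de02ecd3939cc3721a5
size 359
"""
info = parse_lfs_pointer(pointer)
```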
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+     "epoch": 5.0,
+     "train_loss": 3.477488596011431,
+     "train_runtime": 5141.5129,
+     "train_samples": 207865,
+     "train_samples_per_second": 135.588,
+     "train_steps_per_second": 0.265
+ }
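The two throughput figures are mutually consistent given the global batch size of 512 (samples/s ≈ steps/s × global batch, up to rounding of the logged values):

```python
# Figures copied from train_results.json above.
train_steps_per_second = 0.265
train_samples_per_second = 135.588
total_train_batch_size = 512  # 64 per device x 8 GPUs

# Implied samples/s from steps/s; small mismatch comes from rounding steps/s to 3 decimals.
implied_samples_per_second = train_steps_per_second * total_train_batch_size
```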
trainer_state.json ADDED
@@ -0,0 +1,892 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 5.0,
5
+ "eval_steps": 500,
6
+ "global_step": 1365,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.0,
13
+ "learning_rate": 2.18978102189781e-06,
14
+ "loss": 13.9701,
15
+ "step": 1
16
+ },
17
+ {
18
+ "epoch": 0.04,
19
+ "learning_rate": 2.1897810218978098e-05,
20
+ "loss": 9.8829,
21
+ "step": 10
22
+ },
23
+ {
24
+ "epoch": 0.07,
25
+ "learning_rate": 4.3795620437956196e-05,
26
+ "loss": 7.6246,
27
+ "step": 20
28
+ },
29
+ {
30
+ "epoch": 0.11,
31
+ "learning_rate": 6.56934306569343e-05,
32
+ "loss": 7.2381,
33
+ "step": 30
34
+ },
35
+ {
36
+ "epoch": 0.15,
37
+ "learning_rate": 8.759124087591239e-05,
38
+ "loss": 7.2,
39
+ "step": 40
40
+ },
41
+ {
42
+ "epoch": 0.18,
43
+ "learning_rate": 0.00010948905109489051,
44
+ "loss": 7.1787,
45
+ "step": 50
46
+ },
47
+ {
48
+ "epoch": 0.22,
49
+ "learning_rate": 0.0001313868613138686,
50
+ "loss": 7.1182,
51
+ "step": 60
52
+ },
53
+ {
54
+ "epoch": 0.26,
55
+ "learning_rate": 0.00015328467153284672,
56
+ "loss": 7.1756,
57
+ "step": 70
58
+ },
59
+ {
60
+ "epoch": 0.29,
61
+ "learning_rate": 0.00017518248175182478,
62
+ "loss": 6.9204,
63
+ "step": 80
64
+ },
65
+ {
66
+ "epoch": 0.33,
67
+ "learning_rate": 0.0001970802919708029,
68
+ "loss": 6.739,
69
+ "step": 90
70
+ },
71
+ {
72
+ "epoch": 0.37,
73
+ "learning_rate": 0.00021897810218978101,
74
+ "loss": 6.6124,
75
+ "step": 100
76
+ },
77
+ {
78
+ "epoch": 0.4,
79
+ "learning_rate": 0.0002408759124087591,
80
+ "loss": 6.5668,
81
+ "step": 110
82
+ },
83
+ {
84
+ "epoch": 0.44,
85
+ "learning_rate": 0.0002627737226277372,
86
+ "loss": 6.4359,
87
+ "step": 120
88
+ },
89
+ {
90
+ "epoch": 0.48,
91
+ "learning_rate": 0.0002846715328467153,
92
+ "loss": 6.5331,
93
+ "step": 130
94
+ },
95
+ {
96
+ "epoch": 0.51,
97
+ "learning_rate": 0.00029999558221422155,
98
+ "loss": 6.4523,
99
+ "step": 140
100
+ },
101
+ {
102
+ "epoch": 0.55,
103
+ "learning_rate": 0.00029991705103933765,
104
+ "loss": 6.2601,
105
+ "step": 150
106
+ },
107
+ {
108
+ "epoch": 0.59,
109
+ "learning_rate": 0.00029974040600590614,
110
+ "loss": 6.1484,
111
+ "step": 160
112
+ },
113
+ {
114
+ "epoch": 0.62,
115
+ "learning_rate": 0.0002994657627200285,
116
+ "loss": 6.0018,
117
+ "step": 170
118
+ },
119
+ {
120
+ "epoch": 0.66,
121
+ "learning_rate": 0.0002990933009231839,
122
+ "loss": 5.8596,
123
+ "step": 180
124
+ },
125
+ {
126
+ "epoch": 0.7,
127
+ "learning_rate": 0.0002986232643745964,
128
+ "loss": 5.7249,
129
+ "step": 190
130
+ },
131
+ {
132
+ "epoch": 0.73,
133
+ "learning_rate": 0.000298055960691706,
134
+ "loss": 5.5685,
135
+ "step": 200
136
+ },
137
+ {
138
+ "epoch": 0.77,
139
+ "learning_rate": 0.0002973917611488469,
140
+ "loss": 5.4467,
141
+ "step": 210
142
+ },
143
+ {
144
+ "epoch": 0.81,
145
+ "learning_rate": 0.0002966311004342651,
146
+ "loss": 5.3091,
147
+ "step": 220
148
+ },
149
+ {
150
+ "epoch": 0.84,
151
+ "learning_rate": 0.0002957744763656356,
152
+ "loss": 5.1977,
153
+ "step": 230
154
+ },
155
+ {
156
+ "epoch": 0.88,
157
+ "learning_rate": 0.00029482244956426253,
158
+ "loss": 5.0377,
159
+ "step": 240
160
+ },
161
+ {
162
+ "epoch": 0.92,
163
+ "learning_rate": 0.0002937756430881789,
164
+ "loss": 4.9283,
165
+ "step": 250
166
+ },
167
+ {
168
+ "epoch": 0.95,
169
+ "learning_rate": 0.0002926347420243833,
170
+ "loss": 5.091,
171
+ "step": 260
172
+ },
173
+ {
174
+ "epoch": 0.99,
175
+ "learning_rate": 0.0002914004930404816,
176
+ "loss": 4.8342,
177
+ "step": 270
178
+ },
179
+ {
180
+ "epoch": 1.0,
181
+ "eval_loss": 4.737916946411133,
182
+ "eval_runtime": 30.7851,
183
+ "eval_samples_per_second": 501.248,
184
+ "eval_steps_per_second": 1.981,
185
+ "step": 273
186
+ },
187
+ {
188
+ "epoch": 1.03,
189
+ "learning_rate": 0.00029007370389602736,
190
+ "loss": 4.7528,
191
+ "step": 280
192
+ },
193
+ {
194
+ "epoch": 1.06,
195
+ "learning_rate": 0.00028865524291388006,
196
+ "loss": 4.6222,
197
+ "step": 290
198
+ },
199
+ {
200
+ "epoch": 1.1,
201
+ "learning_rate": 0.0002871460384119274,
202
+ "loss": 4.4689,
203
+ "step": 300
204
+ },
205
+ {
206
+ "epoch": 1.14,
207
+ "learning_rate": 0.00028554707809554385,
208
+ "loss": 4.3523,
209
+ "step": 310
210
+ },
211
+ {
212
+ "epoch": 1.17,
213
+ "learning_rate": 0.0002838594084111824,
214
+ "loss": 4.3172,
215
+ "step": 320
216
+ },
217
+ {
218
+ "epoch": 1.21,
219
+ "learning_rate": 0.00028208413386152326,
220
+ "loss": 4.1787,
221
+ "step": 330
222
+ },
223
+ {
224
+ "epoch": 1.25,
225
+ "learning_rate": 0.00028022241628262735,
226
+ "loss": 4.1082,
227
+ "step": 340
228
+ },
229
+ {
230
+ "epoch": 1.28,
231
+ "learning_rate": 0.00027827547408356773,
232
+ "loss": 3.9914,
233
+ "step": 350
234
+ },
235
+ {
236
+ "epoch": 1.32,
237
+ "learning_rate": 0.00027624458144903663,
238
+ "loss": 4.0363,
239
+ "step": 360
240
+ },
241
+ {
242
+ "epoch": 1.36,
243
+ "learning_rate": 0.0002741310675054493,
244
+ "loss": 5.2644,
245
+ "step": 370
246
+ },
247
+ {
248
+ "epoch": 1.39,
249
+ "learning_rate": 0.0002719363154510924,
250
+ "loss": 4.8735,
251
+ "step": 380
252
+ },
253
+ {
254
+ "epoch": 1.43,
255
+ "learning_rate": 0.000269661761650883,
256
+ "loss": 4.4014,
257
+ "step": 390
258
+ },
259
+ {
260
+ "epoch": 1.47,
261
+ "learning_rate": 0.00026730889469633406,
262
+ "loss": 4.1602,
263
+ "step": 400
264
+ },
265
+ {
266
+ "epoch": 1.5,
267
+ "learning_rate": 0.0002648792544313389,
268
+ "loss": 4.0027,
269
+ "step": 410
270
+ },
271
+ {
272
+ "epoch": 1.54,
273
+ "learning_rate": 0.0002623744309444141,
274
+ "loss": 3.9095,
275
+ "step": 420
276
+ },
277
+ {
278
+ "epoch": 1.58,
279
+ "learning_rate": 0.0002597960635280588,
280
+ "loss": 3.8225,
281
+ "step": 430
282
+ },
283
+ {
284
+ "epoch": 1.61,
285
+ "learning_rate": 0.00025714583960591324,
286
+ "loss": 3.7638,
287
+ "step": 440
288
+ },
289
+ {
290
+ "epoch": 1.65,
291
+ "learning_rate": 0.0002544254936284164,
292
+ "loss": 3.7038,
293
+ "step": 450
294
+ },
295
+ {
296
+ "epoch": 1.68,
297
+ "learning_rate": 0.0002516368059376883,
298
+ "loss": 3.6738,
299
+ "step": 460
300
+ },
301
+ {
302
+ "epoch": 1.72,
303
+ "learning_rate": 0.00024878160160237653,
304
+ "loss": 3.6142,
305
+ "step": 470
306
+ },
307
+ {
308
+ "epoch": 1.76,
309
+ "learning_rate": 0.00024586174922323293,
310
+ "loss": 3.5565,
311
+ "step": 480
312
+ },
313
+ {
314
+ "epoch": 1.79,
315
+ "learning_rate": 0.0002428791597101996,
316
+ "loss": 3.5071,
317
+ "step": 490
318
+ },
319
+ {
320
+ "epoch": 1.83,
321
+ "learning_rate": 0.00023983578503180541,
322
+ "loss": 3.4761,
323
+ "step": 500
324
+ },
325
+ {
326
+ "epoch": 1.87,
327
+ "learning_rate": 0.00023673361693769216,
328
+ "loss": 3.4575,
329
+ "step": 510
330
+ },
331
+ {
332
+ "epoch": 1.9,
333
+ "learning_rate": 0.00023357468565510535,
334
+ "loss": 3.4062,
335
+ "step": 520
336
+ },
337
+ {
338
+ "epoch": 1.94,
339
+ "learning_rate": 0.00023036105856020315,
340
+ "loss": 3.3653,
341
+ "step": 530
342
+ },
343
+ {
344
+ "epoch": 1.98,
345
+ "learning_rate": 0.00022709483882505315,
346
+ "loss": 3.3301,
347
+ "step": 540
348
+ },
349
+ {
350
+ "epoch": 2.0,
351
+ "eval_loss": 3.2846388816833496,
352
+ "eval_runtime": 30.5512,
353
+ "eval_samples_per_second": 505.086,
354
+ "eval_steps_per_second": 1.997,
355
+ "step": 546
356
+ },
357
+ {
358
+ "epoch": 2.01,
359
+ "learning_rate": 0.00022377816404120263,
360
+ "loss": 3.2758,
361
+ "step": 550
362
+ },
363
+ {
364
+ "epoch": 2.05,
365
+ "learning_rate": 0.00022041320482072218,
366
+ "loss": 3.2522,
367
+ "step": 560
368
+ },
369
+ {
370
+ "epoch": 2.09,
371
+ "learning_rate": 0.00021700216337563975,
372
+ "loss": 3.1993,
373
+ "step": 570
374
+ },
375
+ {
376
+ "epoch": 2.12,
377
+ "learning_rate": 0.00021354727207669315,
378
+ "loss": 3.147,
379
+ "step": 580
380
+ },
381
+ {
382
+ "epoch": 2.16,
383
+ "learning_rate": 0.00021005079199234558,
384
+ "loss": 3.1192,
385
+ "step": 590
386
+ },
387
+ {
388
+ "epoch": 2.2,
389
+ "learning_rate": 0.00020651501140901961,
390
+ "loss": 3.0901,
391
+ "step": 600
392
+ },
393
+ {
394
+ "epoch": 2.23,
395
+ "learning_rate": 0.0002029422443335184,
396
+ "loss": 3.0812,
397
+ "step": 610
398
+ },
399
+ {
400
+ "epoch": 2.27,
401
+ "learning_rate": 0.00019933482897861385,
402
+ "loss": 3.0369,
403
+ "step": 620
404
+ },
405
+ {
406
+ "epoch": 2.31,
407
+ "learning_rate": 0.00019569512623279333,
408
+ "loss": 2.9916,
409
+ "step": 630
410
+ },
411
+ {
412
+ "epoch": 2.34,
413
+ "learning_rate": 0.00019202551811516592,
414
+ "loss": 2.9367,
415
+ "step": 640
416
+ },
417
+ {
418
+ "epoch": 2.38,
419
+ "learning_rate": 0.00018832840621653993,
420
+ "loss": 2.9235,
421
+ "step": 650
422
+ },
423
+ {
424
+ "epoch": 2.42,
425
+ "learning_rate": 0.00018460621012769126,
426
+ "loss": 3.0402,
427
+ "step": 660
428
+ },
429
+ {
430
+ "epoch": 2.45,
431
+ "learning_rate": 0.0001808613658558521,
432
+ "loss": 2.9328,
433
+ "step": 670
434
+ },
435
+ {
436
+ "epoch": 2.49,
437
+ "learning_rate": 0.00017709632423045527,
438
+ "loss": 2.8384,
439
+ "step": 680
440
+ },
441
+ {
442
+ "epoch": 2.53,
443
+ "learning_rate": 0.0001733135492991784,
444
+ "loss": 2.7372,
445
+ "step": 690
446
+ },
447
+ {
448
+ "epoch": 2.56,
449
+ "learning_rate": 0.00016951551671533753,
450
+ "loss": 2.7189,
451
+ "step": 700
452
+ },
453
+ {
454
+ "epoch": 2.6,
455
+ "learning_rate": 0.00016570471211768486,
456
+ "loss": 2.6697,
457
+ "step": 710
458
+ },
459
+ {
460
+ "epoch": 2.64,
461
+ "learning_rate": 0.00016188362950367204,
462
+ "loss": 2.6319,
463
+ "step": 720
464
+ },
465
+ {
466
+ "epoch": 2.67,
467
+ "learning_rate": 0.00015805476959724273,
468
+ "loss": 2.5963,
469
+ "step": 730
470
+ },
471
+ {
472
+ "epoch": 2.71,
473
+ "learning_rate": 0.00015422063821222292,
474
+ "loss": 2.5732,
475
+ "step": 740
476
+ },
477
+ {
478
+ "epoch": 2.75,
479
+ "learning_rate": 0.00015038374461238062,
480
+ "loss": 2.5426,
481
+ "step": 750
482
+ },
483
+ {
484
+ "epoch": 2.78,
485
+ "learning_rate": 0.00014654659986922697,
486
+ "loss": 2.5217,
487
+ "step": 760
488
+ },
489
+ {
490
+ "epoch": 2.82,
491
+ "learning_rate": 0.00014271171521863514,
492
+ "loss": 2.4971,
493
+ "step": 770
494
+ },
495
+ {
496
+ "epoch": 2.86,
497
+ "learning_rate": 0.00013888160041735086,
498
+ "loss": 2.4917,
499
+ "step": 780
500
+ },
501
+ {
502
+ "epoch": 2.89,
503
+ "learning_rate": 0.0001350587621004716,
504
+ "loss": 2.4795,
505
+ "step": 790
506
+ },
507
+ {
508
+ "epoch": 2.93,
509
+ "learning_rate": 0.00013124570214096816,
510
+ "loss": 2.4464,
511
+ "step": 800
512
+ },
513
+ {
514
+ "epoch": 2.97,
515
+ "learning_rate": 0.00012744491601232355,
516
+ "loss": 2.4158,
517
+ "step": 810
518
+ },
519
+ {
520
+ "epoch": 3.0,
521
+ "eval_loss": 2.413381576538086,
522
+ "eval_runtime": 30.6479,
523
+ "eval_samples_per_second": 503.492,
524
+ "eval_steps_per_second": 1.99,
525
+ "step": 819
526
+ },
527
+ {
528
+ "epoch": 3.0,
529
+ "learning_rate": 0.00012365889115535916,
530
+ "loss": 2.402,
531
+ "step": 820
532
+ },
533
+ {
534
+ "epoch": 3.04,
535
+ "learning_rate": 0.00011989010535031889,
536
+ "loss": 2.3491,
537
+ "step": 830
538
+ },
539
+ {
540
+ "epoch": 3.08,
541
+ "learning_rate": 0.00011614102509527481,
542
+ "loss": 2.3247,
543
+ "step": 840
544
+ },
545
+ {
546
+ "epoch": 3.11,
547
+ "learning_rate": 0.00011241410399191728,
548
+ "loss": 2.3179,
549
+ "step": 850
550
+ },
551
+ {
552
+ "epoch": 3.15,
553
+ "learning_rate": 0.00010871178113978432,
554
+ "loss": 2.3006,
555
+ "step": 860
556
+ },
557
+ {
558
+ "epoch": 3.19,
559
+ "learning_rate": 0.00010503647953998295,
560
+ "loss": 2.305,
561
+ "step": 870
562
+ },
563
+ {
564
+ "epoch": 3.22,
565
+ "learning_rate": 0.00010139060450944528,
566
+ "loss": 2.2922,
567
+ "step": 880
568
+ },
569
+ {
570
+ "epoch": 3.26,
571
+ "learning_rate": 9.777654210675867e-05,
572
+ "loss": 2.2766,
573
+ "step": 890
574
+ },
575
+ {
576
+ "epoch": 3.3,
577
+ "learning_rate": 9.419665757059952e-05,
578
+ "loss": 2.2732,
579
+ "step": 900
580
+ },
581
+ {
582
+ "epoch": 3.33,
583
+ "learning_rate": 9.065329377179248e-05,
584
+ "loss": 2.2591,
585
+ "step": 910
586
+ },
587
+ {
588
+ "epoch": 3.37,
589
+ "learning_rate": 8.714876968000853e-05,
590
+ "loss": 2.2477,
591
+ "step": 920
592
+ },
593
+ {
594
+ "epoch": 3.41,
595
+ "learning_rate": 8.368537884610555e-05,
596
+ "loss": 2.243,
597
+ "step": 930
598
+ },
599
+ {
600
+ "epoch": 3.44,
601
+ "learning_rate": 8.026538790110405e-05,
602
+ "loss": 2.2341,
603
+ "step": 940
604
+ },
605
+ {
606
+ "epoch": 3.48,
607
+ "learning_rate": 7.689103507278047e-05,
608
+ "loss": 2.2249,
609
+ "step": 950
610
+ },
611
+ {
612
+ "epoch": 3.52,
613
+ "learning_rate": 7.356452872084971e-05,
614
+ "loss": 2.236,
615
+ "step": 960
616
+ },
617
+ {
618
+ "epoch": 3.55,
619
+ "learning_rate": 7.028804589169443e-05,
620
+ "loss": 2.2097,
621
+ "step": 970
622
+ },
623
+ {
624
+ "epoch": 3.59,
625
+ "learning_rate": 6.706373089358791e-05,
626
+ "loss": 2.1968,
627
+ "step": 980
628
+ },
629
+ {
630
+ "epoch": 3.63,
631
+ "learning_rate": 6.389369389334193e-05,
632
+ "loss": 2.187,
633
+ "step": 990
634
+ },
635
+ {
636
+ "epoch": 3.66,
637
+ "learning_rate": 6.0780009535299393e-05,
638
+ "loss": 2.1865,
639
+ "step": 1000
640
+ },
641
+ {
642
+ "epoch": 3.7,
643
+ "learning_rate": 5.772471558357407e-05,
644
+ "loss": 2.1732,
645
+ "step": 1010
646
+ },
647
+ {
648
+ "epoch": 3.74,
649
+ "learning_rate": 5.4729811588427536e-05,
650
+ "loss": 2.1648,
651
+ "step": 1020
652
+ },
653
+ {
654
+ "epoch": 3.77,
655
+ "learning_rate": 5.179725757765449e-05,
656
+ "loss": 2.1696,
657
+ "step": 1030
658
+ },
659
+ {
660
+ "epoch": 3.81,
661
+ "learning_rate": 4.892897277383434e-05,
662
+ "loss": 2.1591,
663
+ "step": 1040
664
+ },
665
+ {
666
+ "epoch": 3.85,
667
+ "learning_rate": 4.6126834338287713e-05,
668
+ "loss": 2.1536,
669
+ "step": 1050
670
+ },
671
+ {
672
+ "epoch": 3.88,
673
+ "learning_rate": 4.339267614256027e-05,
674
+ "loss": 2.1536,
675
+ "step": 1060
676
+ },
677
+ {
678
+ "epoch": 3.92,
679
+ "learning_rate": 4.07282875682373e-05,
680
+ "loss": 2.1404,
681
+ "step": 1070
682
+ },
683
+ {
684
+ "epoch": 3.96,
685
+ "learning_rate": 3.813541233587552e-05,
686
+ "loss": 2.1403,
687
+ "step": 1080
688
+ },
689
+ {
690
+ "epoch": 3.99,
691
+ "learning_rate": 3.561574736381752e-05,
692
+ "loss": 2.1322,
693
+ "step": 1090
694
+ },
695
+ {
696
+ "epoch": 4.0,
697
+ "eval_loss": 2.1637322902679443,
698
+ "eval_runtime": 30.5067,
699
+ "eval_samples_per_second": 505.824,
700
+ "eval_steps_per_second": 2.0,
701
+ "step": 1092
702
+ },
703
+ {
704
+ "epoch": 4.03,
705
+ "learning_rate": 3.317094165763639e-05,
706
+ "loss": 2.0822,
707
+ "step": 1100
708
+ },
709
+ {
710
+ "epoch": 4.07,
711
+ "learning_rate": 3.080259523093675e-05,
712
+ "loss": 2.0771,
713
+ "step": 1110
714
+ },
715
+ {
716
+ "epoch": 4.1,
717
+ "learning_rate": 2.8512258058219112e-05,
718
+ "loss": 2.0782,
719
+ "step": 1120
720
+ },
721
+ {
722
+ "epoch": 4.14,
723
+ "learning_rate": 2.6301429060492306e-05,
724
+ "loss": 2.0688,
725
+ "step": 1130
726
+ },
727
+ {
728
+ "epoch": 4.18,
729
+ "learning_rate": 2.417155512429832e-05,
730
+ "loss": 2.0603,
731
+ "step": 1140
732
+ },
733
+ {
734
+ "epoch": 4.21,
735
+ "learning_rate": 2.2124030154791035e-05,
736
+ "loss": 2.0602,
737
+ "step": 1150
738
+ },
739
+ {
740
+ "epoch": 4.25,
741
+ "learning_rate": 2.0160194163489062e-05,
742
+ "loss": 2.0603,
743
+ "step": 1160
744
+ },
745
+ {
746
+ "epoch": 4.29,
747
+ "learning_rate": 1.828133239129944e-05,
748
+ "loss": 2.0617,
749
+ "step": 1170
750
+ },
751
+ {
752
+ "epoch": 4.32,
753
+ "learning_rate": 1.6488674467386278e-05,
754
+ "loss": 2.065,
755
+ "step": 1180
756
+ },
757
+ {
758
+ "epoch": 4.36,
759
+ "learning_rate": 1.47833936044345e-05,
760
+ "loss": 2.0479,
761
+ "step": 1190
762
+ },
763
+ {
764
+ "epoch": 4.4,
765
+ "learning_rate": 1.3166605830835903e-05,
766
+ "loss": 2.0553,
767
+ "step": 1200
768
+ },
769
+ {
770
+ "epoch": 4.43,
771
+ "learning_rate": 1.1639369260299463e-05,
772
+ "loss": 2.044,
773
+ "step": 1210
774
+ },
775
+ {
776
+ "epoch": 4.47,
777
+ "learning_rate": 1.0202683399364469e-05,
778
+ "loss": 2.0539,
779
+ "step": 1220
780
+ },
781
+ {
782
+ "epoch": 4.51,
783
+ "learning_rate": 8.857488493268839e-06,
784
+ "loss": 2.0471,
785
+ "step": 1230
786
+ },
787
+ {
788
+ "epoch": 4.54,
789
+ "learning_rate": 7.604664910601915e-06,
790
+ "loss": 2.0548,
791
+ "step": 1240
792
+ },
793
+ {
794
+ "epoch": 4.58,
795
+ "learning_rate": 6.445032567143238e-06,
796
+ "loss": 2.0447,
797
+ "step": 1250
798
+ },
799
+ {
800
+ "epoch": 4.62,
801
+ "learning_rate": 5.379350389265319e-06,
802
+ "loss": 2.0379,
803
+ "step": 1260
804
+ },
805
+ {
806
+ "epoch": 4.65,
807
+ "learning_rate": 4.408315817250818e-06,
808
+ "loss": 2.0351,
809
+ "step": 1270
810
+ },
811
+ {
812
+ "epoch": 4.69,
813
+ "learning_rate": 3.5325643488498757e-06,
814
+ "loss": 2.0463,
815
+ "step": 1280
816
+ },
817
+ {
818
+ "epoch": 4.73,
819
+ "learning_rate": 2.7526691233758334e-06,
820
+ "loss": 2.0436,
821
+ "step": 1290
822
+ },
823
+ {
824
+ "epoch": 4.76,
825
+ "learning_rate": 2.0691405466118307e-06,
826
+ "loss": 2.0491,
827
+ "step": 1300
828
+ },
829
+ {
830
+ "epoch": 4.8,
831
+ "learning_rate": 1.4824259567733698e-06,
832
+ "loss": 2.0461,
833
+ "step": 1310
834
+ },
835
+ {
836
+ "epoch": 4.84,
837
+ "learning_rate": 9.929093317461057e-07,
838
+ "loss": 2.041,
839
+ "step": 1320
840
+ },
841
+ {
842
+ "epoch": 4.87,
843
+ "learning_rate": 6.009110377897086e-07,
844
+ "loss": 2.04,
845
+ "step": 1330
846
+ },
847
+ {
848
+ "epoch": 4.91,
849
+ "learning_rate": 3.066876198728474e-07,
850
+ "loss": 2.0415,
851
+ "step": 1340
852
+ },
853
+ {
854
+ "epoch": 4.95,
855
+ "learning_rate": 1.1043163377627562e-07,
856
+ "loss": 2.0418,
857
+ "step": 1350
858
+ },
859
+ {
860
+ "epoch": 4.98,
861
+ "learning_rate": 1.2271520073786623e-08,
862
+ "loss": 2.0369,
863
+ "step": 1360
864
+ },
865
+ {
866
+ "epoch": 5.0,
867
+ "eval_loss": 2.118281126022339,
868
+ "eval_runtime": 30.6172,
869
+ "eval_samples_per_second": 503.999,
870
+ "eval_steps_per_second": 1.992,
871
+ "step": 1365
872
+ },
873
+ {
874
+ "epoch": 5.0,
875
+ "step": 1365,
876
+ "total_flos": 457285168005120.0,
877
+ "train_loss": 3.477488596011431,
878
+ "train_runtime": 5141.5129,
879
+ "train_samples_per_second": 135.588,
880
+ "train_steps_per_second": 0.265
881
+ }
882
+ ],
883
+ "logging_steps": 10,
884
+ "max_steps": 1365,
885
+ "num_input_tokens_seen": 0,
886
+ "num_train_epochs": 5,
887
+ "save_steps": 500,
888
+ "total_flos": 457285168005120.0,
889
+ "train_batch_size": 64,
890
+ "trial_name": null,
891
+ "trial_params": null
892
+ }
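The `log_history` array above mixes training-loss entries (every 10 steps) with per-epoch evaluation entries; the latter are the ones carrying an `eval_loss` key. A small sketch of pulling the validation curve out of such a file (the helper name is illustrative):

```python
def eval_curve(trainer_state: dict) -> list:
    """Pull (epoch, eval_loss) pairs out of a Trainer log_history."""
    return [(e["epoch"], e["eval_loss"])
            for e in trainer_state["log_history"] if "eval_loss" in e]

# Minimal example with the same shape as trainer_state.json above.
state = {"log_history": [
    {"epoch": 0.99, "learning_rate": 2.9e-4, "loss": 4.83, "step": 270},
    {"epoch": 1.0, "eval_loss": 4.737916946411133, "step": 273},
    {"epoch": 5.0, "eval_loss": 2.118281126022339, "step": 1365},
]}
curve = eval_curve(state)
```

In practice the real file would be read with `json.load` before calling the helper.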
wandb/debug-internal.log CHANGED
@@ -4459,3 +4459,45 @@
4459
  2024-02-01 19:25:27,251 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
4460
  2024-02-01 19:25:27,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
4461
  2024-02-01 19:25:27,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
4462
+ 2024-02-01 19:25:32,118 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4463
+ 2024-02-01 19:25:33,575 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4464
+ 2024-02-01 19:25:35,577 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4465
+ 2024-02-01 19:25:37,559 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4466
+ 2024-02-01 19:25:37,580 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4467
+ 2024-02-01 19:25:39,583 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4468
+ 2024-02-01 19:25:41,585 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4469
+ 2024-02-01 19:25:42,251 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
4470
+ 2024-02-01 19:25:42,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
4471
+ 2024-02-01 19:25:42,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
4472
+ 2024-02-01 19:25:43,038 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4473
+ 2024-02-01 19:25:43,588 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4474
+ 2024-02-01 19:25:45,590 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4475
+ 2024-02-01 19:25:47,593 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4476
+ 2024-02-01 19:25:48,502 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4477
+ 2024-02-01 19:25:49,595 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4478
+ 2024-02-01 19:25:51,005 DEBUG SenderThread:239784 [sender.py:send():382] send: stats
4479
+ 2024-02-01 19:25:51,598 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4480
+ 2024-02-01 19:25:53,600 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4481
+ 2024-02-01 19:25:53,968 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4482
+ 2024-02-01 19:25:55,603 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4483
+ 2024-02-01 19:25:57,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
4484
+ 2024-02-01 19:25:57,253 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
4485
+ 2024-02-01 19:25:57,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
4486
+ 2024-02-01 19:25:57,605 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4487
+ 2024-02-01 19:25:59,433 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4488
+ 2024-02-01 19:25:59,608 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4489
+ 2024-02-01 19:26:01,571 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: partial_history
4490
+ 2024-02-01 19:26:01,573 DEBUG SenderThread:239784 [sender.py:send():382] send: history
4491
+ 2024-02-01 19:26:01,573 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: summary_record
4492
+ 2024-02-01 19:26:01,575 INFO SenderThread:239784 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
4493
+ 2024-02-01 19:26:01,611 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4494
+ 2024-02-01 19:26:01,611 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/wandb-summary.json
4495
+ 2024-02-01 19:26:03,614 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4496
+ 2024-02-01 19:26:04,694 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4497
+ 2024-02-01 19:26:05,616 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4498
+ 2024-02-01 19:26:09,621 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
4499
+ 2024-02-01 19:26:10,395 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
4500
+ 2024-02-01 19:26:12,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
4501
+ 2024-02-01 19:26:12,253 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
4502
+ 2024-02-01 19:26:12,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
4503
+ 2024-02-01 19:26:13,626 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
wandb/run-20240201_175850-i93q0p12/files/output.log CHANGED
@@ -1682,3 +1682,52 @@
  Training completed. Do not forget to share your model on huggingface.co/models =)
  100%|█████████████████████████████████████████████████████████████████████████████| 1365/1365 [1:25:34<00:00,  3.76s/it]
  [INFO|trainer.py:3614] 2024-02-01 19:24:29,900 >> Waiting for the current checkpoint push to be finished, this might take a couple of minutes.
+ {'train_runtime': 5141.5129, 'train_samples_per_second': 135.588, 'train_steps_per_second': 0.265, 'train_loss': 3.477488596011431, 'epoch': 5.0}
+ [INFO|trainer.py:3166] 2024-02-01 19:25:31,190 >> ***** Running Evaluation *****
+ [INFO|trainer.py:3168] 2024-02-01 19:25:31,190 >>   Num examples = 15431
+ [INFO|trainer.py:3171] 2024-02-01 19:25:31,190 >>   Batch size = 32
+ 3%|██▊ | 2/61 [00:00<00:14, 4.04it/s]
+ ***** train metrics *****
+   epoch                    = 5.0
+   train_loss               = 3.4775
+   train_runtime            = 1:25:41.51
+   train_samples            = 207865
+   train_samples_per_second = 135.588
+   train_steps_per_second   = 0.265
+ 100%|█████████████████████████████████████████████████████████████████████████████████| 61/61 [00:29<00:00,  2.04it/s]
+ ***** eval metrics *****
+   epoch                   = 5.0
+   eval_loss               = 2.1183
+   eval_runtime            = 0:00:30.30
+   eval_samples            = 23110
+   eval_samples_per_second = 509.26
+   eval_steps_per_second   = 2.013
+ 2024-02-01 19:26:01 - INFO - __main__ - *** Save model ***
+ [INFO|trainer.py:2889] 2024-02-01 19:26:02,688 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:483] 2024-02-01 19:26:02,691 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:594] 2024-02-01 19:26:02,693 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2382] 2024-02-01 19:26:06,302 >> Model weights saved in ./pytorch_model.bin
+ [INFO|tokenization_utils_base.py:2432] 2024-02-01 19:26:06,305 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2441] 2024-02-01 19:26:06,307 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|trainer.py:2889] 2024-02-01 19:26:07,389 >> Saving model checkpoint to ./
+ [INFO|configuration_utils.py:483] 2024-02-01 19:26:07,392 >> Configuration saved in ./config.json
+ [INFO|configuration_utils.py:594] 2024-02-01 19:26:07,394 >> Configuration saved in ./generation_config.json
+ [INFO|modeling_utils.py:2382] 2024-02-01 19:26:11,028 >> Model weights saved in ./pytorch_model.bin
+ [INFO|tokenization_utils_base.py:2432] 2024-02-01 19:26:11,031 >> tokenizer config file saved in ./tokenizer_config.json
+ [INFO|tokenization_utils_base.py:2441] 2024-02-01 19:26:11,033 >> Special tokens file saved in ./special_tokens_map.json
+ [INFO|modelcard.py:452] 2024-02-01 19:26:11,224 >> Dropping the following result as it does not have all the necessary fields:
wandb/run-20240201_175850-i93q0p12/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 2.0369, "train/learning_rate": 1.2271520073786623e-08, "train/epoch": 5.0, "train/global_step": 1365, "_timestamp": 1706815469.711052, "_runtime": 5139.627084970474, "_step": 142, "eval/loss": 2.118281126022339, "eval/runtime": 30.6172, "eval/samples_per_second": 503.999, "eval/steps_per_second": 1.992, "train/train_runtime": 5141.5129, "train/train_samples_per_second": 135.588, "train/train_steps_per_second": 0.265, "train/total_flos": 457285168005120.0, "train/train_loss": 3.477488596011431}
+ {"train/loss": 2.0369, "train/learning_rate": 1.2271520073786623e-08, "train/epoch": 5.0, "train/global_step": 1365, "_timestamp": 1706815561.5711305, "_runtime": 5231.487163543701, "_step": 143, "eval/loss": 2.118281126022339, "eval/runtime": 30.3009, "eval/samples_per_second": 509.26, "eval/steps_per_second": 2.013, "train/train_runtime": 5141.5129, "train/train_samples_per_second": 135.588, "train/train_steps_per_second": 0.265, "train/total_flos": 457285168005120.0, "train/train_loss": 3.477488596011431}
wandb/run-20240201_175850-i93q0p12/logs/debug-internal.log CHANGED
@@ -4459,3 +4459,45 @@
  2024-02-01 19:25:27,251 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
  2024-02-01 19:25:27,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
  2024-02-01 19:25:27,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
+ 2024-02-01 19:25:32,118 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:33,575 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:35,577 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:37,559 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:37,580 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:39,583 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:41,585 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:42,251 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-02-01 19:25:42,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
+ 2024-02-01 19:25:42,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
+ 2024-02-01 19:25:43,038 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:43,588 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:45,590 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:47,593 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:48,502 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:49,595 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:51,005 DEBUG SenderThread:239784 [sender.py:send():382] send: stats
+ 2024-02-01 19:25:51,598 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:53,600 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:53,968 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:55,603 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:57,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-02-01 19:25:57,253 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
+ 2024-02-01 19:25:57,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
+ 2024-02-01 19:25:57,605 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:25:59,433 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:25:59,608 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:26:01,571 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-02-01 19:26:01,573 DEBUG SenderThread:239784 [sender.py:send():382] send: history
+ 2024-02-01 19:26:01,573 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: summary_record
+ 2024-02-01 19:26:01,575 INFO SenderThread:239784 [sender.py:_save_file():1392] saving file wandb-summary.json with policy end
+ 2024-02-01 19:26:01,611 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:26:01,611 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/wandb-summary.json
+ 2024-02-01 19:26:03,614 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:26:04,694 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:26:05,616 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:26:09,621 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
+ 2024-02-01 19:26:10,395 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: status_report
+ 2024-02-01 19:26:12,252 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-02-01 19:26:12,253 DEBUG HandlerThread:239784 [handler.py:handle_request():146] handle_request: stop_status
+ 2024-02-01 19:26:12,253 DEBUG SenderThread:239784 [sender.py:send_request():409] send_request: stop_status
+ 2024-02-01 19:26:13,626 INFO Thread-12 :239784 [dir_watcher.py:_on_file_modified():288] file/dir modified: /fsx/sanchit/distil-zephyr-1.5b-ssft/wandb/run-20240201_175850-i93q0p12/files/output.log
wandb/run-20240201_175850-i93q0p12/run-i93q0p12.wandb CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:83bb2becd0a41bef6a83fb4db8cc052bae33199ac7e7fd2c198b5018f4037fd9
- size 1540113
+ oid sha256:18cc7a0b99001252605aad90d7a9da2c46834f3f63607f44557cc2ada6b58562
+ size 1573084