ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA A10G, compute capability 8.6
llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from llama-2-7b-q5_k_m.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q5_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:            blk.0.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    8:            blk.0.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    9:              blk.0.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   10:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   11:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   12:              blk.1.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   13:              blk.1.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   14:              blk.1.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   15:         blk.1.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   16:            blk.1.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   17:            blk.1.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   18:              blk.1.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   19:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   20:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   21:              blk.2.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   22:              blk.2.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   23:              blk.2.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   24:         blk.2.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   25:            blk.2.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   26:            blk.2.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   27:              blk.2.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   28:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   29:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   30:              blk.3.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   31:              blk.3.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   32:              blk.3.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   33:         blk.3.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   34:            blk.3.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   35:            blk.3.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   36:              blk.3.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   37:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   38:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   39:              blk.4.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   40:              blk.4.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   41:              blk.4.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   42:         blk.4.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   43:            blk.4.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   44:            blk.4.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   45:              blk.4.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   46:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   47:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   48:              blk.5.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   49:              blk.5.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   50:              blk.5.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   51:         blk.5.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   52:            blk.5.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   53:            blk.5.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   54:              blk.5.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   55:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   56:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   57:              blk.6.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   58:              blk.6.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   59:              blk.6.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   60:         blk.6.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   61:            blk.6.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   62:            blk.6.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   63:              blk.6.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   64:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   65:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   66:              blk.7.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   67:              blk.7.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   68:              blk.7.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   69:         blk.7.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   70:            blk.7.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   71:            blk.7.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   72:              blk.7.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   73:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   74:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   75:              blk.8.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   76:              blk.8.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   77:              blk.8.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   78:         blk.8.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   79:            blk.8.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   80:            blk.8.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   81:              blk.8.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   82:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   83:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   84:              blk.9.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   85:              blk.9.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   86:              blk.9.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   87:         blk.9.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   88:            blk.9.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   89:            blk.9.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   90:              blk.9.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   91:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   92:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   93:             blk.10.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   94:             blk.10.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   95:             blk.10.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   96:        blk.10.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   97:           blk.10.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   98:           blk.10.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   99:             blk.10.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  100:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  101:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  102:             blk.11.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  103:             blk.11.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  104:             blk.11.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  105:        blk.11.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  106:           blk.11.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  107:           blk.11.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  108:             blk.11.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  109:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  110:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  111:             blk.12.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  112:             blk.12.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  113:             blk.12.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  114:        blk.12.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  115:           blk.12.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  116:           blk.12.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  117:             blk.12.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  118:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  119:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  120:             blk.13.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  121:             blk.13.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  122:             blk.13.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  123:        blk.13.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  124:           blk.13.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  125:           blk.13.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  126:             blk.13.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  127:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  128:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  129:             blk.14.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  130:             blk.14.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  131:             blk.14.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  132:        blk.14.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  133:           blk.14.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  134:           blk.14.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  135:             blk.14.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  136:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  137:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  138:             blk.15.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  139:             blk.15.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  140:             blk.15.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  141:        blk.15.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  142:           blk.15.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  143:           blk.15.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  144:             blk.15.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  145:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  146:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  147:             blk.16.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  148:             blk.16.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  149:             blk.16.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  150:        blk.16.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  151:           blk.16.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  152:           blk.16.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  153:             blk.16.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  154:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  155:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  156:             blk.17.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  157:             blk.17.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  158:             blk.17.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  159:        blk.17.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  160:           blk.17.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  161:           blk.17.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  162:             blk.17.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  163:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  164:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  165:             blk.18.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  166:             blk.18.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  167:             blk.18.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  168:        blk.18.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  169:           blk.18.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  170:           blk.18.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  171:             blk.18.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  172:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  173:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  174:             blk.19.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  175:             blk.19.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  176:             blk.19.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  177:        blk.19.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  178:           blk.19.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  179:           blk.19.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  180:             blk.19.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  181:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  182:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  183:             blk.20.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  184:             blk.20.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  185:             blk.20.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  186:        blk.20.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  187:           blk.20.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  188:           blk.20.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  189:             blk.20.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  190:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  191:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  192:             blk.21.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  193:             blk.21.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  194:             blk.21.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  195:        blk.21.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  196:           blk.21.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  197:           blk.21.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  198:             blk.21.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  199:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  200:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  201:             blk.22.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  202:             blk.22.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  203:             blk.22.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  204:        blk.22.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  205:           blk.22.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  206:           blk.22.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  207:             blk.22.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  208:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  209:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  210:             blk.23.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  211:             blk.23.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  212:             blk.23.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  213:        blk.23.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  214:           blk.23.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  215:           blk.23.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  216:             blk.23.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  217:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  218:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  219:             blk.24.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  220:             blk.24.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  221:             blk.24.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  222:        blk.24.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  223:           blk.24.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  224:           blk.24.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  225:             blk.24.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  226:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  227:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  228:             blk.25.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  229:             blk.25.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  230:             blk.25.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  231:        blk.25.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  232:           blk.25.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  233:           blk.25.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  234:             blk.25.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  235:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  236:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  237:             blk.26.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  238:             blk.26.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  239:             blk.26.attn_v.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  240:        blk.26.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  241:           blk.26.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  242:           blk.26.ffn_down.weight q5_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  243:             blk.26.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  244:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  245:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  246:             blk.27.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  247:             blk.27.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  248:             blk.27.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  249:        blk.27.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  250:           blk.27.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  251:           blk.27.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  252:             blk.27.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  253:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  254:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  255:             blk.28.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  256:             blk.28.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  257:             blk.28.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  258:        blk.28.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  259:           blk.28.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  260:           blk.28.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  261:             blk.28.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  262:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  263:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  264:             blk.29.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  265:             blk.29.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  266:             blk.29.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  267:        blk.29.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  268:           blk.29.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  269:           blk.29.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  270:             blk.29.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  271:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  272:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  273:             blk.30.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  274:             blk.30.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  275:             blk.30.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  276:        blk.30.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  277:           blk.30.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  278:           blk.30.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  279:             blk.30.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  280:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  281:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  282:             blk.31.attn_q.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  283:             blk.31.attn_k.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  284:             blk.31.attn_v.weight q6_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  285:        blk.31.attn_output.weight q5_K     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  286:           blk.31.ffn_gate.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  287:           blk.31.ffn_down.weight q6_K     [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  288:             blk.31.ffn_up.weight q5_K     [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  289:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  290:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - kv   0:                       general.architecture str     
llama_model_loader: - kv   1:                               general.name str     
llama_model_loader: - kv   2:                       llama.context_length u32     
llama_model_loader: - kv   3:                     llama.embedding_length u32     
llama_model_loader: - kv   4:                          llama.block_count u32     
llama_model_loader: - kv   5:                  llama.feed_forward_length u32     
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32     
llama_model_loader: - kv   7:                 llama.attention.head_count u32     
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32     
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32     
llama_model_loader: - kv  10:                          general.file_type u32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  15:               general.quantization_version u32     
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q5_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_print_meta: format           = GGUF V2 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q5_K - Medium
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 4.45 GiB (5.68 BPW) 
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 4560.96 MB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/35 layers to GPU
llm_load_tensors: VRAM used: 0.00 MB
..................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  256.00 MB
llama_new_context_with_model: compute buffer total size = 76.63 MB
llama_new_context_with_model: VRAM scratch buffer: 70.50 MB
llama_new_context_with_model: total VRAM used: 70.50 MB (model: 0.00 MB, context: 70.50 MB)
main: seed: 1699909515
main: model base = 'llama-2-7b-q5_k_m.gguf'
main: init model
print_params: n_vocab:   32000
print_params: n_ctx:     128
print_params: n_embd:    4096
print_params: n_ff:      11008
print_params: n_head:    32
print_params: n_head_kv: 32
print_params: n_layer:   32
print_params: norm_rms_eps          : 0.000010
print_params: rope_freq_base        : 10000.000000
print_params: rope_freq_scale       : 1.000000
print_lora_params: n_rank_attention_norm : 1
print_lora_params: n_rank_wq             : 4
print_lora_params: n_rank_wk             : 4
print_lora_params: n_rank_wv             : 4
print_lora_params: n_rank_wo             : 4
print_lora_params: n_rank_ffn_norm       : 1
print_lora_params: n_rank_w1             : 4
print_lora_params: n_rank_w2             : 4
print_lora_params: n_rank_w3             : 4
print_lora_params: n_rank_tok_embeddings : 4
print_lora_params: n_rank_norm           : 1
print_lora_params: n_rank_output         : 4
main: total train_iterations 0
main: seen train_samples     0
main: seen train_tokens      0
main: completed train_epochs 0
main: lora_size = 84807904 bytes (80.9 MB)
main: opt_size  = 126592864 bytes (120.7 MB)
main: opt iter 0
main: input_size = 131076128 bytes (125.0 MB)
main: compute_size = 14064566880 bytes (13413.0 MB)
main: evaluation order = RIGHT_TO_LEFT
main: tokenize training data
tokenize_file: warning: found 2 samples (max length 197) that exceed context length of 128. samples will be cut off.
tokenize_file: warning: found 176 samples (min length 22) that are shorter than context length of 128.
tokenize_file: total number of samples: 178
main: number of training tokens: 7605
main: number of unique tokens: 868
main: train data seems to have changed. restarting shuffled epoch.
main: begin training
main: work_size = 1024512 bytes (1.0 MB)
train_opt_callback: iter=     0 sample=1/178 sched=0.000000 loss=0.000000 |->
train_opt_callback: iter=     1 sample=9/178 sched=0.010000 loss=7.328270 dt=00:22:48 eta=4d 00:57:37 |->
train_opt_callback: iter=     2 sample=17/178 sched=0.020000 loss=8.033629 dt=00:22:06 eta=3d 21:33:24 |>
train_opt_callback: iter=     3 sample=25/178 sched=0.030000 loss=7.241049 dt=00:22:01 eta=3d 20:50:38 |-->
train_opt_callback: iter=     4 sample=33/178 sched=0.040000 loss=7.332203 dt=00:22:01 eta=3d 20:31:55 |->
train_opt_callback: iter=     5 sample=41/178 sched=0.050000 loss=6.324121 dt=00:22:06 eta=3d 20:28:16 |----------->
train_opt_callback: iter=     6 sample=49/178 sched=0.060000 loss=6.477211 dt=00:21:48 eta=3d 18:50:01 |---------->
train_opt_callback: iter=     7 sample=57/178 sched=0.070000 loss=6.291273 dt=00:21:33 eta=3d 17:28:16 |----------->
train_opt_callback: iter=     8 sample=65/178 sched=0.080000 loss=5.716275 dt=00:21:31 eta=3d 16:58:39 |----------------->
train_opt_callback: iter=     9 sample=73/178 sched=0.090000 loss=5.227144 dt=00:20:14 eta=3d 11:18:46 |---------------------->
save_checkpoint_lora_file: saving to checkpoint-10.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    10 sample=81/178 sched=0.100000 loss=3.905872 dt=00:20:21 eta=3d 11:29:58 |----------------------------------->
train_opt_callback: iter=    11 sample=89/178 sched=0.110000 loss=4.364841 dt=00:20:18 eta=3d 10:54:52 |------------------------------->
train_opt_callback: iter=    12 sample=97/178 sched=0.120000 loss=3.937706 dt=00:19:51 eta=3d 08:45:23 |----------------------------------->
train_opt_callback: iter=    13 sample=105/178 sched=0.130000 loss=3.677362 dt=00:19:57 eta=3d 08:51:14 |-------------------------------------->
train_opt_callback: iter=    14 sample=113/178 sched=0.140000 loss=3.055107 dt=00:20:02 eta=3d 08:48:46 |-------------------------------------------->
train_opt_callback: iter=    15 sample=121/178 sched=0.150000 loss=2.682614 dt=00:19:53 eta=3d 07:51:55 |----------------------------------------------->
train_opt_callback: iter=    16 sample=129/178 sched=0.160000 loss=2.312799 dt=00:19:23 eta=3d 05:34:55 |--------------------------------------------------->
train_opt_callback: iter=    17 sample=137/178 sched=0.170000 loss=2.060043 dt=00:19:56 eta=3d 07:25:27 |------------------------------------------------------>
train_opt_callback: iter=    18 sample=145/178 sched=0.180000 loss=1.396019 dt=00:19:21 eta=3d 04:47:53 |------------------------------------------------------------>
train_opt_callback: iter=    19 sample=153/178 sched=0.190000 loss=1.361716 dt=00:20:04 eta=3d 07:16:02 |------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-20.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    20 sample=161/178 sched=0.200000 loss=1.122364 dt=00:20:00 eta=3d 06:42:08 |--------------------------------------------------------------->
train_opt_callback: iter=    21 sample=169/178 sched=0.210000 loss=0.910218 dt=00:19:20 eta=3d 03:46:44 |----------------------------------------------------------------->
train_opt_callback: iter=    22 sample=177/178 sched=0.220000 loss=0.621250 dt=00:19:16 eta=3d 03:11:10 |-------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 1
train_opt_callback: iter=    23 sample=1/178 sched=0.230000 loss=0.700246 dt=00:19:57 eta=3d 05:28:50 |------------------------------------------------------------------->
train_opt_callback: iter=    24 sample=9/178 sched=0.240000 loss=0.660297 dt=00:20:08 eta=3d 05:53:15 |-------------------------------------------------------------------->
train_opt_callback: iter=    25 sample=17/178 sched=0.250000 loss=0.512518 dt=00:19:38 eta=3d 03:38:30 |--------------------------------------------------------------------->
train_opt_callback: iter=    26 sample=25/178 sched=0.260000 loss=0.517694 dt=00:18:59 eta=3d 00:49:36 |--------------------------------------------------------------------->
train_opt_callback: iter=    27 sample=33/178 sched=0.270000 loss=0.490942 dt=00:19:42 eta=3d 03:14:53 |--------------------------------------------------------------------->
train_opt_callback: iter=    28 sample=41/178 sched=0.280000 loss=0.361570 dt=00:19:29 eta=3d 02:04:09 |----------------------------------------------------------------------->
train_opt_callback: iter=    29 sample=49/178 sched=0.290000 loss=0.203198 dt=00:19:54 eta=3d 03:18:15 |------------------------------------------------------------------------>
save_checkpoint_lora_file: saving to checkpoint-30.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    30 sample=57/178 sched=0.300000 loss=0.292865 dt=00:19:01 eta=2d 23:38:19 |----------------------------------------------------------------------->
train_opt_callback: iter=    31 sample=65/178 sched=0.310000 loss=0.222613 dt=00:19:52 eta=3d 02:32:32 |------------------------------------------------------------------------>
train_opt_callback: iter=    32 sample=73/178 sched=0.320000 loss=0.316802 dt=00:19:59 eta=3d 02:38:17 |----------------------------------------------------------------------->
train_opt_callback: iter=    33 sample=81/178 sched=0.330000 loss=0.215638 dt=00:20:12 eta=3d 03:05:38 |------------------------------------------------------------------------>
train_opt_callback: iter=    34 sample=89/178 sched=0.340000 loss=0.362912 dt=00:19:53 eta=3d 01:36:27 |----------------------------------------------------------------------->
train_opt_callback: iter=    35 sample=97/178 sched=0.350000 loss=0.246244 dt=00:19:59 eta=3d 01:37:07 |------------------------------------------------------------------------>
train_opt_callback: iter=    36 sample=105/178 sched=0.360000 loss=0.497322 dt=00:19:35 eta=2d 23:50:30 |--------------------------------------------------------------------->
train_opt_callback: iter=    37 sample=113/178 sched=0.370000 loss=0.196620 dt=00:20:08 eta=3d 01:29:45 |------------------------------------------------------------------------>
train_opt_callback: iter=    38 sample=121/178 sched=0.380000 loss=0.306589 dt=00:20:24 eta=3d 02:07:39 |----------------------------------------------------------------------->
train_opt_callback: iter=    39 sample=129/178 sched=0.390000 loss=0.328856 dt=00:20:04 eta=3d 00:36:32 |----------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-40.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    40 sample=137/178 sched=0.400000 loss=0.170141 dt=00:20:12 eta=3d 00:44:45 |------------------------------------------------------------------------->
train_opt_callback: iter=    41 sample=145/178 sched=0.410000 loss=0.230015 dt=00:19:48 eta=2d 22:57:37 |------------------------------------------------------------------------>
train_opt_callback: iter=    42 sample=153/178 sched=0.420000 loss=0.577738 dt=00:19:51 eta=2d 22:49:02 |--------------------------------------------------------------------->
train_opt_callback: iter=    43 sample=161/178 sched=0.430000 loss=0.129285 dt=00:19:13 eta=2d 20:15:05 |------------------------------------------------------------------------->
train_opt_callback: iter=    44 sample=169/178 sched=0.440000 loss=0.355127 dt=00:19:19 eta=2d 20:17:36 |----------------------------------------------------------------------->
train_opt_callback: iter=    45 sample=177/178 sched=0.450000 loss=0.384705 dt=00:19:16 eta=2d 19:47:27 |---------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 2
train_opt_callback: iter=    46 sample=1/178 sched=0.460000 loss=0.239497 dt=00:19:13 eta=2d 19:18:54 |------------------------------------------------------------------------>
train_opt_callback: iter=    47 sample=9/178 sched=0.470000 loss=0.228520 dt=00:18:52 eta=2d 17:45:31 |------------------------------------------------------------------------>
train_opt_callback: iter=    48 sample=17/178 sched=0.480000 loss=0.113592 dt=00:19:51 eta=2d 20:51:04 |------------------------------------------------------------------------->
train_opt_callback: iter=    49 sample=25/178 sched=0.490000 loss=0.388964 dt=00:19:32 eta=2d 19:24:00 |---------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-50.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    50 sample=33/178 sched=0.500000 loss=0.242294 dt=00:19:36 eta=2d 19:19:47 |------------------------------------------------------------------------>
train_opt_callback: iter=    51 sample=41/178 sched=0.510000 loss=0.104918 dt=00:19:19 eta=2d 18:02:58 |------------------------------------------------------------------------->
train_opt_callback: iter=    52 sample=49/178 sched=0.520000 loss=0.132959 dt=00:19:45 eta=2d 19:11:15 |------------------------------------------------------------------------->
train_opt_callback: iter=    53 sample=57/178 sched=0.530000 loss=0.270326 dt=00:19:59 eta=2d 19:36:51 |------------------------------------------------------------------------>
train_opt_callback: iter=    54 sample=65/178 sched=0.540000 loss=0.214551 dt=00:19:51 eta=2d 18:51:20 |------------------------------------------------------------------------>
train_opt_callback: iter=    55 sample=73/178 sched=0.550000 loss=0.227192 dt=00:19:42 eta=2d 18:02:10 |------------------------------------------------------------------------>
train_opt_callback: iter=    56 sample=81/178 sched=0.560000 loss=0.193324 dt=00:19:34 eta=2d 17:15:30 |------------------------------------------------------------------------>
train_opt_callback: iter=    57 sample=89/178 sched=0.570000 loss=0.092658 dt=00:19:42 eta=2d 17:21:29 |------------------------------------------------------------------------->
train_opt_callback: iter=    58 sample=97/178 sched=0.580000 loss=0.288017 dt=00:19:52 eta=2d 17:34:28 |----------------------------------------------------------------------->
train_opt_callback: iter=    59 sample=105/178 sched=0.590000 loss=0.303981 dt=00:19:22 eta=2d 15:36:34 |----------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-60.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    60 sample=113/178 sched=0.600000 loss=0.135095 dt=00:19:02 eta=2d 14:13:25 |------------------------------------------------------------------------->
train_opt_callback: iter=    61 sample=121/178 sched=0.610000 loss=0.103868 dt=00:19:55 eta=2d 16:44:49 |------------------------------------------------------------------------->
train_opt_callback: iter=    62 sample=129/178 sched=0.620000 loss=0.101314 dt=00:19:34 eta=2d 15:18:35 |------------------------------------------------------------------------->
train_opt_callback: iter=    63 sample=137/178 sched=0.630000 loss=0.206111 dt=00:19:36 eta=2d 15:05:07 |------------------------------------------------------------------------>
train_opt_callback: iter=    64 sample=145/178 sched=0.640000 loss=0.238083 dt=00:19:10 eta=2d 13:22:03 |------------------------------------------------------------------------>
train_opt_callback: iter=    65 sample=153/178 sched=0.650000 loss=0.238338 dt=00:19:21 eta=2d 13:38:19 |------------------------------------------------------------------------>
train_opt_callback: iter=    66 sample=161/178 sched=0.660000 loss=0.057109 dt=00:19:45 eta=2d 14:32:54 |-------------------------------------------------------------------------->
train_opt_callback: iter=    67 sample=169/178 sched=0.670000 loss=0.222538 dt=00:19:59 eta=2d 14:57:58 |------------------------------------------------------------------------>
train_opt_callback: iter=    68 sample=177/178 sched=0.680000 loss=0.243102 dt=00:19:50 eta=2d 14:09:54 |------------------------------------------------------------------------>
train_opt_callback: reshuffle samples. completed epochs: 3
train_opt_callback: iter=    69 sample=1/178 sched=0.690000 loss=0.221211 dt=00:19:18 eta=2d 12:12:07 |------------------------------------------------------------------------>
save_checkpoint_lora_file: saving to checkpoint-70.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    70 sample=9/178 sched=0.700000 loss=0.131074 dt=00:20:02 eta=2d 14:07:11 |------------------------------------------------------------------------->
train_opt_callback: iter=    71 sample=17/178 sched=0.710000 loss=0.196565 dt=00:20:10 eta=2d 14:11:15 |------------------------------------------------------------------------>
train_opt_callback: iter=    72 sample=25/178 sched=0.720000 loss=0.136248 dt=00:20:44 eta=2d 15:35:57 |------------------------------------------------------------------------->
train_opt_callback: iter=    73 sample=33/178 sched=0.730000 loss=0.075492 dt=00:19:47 eta=2d 12:21:42 |-------------------------------------------------------------------------->
train_opt_callback: iter=    74 sample=41/178 sched=0.740000 loss=0.062405 dt=00:20:34 eta=2d 14:24:44 |-------------------------------------------------------------------------->
train_opt_callback: iter=    75 sample=49/178 sched=0.750000 loss=0.100914 dt=00:20:27 eta=2d 13:43:23 |------------------------------------------------------------------------->
train_opt_callback: iter=    76 sample=57/178 sched=0.760000 loss=0.179218 dt=00:20:04 eta=2d 12:14:05 |------------------------------------------------------------------------>
train_opt_callback: iter=    77 sample=65/178 sched=0.770000 loss=0.087842 dt=00:20:00 eta=2d 11:40:49 |------------------------------------------------------------------------->
train_opt_callback: iter=    78 sample=73/178 sched=0.780000 loss=0.068578 dt=00:20:27 eta=2d 12:41:37 |-------------------------------------------------------------------------->
train_opt_callback: iter=    79 sample=81/178 sched=0.790000 loss=0.261990 dt=00:20:28 eta=2d 12:23:13 |------------------------------------------------------------------------>
save_checkpoint_lora_file: saving to checkpoint-80.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    80 sample=89/178 sched=0.800000 loss=0.069881 dt=00:20:09 eta=2d 11:06:42 |-------------------------------------------------------------------------->
train_opt_callback: iter=    81 sample=97/178 sched=0.810000 loss=0.109635 dt=00:19:46 eta=2d 09:41:27 |------------------------------------------------------------------------->
train_opt_callback: iter=    82 sample=105/178 sched=0.820000 loss=0.272391 dt=00:19:51 eta=2d 09:34:54 |------------------------------------------------------------------------>
train_opt_callback: iter=    83 sample=113/178 sched=0.830000 loss=0.148119 dt=00:20:06 eta=2d 09:58:26 |------------------------------------------------------------------------->
train_opt_callback: iter=    84 sample=121/178 sched=0.840000 loss=0.086822 dt=00:20:35 eta=2d 11:03:01 |------------------------------------------------------------------------->
train_opt_callback: iter=    85 sample=129/178 sched=0.850000 loss=0.163737 dt=00:20:25 eta=2d 10:13:42 |------------------------------------------------------------------------->
train_opt_callback: iter=    86 sample=137/178 sched=0.860000 loss=0.120277 dt=00:19:35 eta=2d 07:31:11 |------------------------------------------------------------------------->
train_opt_callback: iter=    87 sample=145/178 sched=0.870000 loss=0.109836 dt=00:19:36 eta=2d 07:14:41 |------------------------------------------------------------------------->
train_opt_callback: iter=    88 sample=153/178 sched=0.880000 loss=0.084462 dt=00:24:40 eta=2d 21:06:45 |------------------------------------------------------------------------->
train_opt_callback: iter=    89 sample=161/178 sched=0.890000 loss=0.116081 dt=00:33:20 eta=3d 20:48:09 |------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-90.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=    90 sample=169/178 sched=0.900000 loss=0.148298 dt=00:35:32 eta=4d 02:18:40 |------------------------------------------------------------------------->
train_opt_callback: iter=    91 sample=177/178 sched=0.910000 loss=0.150541 dt=00:36:14 eta=4d 03:40:13 |------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 4
train_opt_callback: iter=    92 sample=1/178 sched=0.920000 loss=0.141938 dt=00:36:22 eta=4d 03:24:30 |------------------------------------------------------------------------->
train_opt_callback: iter=    93 sample=9/178 sched=0.930000 loss=0.110087 dt=00:36:44 eta=4d 03:50:07 |------------------------------------------------------------------------->
train_opt_callback: iter=    94 sample=17/178 sched=0.940000 loss=0.096049 dt=00:36:50 eta=4d 03:27:39 |------------------------------------------------------------------------->
train_opt_callback: iter=    95 sample=25/178 sched=0.950000 loss=0.062458 dt=00:36:49 eta=4d 02:47:38 |-------------------------------------------------------------------------->
train_opt_callback: iter=    96 sample=33/178 sched=0.960000 loss=0.050658 dt=00:36:30 eta=4d 01:22:08 |-------------------------------------------------------------------------->
train_opt_callback: iter=    97 sample=41/178 sched=0.970000 loss=0.131350 dt=00:36:30 eta=4d 00:44:21 |------------------------------------------------------------------------->
train_opt_callback: iter=    98 sample=49/178 sched=0.980000 loss=0.068363 dt=00:36:25 eta=3d 23:55:10 |-------------------------------------------------------------------------->
train_opt_callback: iter=    99 sample=57/178 sched=0.990000 loss=0.066281 dt=00:36:32 eta=3d 23:36:49 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-100.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   100 sample=65/178 sched=0.977975 loss=0.104312 dt=00:36:39 eta=3d 23:19:47 |------------------------------------------------------------------------->
train_opt_callback: iter=   101 sample=73/178 sched=0.977536 loss=0.067689 dt=00:36:36 eta=3d 22:35:05 |-------------------------------------------------------------------------->
train_opt_callback: iter=   102 sample=81/178 sched=0.977093 loss=0.048579 dt=00:36:25 eta=3d 21:28:57 |-------------------------------------------------------------------------->
train_opt_callback: iter=   103 sample=89/178 sched=0.976646 loss=0.077060 dt=00:36:20 eta=3d 20:39:11 |-------------------------------------------------------------------------->
train_opt_callback: iter=   104 sample=97/178 sched=0.976194 loss=0.070578 dt=00:36:27 eta=3d 20:20:47 |-------------------------------------------------------------------------->
train_opt_callback: iter=   105 sample=105/178 sched=0.975738 loss=0.086573 dt=00:36:33 eta=3d 20:01:28 |------------------------------------------------------------------------->
train_opt_callback: iter=   106 sample=113/178 sched=0.975278 loss=0.094845 dt=00:36:18 eta=3d 18:46:37 |------------------------------------------------------------------------->
train_opt_callback: iter=   107 sample=121/178 sched=0.974814 loss=0.104385 dt=00:36:35 eta=3d 18:52:51 |------------------------------------------------------------------------->
train_opt_callback: iter=   108 sample=129/178 sched=0.974346 loss=0.079805 dt=00:36:33 eta=3d 18:11:51 |------------------------------------------------------------------------->
train_opt_callback: iter=   109 sample=137/178 sched=0.973873 loss=0.115604 dt=00:36:27 eta=3d 17:18:30 |------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-110.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   110 sample=145/178 sched=0.973396 loss=0.106835 dt=00:36:14 eta=3d 16:12:18 |------------------------------------------------------------------------->
train_opt_callback: iter=   111 sample=153/178 sched=0.972915 loss=0.108404 dt=00:36:22 eta=3d 15:55:32 |------------------------------------------------------------------------->
train_opt_callback: iter=   112 sample=161/178 sched=0.972430 loss=0.150818 dt=00:36:43 eta=3d 16:08:57 |------------------------------------------------------------------------->
train_opt_callback: iter=   113 sample=169/178 sched=0.971941 loss=0.111072 dt=00:36:30 eta=3d 15:00:24 |------------------------------------------------------------------------->
train_opt_callback: iter=   114 sample=177/178 sched=0.971447 loss=0.176334 dt=00:36:31 eta=3d 14:27:00 |------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 5
train_opt_callback: iter=   115 sample=1/178 sched=0.970950 loss=0.071819 dt=00:36:31 eta=3d 13:49:09 |-------------------------------------------------------------------------->
train_opt_callback: iter=   116 sample=9/178 sched=0.970448 loss=0.089301 dt=00:36:39 eta=3d 13:32:39 |------------------------------------------------------------------------->
train_opt_callback: iter=   117 sample=17/178 sched=0.969942 loss=0.069022 dt=00:36:31 eta=3d 12:37:25 |-------------------------------------------------------------------------->
train_opt_callback: iter=   118 sample=25/178 sched=0.969432 loss=0.088029 dt=00:36:37 eta=3d 12:14:04 |------------------------------------------------------------------------->
train_opt_callback: iter=   119 sample=33/178 sched=0.968918 loss=0.103543 dt=00:36:30 eta=3d 11:21:17 |------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-120.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   120 sample=41/178 sched=0.968399 loss=0.080645 dt=00:36:33 eta=3d 10:51:58 |------------------------------------------------------------------------->
train_opt_callback: iter=   121 sample=49/178 sched=0.967877 loss=0.059648 dt=00:36:21 eta=3d 09:48:08 |-------------------------------------------------------------------------->
train_opt_callback: iter=   122 sample=57/178 sched=0.967350 loss=0.091482 dt=00:36:19 eta=3d 09:07:40 |------------------------------------------------------------------------->
train_opt_callback: iter=   123 sample=65/178 sched=0.966820 loss=0.057773 dt=00:36:28 eta=3d 08:51:08 |-------------------------------------------------------------------------->
train_opt_callback: iter=   124 sample=73/178 sched=0.966285 loss=0.091841 dt=00:36:34 eta=3d 08:26:52 |------------------------------------------------------------------------->
train_opt_callback: iter=   125 sample=81/178 sched=0.965746 loss=0.077759 dt=00:36:18 eta=3d 07:15:57 |-------------------------------------------------------------------------->
train_opt_callback: iter=   126 sample=89/178 sched=0.965203 loss=0.084306 dt=00:36:27 eta=3d 06:59:43 |------------------------------------------------------------------------->
train_opt_callback: iter=   127 sample=97/178 sched=0.964656 loss=0.062791 dt=00:36:34 eta=3d 06:37:24 |-------------------------------------------------------------------------->
train_opt_callback: iter=   128 sample=105/178 sched=0.964104 loss=0.065764 dt=00:36:47 eta=3d 06:30:19 |-------------------------------------------------------------------------->
train_opt_callback: iter=   129 sample=113/178 sched=0.963549 loss=0.063520 dt=00:37:09 eta=3d 06:38:29 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-130.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   130 sample=121/178 sched=0.962990 loss=0.091305 dt=00:37:05 eta=3d 05:53:43 |------------------------------------------------------------------------->
train_opt_callback: iter=   131 sample=129/178 sched=0.962426 loss=0.076932 dt=00:36:59 eta=3d 05:04:49 |-------------------------------------------------------------------------->
train_opt_callback: iter=   132 sample=137/178 sched=0.961859 loss=0.077943 dt=00:36:58 eta=3d 04:25:42 |-------------------------------------------------------------------------->
train_opt_callback: iter=   133 sample=145/178 sched=0.961287 loss=0.078317 dt=00:36:36 eta=3d 03:02:19 |------------------------------------------------------------------------->
train_opt_callback: iter=   134 sample=153/178 sched=0.960711 loss=0.099851 dt=00:36:49 eta=3d 02:51:55 |------------------------------------------------------------------------->
train_opt_callback: iter=   135 sample=161/178 sched=0.960131 loss=0.066786 dt=00:37:01 eta=3d 02:40:55 |-------------------------------------------------------------------------->
train_opt_callback: iter=   136 sample=169/178 sched=0.959548 loss=0.068781 dt=00:36:53 eta=3d 01:46:21 |-------------------------------------------------------------------------->
train_opt_callback: iter=   137 sample=177/178 sched=0.958960 loss=0.083668 dt=00:36:34 eta=3d 00:33:19 |------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 6
train_opt_callback: iter=   138 sample=1/178 sched=0.958368 loss=0.062936 dt=00:36:48 eta=3d 00:22:24 |-------------------------------------------------------------------------->
train_opt_callback: iter=   139 sample=9/178 sched=0.957772 loss=0.062042 dt=00:36:42 eta=2d 23:34:22 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-140.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   140 sample=17/178 sched=0.957172 loss=0.049663 dt=00:36:44 eta=2d 23:01:33 |-------------------------------------------------------------------------->
train_opt_callback: iter=   141 sample=25/178 sched=0.956568 loss=0.068019 dt=00:37:00 eta=2d 22:56:35 |-------------------------------------------------------------------------->
train_opt_callback: iter=   142 sample=33/178 sched=0.955960 loss=0.063782 dt=00:37:03 eta=2d 22:24:35 |-------------------------------------------------------------------------->
train_opt_callback: iter=   143 sample=41/178 sched=0.955348 loss=0.057332 dt=00:36:59 eta=2d 21:39:13 |-------------------------------------------------------------------------->
train_opt_callback: iter=   144 sample=49/178 sched=0.954732 loss=0.084136 dt=00:36:58 eta=2d 21:00:29 |------------------------------------------------------------------------->
train_opt_callback: iter=   145 sample=57/178 sched=0.954112 loss=0.056280 dt=00:36:56 eta=2d 20:19:54 |-------------------------------------------------------------------------->
train_opt_callback: iter=   146 sample=65/178 sched=0.953488 loss=0.072391 dt=00:36:46 eta=2d 19:25:17 |-------------------------------------------------------------------------->
train_opt_callback: iter=   147 sample=73/178 sched=0.952861 loss=0.091059 dt=00:36:51 eta=2d 18:57:15 |------------------------------------------------------------------------->
train_opt_callback: iter=   148 sample=81/178 sched=0.952229 loss=0.060017 dt=00:36:50 eta=2d 18:18:27 |-------------------------------------------------------------------------->
train_opt_callback: iter=   149 sample=89/178 sched=0.951593 loss=0.057328 dt=00:37:06 eta=2d 18:10:07 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-150.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   150 sample=97/178 sched=0.950953 loss=0.052103 dt=00:37:08 eta=2d 17:37:32 |-------------------------------------------------------------------------->
train_opt_callback: iter=   151 sample=105/178 sched=0.950309 loss=0.065442 dt=00:36:51 eta=2d 16:29:31 |-------------------------------------------------------------------------->
train_opt_callback: iter=   152 sample=113/178 sched=0.949661 loss=0.056746 dt=00:37:03 eta=2d 16:14:24 |-------------------------------------------------------------------------->
train_opt_callback: iter=   153 sample=121/178 sched=0.949010 loss=0.070758 dt=00:37:09 eta=2d 15:46:52 |-------------------------------------------------------------------------->
train_opt_callback: iter=   154 sample=129/178 sched=0.948354 loss=0.069517 dt=00:36:57 eta=2d 14:49:22 |-------------------------------------------------------------------------->
train_opt_callback: iter=   155 sample=137/178 sched=0.947695 loss=0.065151 dt=00:36:59 eta=2d 14:16:57 |-------------------------------------------------------------------------->
train_opt_callback: iter=   156 sample=145/178 sched=0.947031 loss=0.073535 dt=00:36:53 eta=2d 13:28:26 |-------------------------------------------------------------------------->
train_opt_callback: iter=   157 sample=153/178 sched=0.946364 loss=0.060402 dt=00:36:47 eta=2d 12:42:39 |-------------------------------------------------------------------------->
train_opt_callback: iter=   158 sample=161/178 sched=0.945692 loss=0.092005 dt=00:36:58 eta=2d 12:22:44 |------------------------------------------------------------------------->
train_opt_callback: iter=   159 sample=169/178 sched=0.945017 loss=0.065230 dt=00:36:52 eta=2d 11:36:24 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-160.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   160 sample=177/178 sched=0.944338 loss=0.081499 dt=00:36:32 eta=2d 10:27:42 |------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 7
train_opt_callback: iter=   161 sample=1/178 sched=0.943655 loss=0.074532 dt=00:36:23 eta=2d 09:36:26 |-------------------------------------------------------------------------->
train_opt_callback: iter=   162 sample=9/178 sched=0.942968 loss=0.047449 dt=00:36:36 eta=2d 09:20:45 |-------------------------------------------------------------------------->
train_opt_callback: iter=   163 sample=17/178 sched=0.942277 loss=0.075340 dt=00:36:40 eta=2d 08:50:47 |-------------------------------------------------------------------------->
train_opt_callback: iter=   164 sample=25/178 sched=0.941583 loss=0.058686 dt=00:36:31 eta=2d 08:00:13 |-------------------------------------------------------------------------->
train_opt_callback: iter=   165 sample=33/178 sched=0.940884 loss=0.078167 dt=00:36:19 eta=2d 07:05:45 |-------------------------------------------------------------------------->
train_opt_callback: iter=   166 sample=41/178 sched=0.940182 loss=0.049373 dt=00:36:17 eta=2d 06:26:38 |-------------------------------------------------------------------------->
train_opt_callback: iter=   167 sample=49/178 sched=0.939476 loss=0.071963 dt=00:36:13 eta=2d 05:44:20 |-------------------------------------------------------------------------->
train_opt_callback: iter=   168 sample=57/178 sched=0.938765 loss=0.061201 dt=00:36:17 eta=2d 05:13:00 |-------------------------------------------------------------------------->
train_opt_callback: iter=   169 sample=65/178 sched=0.938052 loss=0.054520 dt=00:36:32 eta=2d 04:59:19 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-170.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   170 sample=73/178 sched=0.937334 loss=0.055787 dt=00:36:42 eta=2d 04:36:15 |-------------------------------------------------------------------------->
train_opt_callback: iter=   171 sample=81/178 sched=0.936612 loss=0.058552 dt=00:36:31 eta=2d 03:44:10 |-------------------------------------------------------------------------->
train_opt_callback: iter=   172 sample=89/178 sched=0.935887 loss=0.066907 dt=00:36:37 eta=2d 03:16:36 |-------------------------------------------------------------------------->
train_opt_callback: iter=   173 sample=97/178 sched=0.935158 loss=0.051261 dt=00:36:19 eta=2d 02:15:00 |-------------------------------------------------------------------------->
train_opt_callback: iter=   174 sample=105/178 sched=0.934425 loss=0.053868 dt=00:36:38 eta=2d 02:04:51 |-------------------------------------------------------------------------->
train_opt_callback: iter=   175 sample=113/178 sched=0.933688 loss=0.056048 dt=00:36:30 eta=2d 01:16:42 |-------------------------------------------------------------------------->
train_opt_callback: iter=   176 sample=121/178 sched=0.932948 loss=0.045508 dt=00:36:29 eta=2d 00:39:40 |-------------------------------------------------------------------------->
train_opt_callback: iter=   177 sample=129/178 sched=0.932203 loss=0.068252 dt=00:36:19 eta=1d 23:50:09 |-------------------------------------------------------------------------->
train_opt_callback: iter=   178 sample=137/178 sched=0.931455 loss=0.048026 dt=00:36:19 eta=1d 23:12:56 |-------------------------------------------------------------------------->
train_opt_callback: iter=   179 sample=145/178 sched=0.930703 loss=0.047452 dt=00:36:29 eta=1d 22:49:28 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-180.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   180 sample=153/178 sched=0.929948 loss=0.079426 dt=00:36:15 eta=1d 21:55:25 |------------------------------------------------------------------------->
train_opt_callback: iter=   181 sample=161/178 sched=0.929188 loss=0.050863 dt=00:36:14 eta=1d 21:18:14 |-------------------------------------------------------------------------->
train_opt_callback: iter=   182 sample=169/178 sched=0.928425 loss=0.052199 dt=00:36:35 eta=1d 21:07:59 |-------------------------------------------------------------------------->
train_opt_callback: iter=   183 sample=177/178 sched=0.927658 loss=0.058410 dt=00:36:29 eta=1d 20:24:11 |-------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 8
train_opt_callback: iter=   184 sample=1/178 sched=0.926888 loss=0.045401 dt=00:36:22 eta=1d 19:39:23 |-------------------------------------------------------------------------->
train_opt_callback: iter=   185 sample=9/178 sched=0.926113 loss=0.052396 dt=00:36:31 eta=1d 19:13:49 |-------------------------------------------------------------------------->
train_opt_callback: iter=   186 sample=17/178 sched=0.925335 loss=0.067700 dt=00:36:31 eta=1d 18:37:00 |-------------------------------------------------------------------------->
train_opt_callback: iter=   187 sample=25/178 sched=0.924554 loss=0.046180 dt=00:36:16 eta=1d 17:42:56 |-------------------------------------------------------------------------->
train_opt_callback: iter=   188 sample=33/178 sched=0.923768 loss=0.050971 dt=00:36:36 eta=1d 17:29:09 |-------------------------------------------------------------------------->
train_opt_callback: iter=   189 sample=41/178 sched=0.922979 loss=0.049186 dt=00:36:34 eta=1d 16:50:17 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-190.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   190 sample=49/178 sched=0.922186 loss=0.062805 dt=00:36:44 eta=1d 16:24:30 |-------------------------------------------------------------------------->
train_opt_callback: iter=   191 sample=57/178 sched=0.921390 loss=0.047807 dt=00:36:43 eta=1d 15:46:58 |-------------------------------------------------------------------------->
train_opt_callback: iter=   192 sample=65/178 sched=0.920590 loss=0.049784 dt=00:37:41 eta=1d 16:12:06 |-------------------------------------------------------------------------->
train_opt_callback: iter=   193 sample=73/178 sched=0.919786 loss=0.055659 dt=00:37:01 eta=1d 14:52:57 |-------------------------------------------------------------------------->
train_opt_callback: iter=   194 sample=81/178 sched=0.918978 loss=0.051812 dt=00:36:56 eta=1d 14:10:26 |-------------------------------------------------------------------------->
train_opt_callback: iter=   195 sample=89/178 sched=0.918167 loss=0.051834 dt=00:37:03 eta=1d 13:40:34 |-------------------------------------------------------------------------->
train_opt_callback: iter=   196 sample=97/178 sched=0.917353 loss=0.063875 dt=00:37:02 eta=1d 13:02:30 |-------------------------------------------------------------------------->
train_opt_callback: iter=   197 sample=105/178 sched=0.916534 loss=0.044799 dt=00:36:46 eta=1d 12:09:21 |-------------------------------------------------------------------------->
train_opt_callback: iter=   198 sample=113/178 sched=0.915712 loss=0.052604 dt=00:36:42 eta=1d 11:29:15 |-------------------------------------------------------------------------->
train_opt_callback: iter=   199 sample=121/178 sched=0.914887 loss=0.052343 dt=00:36:41 eta=1d 10:51:43 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-200.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   200 sample=129/178 sched=0.914058 loss=0.053900 dt=00:36:28 eta=1d 10:02:57 |-------------------------------------------------------------------------->
train_opt_callback: iter=   201 sample=137/178 sched=0.913225 loss=0.047804 dt=00:36:32 eta=1d 09:29:47 |-------------------------------------------------------------------------->
train_opt_callback: iter=   202 sample=145/178 sched=0.912389 loss=0.045218 dt=00:37:16 eta=1d 09:33:15 |-------------------------------------------------------------------------->
train_opt_callback: iter=   203 sample=153/178 sched=0.911549 loss=0.056633 dt=00:37:15 eta=1d 08:54:29 |-------------------------------------------------------------------------->
train_opt_callback: iter=   204 sample=161/178 sched=0.910705 loss=0.087018 dt=00:37:17 eta=1d 08:18:48 |------------------------------------------------------------------------->
train_opt_callback: iter=   205 sample=169/178 sched=0.909858 loss=0.051251 dt=00:36:57 eta=1d 07:24:27 |-------------------------------------------------------------------------->
train_opt_callback: iter=   206 sample=177/178 sched=0.909007 loss=0.062427 dt=00:36:48 eta=1d 06:40:45 |-------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 9
train_opt_callback: iter=   207 sample=1/178 sched=0.908153 loss=0.054127 dt=00:36:54 eta=1d 06:08:18 |-------------------------------------------------------------------------->
train_opt_callback: iter=   208 sample=9/178 sched=0.907296 loss=0.051355 dt=00:36:40 eta=1d 05:20:41 |-------------------------------------------------------------------------->
train_opt_callback: iter=   209 sample=17/178 sched=0.906434 loss=0.045598 dt=00:36:52 eta=1d 04:53:05 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-210.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   210 sample=25/178 sched=0.905570 loss=0.045980 dt=00:36:39 eta=1d 04:06:27 |-------------------------------------------------------------------------->
train_opt_callback: iter=   211 sample=33/178 sched=0.904702 loss=0.050601 dt=00:36:36 eta=1d 03:27:37 |-------------------------------------------------------------------------->
train_opt_callback: iter=   212 sample=41/178 sched=0.903830 loss=0.050250 dt=00:36:59 eta=1d 03:07:28 |-------------------------------------------------------------------------->
train_opt_callback: iter=   213 sample=49/178 sched=0.902955 loss=0.043415 dt=00:36:52 eta=1d 02:25:32 |-------------------------------------------------------------------------->
train_opt_callback: iter=   214 sample=57/178 sched=0.902076 loss=0.059098 dt=00:37:10 eta=1d 02:01:34 |-------------------------------------------------------------------------->
train_opt_callback: iter=   215 sample=65/178 sched=0.901194 loss=0.047417 dt=00:36:53 eta=1d 01:12:38 |-------------------------------------------------------------------------->
train_opt_callback: iter=   216 sample=73/178 sched=0.900308 loss=0.047726 dt=00:36:57 eta=1d 00:38:27 |-------------------------------------------------------------------------->
train_opt_callback: iter=   217 sample=81/178 sched=0.899419 loss=0.064566 dt=00:36:56 eta=1d 00:00:34 |-------------------------------------------------------------------------->
train_opt_callback: iter=   218 sample=89/178 sched=0.898526 loss=0.046686 dt=00:36:22 eta=23:02:12 |-------------------------------------------------------------------------->
train_opt_callback: iter=   219 sample=97/178 sched=0.897630 loss=0.052115 dt=00:36:48 eta=22:42:04 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-220.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   220 sample=105/178 sched=0.896731 loss=0.052652 dt=00:36:57 eta=22:10:33 |-------------------------------------------------------------------------->
train_opt_callback: iter=   221 sample=113/178 sched=0.895828 loss=0.048956 dt=00:36:57 eta=21:33:23 |-------------------------------------------------------------------------->
train_opt_callback: iter=   222 sample=121/178 sched=0.894922 loss=0.047552 dt=00:36:44 eta=20:49:26 |-------------------------------------------------------------------------->
train_opt_callback: iter=   223 sample=129/178 sched=0.894012 loss=0.043168 dt=00:36:43 eta=20:12:03 |-------------------------------------------------------------------------->
train_opt_callback: iter=   224 sample=137/178 sched=0.893099 loss=0.050130 dt=00:36:45 eta=19:36:25 |-------------------------------------------------------------------------->
train_opt_callback: iter=   225 sample=145/178 sched=0.892183 loss=0.044088 dt=00:36:37 eta=18:55:10 |-------------------------------------------------------------------------->
train_opt_callback: iter=   226 sample=153/178 sched=0.891263 loss=0.058652 dt=00:36:50 eta=18:25:05 |-------------------------------------------------------------------------->
train_opt_callback: iter=   227 sample=161/178 sched=0.890340 loss=0.048266 dt=00:37:02 eta=17:54:22 |-------------------------------------------------------------------------->
train_opt_callback: iter=   228 sample=169/178 sched=0.889413 loss=0.049002 dt=00:36:38 eta=17:05:46 |-------------------------------------------------------------------------->
train_opt_callback: iter=   229 sample=177/178 sched=0.888483 loss=0.071915 dt=00:36:51 eta=16:35:14 |-------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 10
save_checkpoint_lora_file: saving to checkpoint-230.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   230 sample=1/178 sched=0.887550 loss=0.050140 dt=00:36:56 eta=16:00:20 |-------------------------------------------------------------------------->
train_opt_callback: iter=   231 sample=9/178 sched=0.886613 loss=0.041014 dt=00:36:44 eta=15:18:38 |-------------------------------------------------------------------------->
train_opt_callback: iter=   232 sample=17/178 sched=0.885674 loss=0.056183 dt=00:36:38 eta=14:39:14 |-------------------------------------------------------------------------->
train_opt_callback: iter=   233 sample=25/178 sched=0.884730 loss=0.047450 dt=00:36:33 eta=14:00:56 |-------------------------------------------------------------------------->
train_opt_callback: iter=   234 sample=33/178 sched=0.883784 loss=0.041083 dt=00:36:26 eta=13:21:44 |-------------------------------------------------------------------------->
train_opt_callback: iter=   235 sample=41/178 sched=0.882834 loss=0.051407 dt=00:36:35 eta=12:48:19 |-------------------------------------------------------------------------->
train_opt_callback: iter=   236 sample=49/178 sched=0.881881 loss=0.046471 dt=00:36:29 eta=12:09:58 |-------------------------------------------------------------------------->
train_opt_callback: iter=   237 sample=57/178 sched=0.880924 loss=0.051171 dt=00:36:35 eta=11:35:10 |-------------------------------------------------------------------------->
train_opt_callback: iter=   238 sample=65/178 sched=0.879965 loss=0.039778 dt=00:36:19 eta=10:53:58 |-------------------------------------------------------------------------->
train_opt_callback: iter=   239 sample=73/178 sched=0.879002 loss=0.048262 dt=00:36:51 eta=10:26:33 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-240.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   240 sample=81/178 sched=0.878036 loss=0.050442 dt=00:37:14 eta=09:55:45 |-------------------------------------------------------------------------->
train_opt_callback: iter=   241 sample=89/178 sched=0.877066 loss=0.051114 dt=00:37:23 eta=09:20:55 |-------------------------------------------------------------------------->
train_opt_callback: iter=   242 sample=97/178 sched=0.876094 loss=0.046777 dt=00:36:59 eta=08:37:59 |-------------------------------------------------------------------------->
train_opt_callback: iter=   243 sample=105/178 sched=0.875118 loss=0.063385 dt=00:37:02 eta=08:01:26 |-------------------------------------------------------------------------->
train_opt_callback: iter=   244 sample=113/178 sched=0.874139 loss=0.047352 dt=00:37:05 eta=07:25:05 |-------------------------------------------------------------------------->
train_opt_callback: iter=   245 sample=121/178 sched=0.873157 loss=0.053672 dt=00:37:24 eta=06:51:27 |-------------------------------------------------------------------------->
train_opt_callback: iter=   246 sample=129/178 sched=0.872171 loss=0.054228 dt=00:37:03 eta=06:10:35 |-------------------------------------------------------------------------->
train_opt_callback: iter=   247 sample=137/178 sched=0.871183 loss=0.044063 dt=00:36:52 eta=05:31:50 |-------------------------------------------------------------------------->
train_opt_callback: iter=   248 sample=145/178 sched=0.870191 loss=0.047287 dt=00:36:50 eta=04:54:40 |-------------------------------------------------------------------------->
train_opt_callback: iter=   249 sample=153/178 sched=0.869196 loss=0.049908 dt=00:36:47 eta=04:17:31 |-------------------------------------------------------------------------->
save_checkpoint_lora_file: saving to checkpoint-250.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin
train_opt_callback: iter=   250 sample=161/178 sched=0.868198 loss=0.057780 dt=00:36:36 eta=03:39:38 |-------------------------------------------------------------------------->
train_opt_callback: iter=   251 sample=169/178 sched=0.867197 loss=0.055477 dt=00:36:23 eta=03:01:55 |-------------------------------------------------------------------------->
train_opt_callback: iter=   252 sample=177/178 sched=0.866192 loss=0.047620 dt=00:36:53 eta=02:27:35 |-------------------------------------------------------------------------->
train_opt_callback: reshuffle samples. completed epochs: 11
train_opt_callback: iter=   253 sample=1/178 sched=0.865185 loss=0.044904 dt=00:36:49 eta=01:50:29 |-------------------------------------------------------------------------->
train_opt_callback: iter=   254 sample=9/178 sched=0.864174 loss=0.048317 dt=00:36:59 eta=01:13:59 |-------------------------------------------------------------------------->
train_opt_callback: iter=   255 sample=17/178 sched=0.863161 loss=0.046755 dt=00:36:48 eta=00:36:48 |-------------------------------------------------------------------------->
train_opt_callback: iter=   256 sample=25/178 sched=0.862144 loss=0.045131 dt=00:36:48 eta=0.0ms |-------------------------------------------------------------------------->
main: total training time: 5d 12:53:46
save_checkpoint_lora_file: saving to checkpoint-256.gguf
save_checkpoint_lora_file: saving to checkpoint-LATEST.gguf
save_as_llama_lora: saving to lora.bin
save_as_llama_lora: saving to lora.bin