Femboyuwu2000 committed on
Commit 4a2a749
1 Parent(s): 8acc802

Training in progress, step 40

adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7cee5316db6e08bda593292c7431684b3dc73870ce8a54a0c3a013496aab2e9e
+ oid sha256:c05b93acda6a7b2f81f0afd85f5badfd9cfa3a6e9e1606092480a0fbff648c88
  size 4725640
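
Note: only the LFS pointer's oid changes here; the payload size stays 4725640 bytes, i.e. fresh adapter weights of identical shape were pushed at step 40. A minimal loading sketch, assuming a peft-style LoRA adapter (the base model and hub repo names are taken from the config.yaml diff further down; this is not the author's script):

# Sketch, not the author's code: load the updated adapter on top of the base model.
# Repo/base names come from hub_model_id and _name_or_path in the config.yaml diff.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-1b1")
model = PeftModel.from_pretrained(base, "Femboyuwu2000/bloomz-1b1-vn-chat")
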
runs/Apr13_04-53-24_c5a47843c998/events.out.tfevents.1712984741.c5a47843c998.109.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d96cf0882700d1dbb5c8221808737e688d46416d72bbe5177a5bb46c68f44e9b
- size 5487
+ oid sha256:6fc8f09b4ae665bdd4a2a08f56faec5206d9c03028fc53aa009026fd8ff8d950
+ size 5694
wandb/debug-internal.log CHANGED
@@ -74,3 +74,15 @@ subprocess.TimeoutExpired: Command '['conda', 'env', 'export']' timed out after
  2024-04-13 05:07:13,338 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: summary_record
  2024-04-13 05:07:13,340 INFO SenderThread:162 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
  2024-04-13 05:07:13,509 INFO Thread-12 :162 [dir_watcher.py:_on_file_created():271] file/dir created: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/wandb-summary.json
+ 2024-04-13 05:07:15,995 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-13 05:07:16,510 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/output.log
+ 2024-04-13 05:07:21,001 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-13 05:07:21,512 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/config.yaml
+ 2024-04-13 05:07:21,646 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: stop_status
+ 2024-04-13 05:07:21,646 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-13 05:07:21,647 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: stop_status
+ 2024-04-13 05:07:22,111 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-13 05:07:22,112 DEBUG SenderThread:162 [sender.py:send():379] send: history
+ 2024-04-13 05:07:22,113 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-13 05:07:22,115 INFO SenderThread:162 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-13 05:07:22,512 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/wandb-summary.json
wandb/run-20240413_050649-ne3279ey/files/config.yaml CHANGED
@@ -26,7 +26,23 @@ _wandb:
  - 84
  - 98
  - 105
+ 2:
+ - 1
+ - 2
+ - 3
+ - 5
+ - 11
+ - 12
+ - 49
+ - 51
+ - 53
+ - 55
+ - 71
+ - 84
+ - 98
+ - 105
  3:
+ - 7
  - 23
  4: 3.10.13
  5: 0.16.5
@@ -35,4 +51,651 @@ _wandb:
  - 1
  - 2
  - 5
+ 9:
+ 1: transformers_trainer
  13: linux-x86_64
+ m:
+ - 1: train/global_step
+ 6:
+ - 3
+ - 1: train/loss
+ 5: 1
+ 6:
+ - 1
+ - 1: train/grad_norm
+ 5: 1
+ 6:
+ - 1
+ - 1: train/learning_rate
+ 5: 1
+ 6:
+ - 1
+ - 1: train/epoch
+ 5: 1
+ 6:
+ - 1
+ vocab_size:
+ desc: null
+ value: 250880
+ hidden_size:
+ desc: null
+ value: 1536
+ n_layer:
+ desc: null
+ value: 24
+ n_head:
+ desc: null
+ value: 16
+ layer_norm_epsilon:
+ desc: null
+ value: 1.0e-05
+ initializer_range:
+ desc: null
+ value: 0.02
+ use_cache:
+ desc: null
+ value: false
+ pretraining_tp:
+ desc: null
+ value: 1
+ apply_residual_connection_post_layernorm:
+ desc: null
+ value: false
+ hidden_dropout:
+ desc: null
+ value: 0.0
+ attention_dropout:
+ desc: null
+ value: 0.0
+ bos_token_id:
+ desc: null
+ value: 1
+ eos_token_id:
+ desc: null
+ value: 2
+ slow_but_exact:
+ desc: null
+ value: false
+ return_dict:
+ desc: null
+ value: true
+ output_hidden_states:
+ desc: null
+ value: false
+ output_attentions:
+ desc: null
+ value: false
+ torchscript:
+ desc: null
+ value: false
+ torch_dtype:
+ desc: null
+ value: float16
+ use_bfloat16:
+ desc: null
+ value: false
+ tf_legacy_loss:
+ desc: null
+ value: false
+ pruned_heads:
+ desc: null
+ value: {}
+ tie_word_embeddings:
+ desc: null
+ value: true
+ chunk_size_feed_forward:
+ desc: null
+ value: 0
+ is_encoder_decoder:
+ desc: null
+ value: false
+ is_decoder:
+ desc: null
+ value: false
+ cross_attention_hidden_size:
+ desc: null
+ value: null
+ add_cross_attention:
+ desc: null
+ value: false
+ tie_encoder_decoder:
+ desc: null
+ value: false
+ max_length:
+ desc: null
+ value: 20
+ min_length:
+ desc: null
+ value: 0
+ do_sample:
+ desc: null
+ value: false
+ early_stopping:
+ desc: null
+ value: false
+ num_beams:
+ desc: null
+ value: 1
+ num_beam_groups:
+ desc: null
+ value: 1
+ diversity_penalty:
+ desc: null
+ value: 0.0
+ temperature:
+ desc: null
+ value: 1.0
+ top_k:
+ desc: null
+ value: 50
+ top_p:
+ desc: null
+ value: 1.0
+ typical_p:
+ desc: null
+ value: 1.0
+ repetition_penalty:
+ desc: null
+ value: 1.0
+ length_penalty:
+ desc: null
+ value: 1.0
+ no_repeat_ngram_size:
+ desc: null
+ value: 0
+ encoder_no_repeat_ngram_size:
+ desc: null
+ value: 0
+ bad_words_ids:
+ desc: null
+ value: null
+ num_return_sequences:
+ desc: null
+ value: 1
+ output_scores:
+ desc: null
+ value: false
+ return_dict_in_generate:
+ desc: null
+ value: false
+ forced_bos_token_id:
+ desc: null
+ value: null
+ forced_eos_token_id:
+ desc: null
+ value: null
+ remove_invalid_values:
+ desc: null
+ value: false
+ exponential_decay_length_penalty:
+ desc: null
+ value: null
+ suppress_tokens:
+ desc: null
+ value: null
+ begin_suppress_tokens:
+ desc: null
+ value: null
+ architectures:
+ desc: null
+ value:
+ - BloomForCausalLM
+ finetuning_task:
+ desc: null
+ value: null
+ id2label:
+ desc: null
+ value:
+ '0': LABEL_0
+ '1': LABEL_1
+ label2id:
+ desc: null
+ value:
+ LABEL_0: 0
+ LABEL_1: 1
+ tokenizer_class:
+ desc: null
+ value: null
+ prefix:
+ desc: null
+ value: null
+ pad_token_id:
+ desc: null
+ value: 3
+ sep_token_id:
+ desc: null
+ value: null
+ decoder_start_token_id:
+ desc: null
+ value: null
+ task_specific_params:
+ desc: null
+ value: null
+ problem_type:
+ desc: null
+ value: null
+ _name_or_path:
+ desc: null
+ value: bigscience/bloomz-1b1
+ transformers_version:
+ desc: null
+ value: 4.39.3
+ attention_softmax_in_fp32:
+ desc: null
+ value: true
+ bias_dropout_fusion:
+ desc: null
+ value: true
+ unk_token_id:
+ desc: null
+ value: 0
+ masked_softmax_fusion:
+ desc: null
+ value: true
+ model_type:
+ desc: null
+ value: bloom
+ n_inner:
+ desc: null
+ value: null
+ offset_alibi:
+ desc: null
+ value: 100
+ seq_length:
+ desc: null
+ value: 2048
+ skip_bias_add:
+ desc: null
+ value: true
+ skip_bias_add_qkv:
+ desc: null
+ value: false
+ quantization_config:
+ desc: null
+ value:
+ quant_method: QuantizationMethod.BITS_AND_BYTES
+ _load_in_8bit: false
+ _load_in_4bit: true
+ llm_int8_threshold: 6.0
+ llm_int8_skip_modules: null
+ llm_int8_enable_fp32_cpu_offload: false
+ llm_int8_has_fp16_weight: false
+ bnb_4bit_quant_type: nf4
+ bnb_4bit_use_double_quant: true
+ bnb_4bit_compute_dtype: float16
+ bnb_4bit_quant_storage: uint8
+ load_in_4bit: true
+ load_in_8bit: false
+ output_dir:
+ desc: null
+ value: /kaggle/working/
+ overwrite_output_dir:
+ desc: null
+ value: false
+ do_train:
+ desc: null
+ value: false
+ do_eval:
+ desc: null
+ value: false
+ do_predict:
+ desc: null
+ value: false
+ evaluation_strategy:
+ desc: null
+ value: 'no'
+ prediction_loss_only:
+ desc: null
+ value: false
+ per_device_train_batch_size:
+ desc: null
+ value: 1
+ per_device_eval_batch_size:
+ desc: null
+ value: 8
+ per_gpu_train_batch_size:
+ desc: null
+ value: null
+ per_gpu_eval_batch_size:
+ desc: null
+ value: null
+ gradient_accumulation_steps:
+ desc: null
+ value: 1
+ eval_accumulation_steps:
+ desc: null
+ value: null
+ eval_delay:
+ desc: null
+ value: 0
+ learning_rate:
+ desc: null
+ value: 5.0e-05
+ weight_decay:
+ desc: null
+ value: 0.0001
+ adam_beta1:
+ desc: null
+ value: 0.9
+ adam_beta2:
+ desc: null
+ value: 0.999
+ adam_epsilon:
+ desc: null
+ value: 1.0e-08
+ max_grad_norm:
+ desc: null
+ value: 0.3
+ num_train_epochs:
+ desc: null
+ value: 5
+ max_steps:
+ desc: null
+ value: 20000
+ lr_scheduler_type:
+ desc: null
+ value: cosine
+ lr_scheduler_kwargs:
+ desc: null
+ value: {}
+ warmup_ratio:
+ desc: null
+ value: 0.03
+ warmup_steps:
+ desc: null
+ value: 0
+ log_level:
+ desc: null
+ value: passive
+ log_level_replica:
+ desc: null
+ value: warning
+ log_on_each_node:
+ desc: null
+ value: true
+ logging_dir:
+ desc: null
+ value: /kaggle/working/runs/Apr13_04-53-24_c5a47843c998
+ logging_strategy:
+ desc: null
+ value: steps
+ logging_first_step:
+ desc: null
+ value: false
+ logging_steps:
+ desc: null
+ value: 20
+ logging_nan_inf_filter:
+ desc: null
+ value: true
+ save_strategy:
+ desc: null
+ value: steps
+ save_steps:
+ desc: null
+ value: 20
+ save_total_limit:
+ desc: null
+ value: 1
+ save_safetensors:
+ desc: null
+ value: true
+ save_on_each_node:
+ desc: null
+ value: false
+ save_only_model:
+ desc: null
+ value: false
+ no_cuda:
+ desc: null
+ value: false
+ use_cpu:
+ desc: null
+ value: false
+ use_mps_device:
+ desc: null
+ value: false
+ seed:
+ desc: null
+ value: 42
+ data_seed:
+ desc: null
+ value: null
+ jit_mode_eval:
+ desc: null
+ value: false
+ use_ipex:
+ desc: null
+ value: false
+ bf16:
+ desc: null
+ value: false
+ fp16:
+ desc: null
+ value: true
+ fp16_opt_level:
+ desc: null
+ value: O1
+ half_precision_backend:
+ desc: null
+ value: auto
+ bf16_full_eval:
+ desc: null
+ value: false
+ fp16_full_eval:
+ desc: null
+ value: false
+ tf32:
+ desc: null
+ value: null
+ local_rank:
+ desc: null
+ value: 0
+ ddp_backend:
+ desc: null
+ value: null
+ tpu_num_cores:
+ desc: null
+ value: null
+ tpu_metrics_debug:
+ desc: null
+ value: false
+ debug:
+ desc: null
+ value: []
+ dataloader_drop_last:
+ desc: null
+ value: false
+ eval_steps:
+ desc: null
+ value: null
+ dataloader_num_workers:
+ desc: null
+ value: 2
+ dataloader_prefetch_factor:
+ desc: null
+ value: null
+ past_index:
+ desc: null
+ value: -1
+ run_name:
+ desc: null
+ value: /kaggle/working/
+ disable_tqdm:
+ desc: null
+ value: false
+ remove_unused_columns:
+ desc: null
+ value: true
+ label_names:
+ desc: null
+ value: null
+ load_best_model_at_end:
+ desc: null
+ value: false
+ metric_for_best_model:
+ desc: null
+ value: null
+ greater_is_better:
+ desc: null
+ value: null
+ ignore_data_skip:
+ desc: null
+ value: false
+ fsdp:
+ desc: null
+ value: []
+ fsdp_min_num_params:
+ desc: null
+ value: 0
+ fsdp_config:
+ desc: null
+ value:
+ min_num_params: 0
+ xla: false
+ xla_fsdp_v2: false
+ xla_fsdp_grad_ckpt: false
+ fsdp_transformer_layer_cls_to_wrap:
+ desc: null
+ value: null
+ accelerator_config:
+ desc: null
+ value:
+ split_batches: false
+ dispatch_batches: null
+ even_batches: true
+ use_seedable_sampler: true
+ deepspeed:
+ desc: null
+ value: null
+ label_smoothing_factor:
+ desc: null
+ value: 0.0
+ optim:
+ desc: null
+ value: paged_adamw_8bit
+ optim_args:
+ desc: null
+ value: null
+ adafactor:
+ desc: null
+ value: false
+ group_by_length:
+ desc: null
+ value: false
+ length_column_name:
+ desc: null
+ value: length
+ report_to:
+ desc: null
+ value:
+ - tensorboard
+ - wandb
+ ddp_find_unused_parameters:
+ desc: null
+ value: null
+ ddp_bucket_cap_mb:
+ desc: null
+ value: null
+ ddp_broadcast_buffers:
+ desc: null
+ value: null
+ dataloader_pin_memory:
+ desc: null
+ value: true
+ dataloader_persistent_workers:
+ desc: null
+ value: false
+ skip_memory_metrics:
+ desc: null
+ value: true
+ use_legacy_prediction_loop:
+ desc: null
+ value: false
+ push_to_hub:
+ desc: null
+ value: true
+ resume_from_checkpoint:
+ desc: null
+ value: null
+ hub_model_id:
+ desc: null
+ value: Femboyuwu2000/bloomz-1b1-vn-chat
+ hub_strategy:
+ desc: null
+ value: checkpoint
+ hub_token:
+ desc: null
+ value: <HUB_TOKEN>
+ hub_private_repo:
+ desc: null
+ value: false
+ hub_always_push:
+ desc: null
+ value: false
+ gradient_checkpointing:
+ desc: null
+ value: true
+ gradient_checkpointing_kwargs:
+ desc: null
+ value: null
+ include_inputs_for_metrics:
+ desc: null
+ value: false
+ fp16_backend:
+ desc: null
+ value: auto
+ push_to_hub_model_id:
+ desc: null
+ value: null
+ push_to_hub_organization:
+ desc: null
+ value: null
+ push_to_hub_token:
+ desc: null
+ value: <PUSH_TO_HUB_TOKEN>
+ mp_parameters:
+ desc: null
+ value: ''
+ auto_find_batch_size:
+ desc: null
+ value: false
+ full_determinism:
+ desc: null
+ value: false
+ torchdynamo:
+ desc: null
+ value: null
+ ray_scope:
+ desc: null
+ value: last
+ ddp_timeout:
+ desc: null
+ value: 1800
+ torch_compile:
+ desc: null
+ value: false
+ torch_compile_backend:
+ desc: null
+ value: null
+ torch_compile_mode:
+ desc: null
+ value: null
+ dispatch_batches:
+ desc: null
+ value: null
+ split_batches:
+ desc: null
+ value: null
+ include_tokens_per_second:
+ desc: null
+ value: false
+ include_num_input_tokens_seen:
+ desc: null
+ value: false
+ neftune_noise_alpha:
+ desc: null
+ value: null
+ optim_target_modules:
+ desc: null
+ value: null
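
Note: the quantization_config and trainer values captured above pin down the run's setup. A sketch reconstructing it from the logged config (not the author's actual script; the model, tokenizer, and dataset wiring is omitted):

# Reconstructed from the logged config.yaml above; a sketch, not the original code.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # _load_in_4bit: true
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type: nf4
    bnb_4bit_use_double_quant=True,        # bnb_4bit_use_double_quant: true
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype: float16
)

args = TrainingArguments(
    output_dir="/kaggle/working/",
    per_device_train_batch_size=1,
    learning_rate=5e-05,
    weight_decay=1e-04,
    max_grad_norm=0.3,
    max_steps=20_000,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=20,
    save_steps=20,
    save_total_limit=1,
    fp16=True,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    dataloader_num_workers=2,
    push_to_hub=True,
    hub_model_id="Femboyuwu2000/bloomz-1b1-vn-chat",
    hub_strategy="checkpoint",
    report_to=["tensorboard", "wandb"],
)
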
wandb/run-20240413_050649-ne3279ey/files/output.log CHANGED
@@ -1 +1,5 @@
+ /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
+ warnings.warn(
+ /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
+ warnings.warn(
  /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
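
Note: the repeated UserWarning comes from torch.utils.checkpoint because gradient checkpointing runs without an explicit use_reentrant choice (the config above logs gradient_checkpointing: true and gradient_checkpointing_kwargs: null). A sketch of the usual fix, assuming the transformers Trainer API:

# Sketch: silence the warning by choosing use_reentrant explicitly,
# as the warning message itself recommends.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="/kaggle/working/",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
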
wandb/run-20240413_050649-ne3279ey/files/wandb-summary.json CHANGED
@@ -1 +1 @@
- {"train/loss": 3.7393, "train/grad_norm": 4.044270038604736, "train/learning_rate": 1.6666666666666667e-06, "train/epoch": 0.0, "train/global_step": 20, "_timestamp": 1712984833.3348606, "_runtime": 23.452062606811523, "_step": 0}
+ {"train/loss": 3.6056, "train/grad_norm": 1.908144235610962, "train/learning_rate": 3.2500000000000002e-06, "train/epoch": 0.0, "train/global_step": 40, "_timestamp": 1712984842.1107213, "_runtime": 32.22792339324951, "_step": 1}
wandb/run-20240413_050649-ne3279ey/logs/debug-internal.log CHANGED
@@ -74,3 +74,15 @@ subprocess.TimeoutExpired: Command '['conda', 'env', 'export']' timed out after
  2024-04-13 05:07:13,338 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: summary_record
  2024-04-13 05:07:13,340 INFO SenderThread:162 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
  2024-04-13 05:07:13,509 INFO Thread-12 :162 [dir_watcher.py:_on_file_created():271] file/dir created: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/wandb-summary.json
+ 2024-04-13 05:07:15,995 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-13 05:07:16,510 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/output.log
+ 2024-04-13 05:07:21,001 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: status_report
+ 2024-04-13 05:07:21,512 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/config.yaml
+ 2024-04-13 05:07:21,646 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: stop_status
+ 2024-04-13 05:07:21,646 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: internal_messages
+ 2024-04-13 05:07:21,647 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: stop_status
+ 2024-04-13 05:07:22,111 DEBUG HandlerThread:162 [handler.py:handle_request():146] handle_request: partial_history
+ 2024-04-13 05:07:22,112 DEBUG SenderThread:162 [sender.py:send():379] send: history
+ 2024-04-13 05:07:22,113 DEBUG SenderThread:162 [sender.py:send_request():406] send_request: summary_record
+ 2024-04-13 05:07:22,115 INFO SenderThread:162 [sender.py:_save_file():1390] saving file wandb-summary.json with policy end
+ 2024-04-13 05:07:22,512 INFO Thread-12 :162 [dir_watcher.py:_on_file_modified():288] file/dir modified: /kaggle/working/wandb/run-20240413_050649-ne3279ey/files/wandb-summary.json