Kendamarron/Width-Up-Scaled-llm-jp-3-2.3b-steps1500

Kendamarron committed on Dec 7, 2024

Commit a909317 · verified · 1 Parent(s): 5194685

Update README.md

Files changed (1)

README.md +50 -3

@@ -1,3 +1,50 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+```
+### model
+model_name_or_path: Kendamarron/Width-Up-Scaled-llm-jp-3-2.3b
+### method
+stage: pt
+do_train: true
+finetuning_type: full
+enable_liger_kernel: true
+flash_attn: fa2
+### dataset
+dataset: abeja_test
+cutoff_len: 4096
+packing: true
+overwrite_cache: true
+preprocessing_num_workers: 64
+### output
+output_dir: saves/llm-jp/full/cpt/
+logging_steps: 1
+save_steps: 500
+plot_loss: true
+overwrite_output_dir: true
+### train
+per_device_train_batch_size: 16
+gradient_accumulation_steps: 4
+learning_rate: 1.0e-4
+num_train_epochs: 1.0
+lr_scheduler_type: constant_with_warmup
+adam_beta1: 0.9
+adam_beta2: 0.95
+optim: adamw_bnb_8bit
+warmup_steps: 500
+bf16: true
+ddp_timeout: 180000000
+### eval
+val_size: 1000
+per_device_eval_batch_size: 2
+eval_strategy: steps
+eval_steps: 500
+### logging
+report_to: wandb
+```
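
Reviewer note: the batch-size and scheduler settings in this config can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It assumes that `packing: true` fills every sequence to `cutoff_len`, and that `constant_with_warmup` ramps the learning rate linearly from 0 to its peak over `warmup_steps` and then holds it, which matches the behaviour of the Transformers scheduler of that name. The device count is not stated in the README, so `num_devices` is a hypothetical parameter, as are the helper names.

```python
def tokens_per_optimizer_step(per_device_batch: int,
                              grad_accum: int,
                              seq_len: int,
                              num_devices: int) -> int:
    """Upper bound on tokens consumed per optimizer step.

    Assumes every packed sequence is filled to seq_len (cutoff_len).
    num_devices is hypothetical; the README does not state it.
    """
    return per_device_batch * grad_accum * seq_len * num_devices


def lr_at_step(step: int,
               peak_lr: float = 1.0e-4,
               warmup_steps: int = 500) -> float:
    """Learning rate under a linear-warmup-then-constant schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 at step 0 up to peak_lr at warmup_steps.
        return peak_lr * (step / max(1, warmup_steps))
    return peak_lr


if __name__ == "__main__":
    # Values taken from the config above; a single device is assumed.
    per_device = tokens_per_optimizer_step(16, 4, 4096, num_devices=1)
    print(f"tokens per optimizer step (1 device): {per_device}")
    for step in (0, 250, 500, 1500):
        print(f"step {step}: lr = {lr_at_step(step):.2e}")
```

With these values a single device processes at most 16 × 4 × 4096 = 262,144 tokens per optimizer step, and the learning rate reaches 1.0e-4 only at step 500. If the `steps1500` suffix in the repository name refers to the optimizer step count (an assumption), the first third of that run falls inside the warmup phase.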