---
license: apache-2.0
library_name: peft
tags:
- axolotl
- generated_from_trainer
base_model: nlpai-lab/KULLM3
model-index:
- name: kullm3_finetuning_test_4300QA_10epochs
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.0`
```yaml
base_model: nlpai-lab/KULLM3
base_model_config: nlpai-lab/KULLM3
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
hub_model_id: kullm3_finetuning_test_4300QA_10epochs

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: superiort/multiplechoice-4300
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./kullm3_finetuning_test_4300QA_10epochs

adapter: qlora
lora_model_dir:

sequence_len: 4096
sample_packing: false

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:

special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
  pad_token: ""  # EOS and PAD are identical
```

</details><br>
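Since this repository holds a QLoRA adapter on top of `nlpai-lab/KULLM3`, the following is a minimal, untested loading sketch with 🤗 Transformers and PEFT that mirrors the 4-bit setup in the config above. The adapter identifier `adapter_repo_or_path` is a placeholder: point it at this repository's Hub id or at a local checkpoint directory produced by the training run.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# 4-bit quantized base model, matching load_in_4bit: true in the axolotl config.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "nlpai-lab/KULLM3",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("nlpai-lab/KULLM3")

# "adapter_repo_or_path" is a placeholder for this adapter's Hub id or local path.
model = PeftModel.from_pretrained(base, "adapter_repo_or_path")
model.eval()
```

If a standalone merged checkpoint is preferred, the base model can instead be loaded without quantization and the LoRA weights folded in with `model.merge_and_unload()`.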

# kullm3_finetuning_test_4300QA_10epochs

This model is a fine-tuned version of [nlpai-lab/KULLM3](https://huggingface.co/nlpai-lab/KULLM3) on the superiort/multiplechoice-4300 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4754

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 10

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.4883 | 0.01 | 1 | 0.3229 |
| 0.4139 | 0.11 | 14 | 0.2783 |
| 0.3475 | 0.21 | 28 | 0.2473 |
| 0.3427 | 0.32 | 42 | 0.2353 |
| 0.303 | 0.43 | 56 | 0.2297 |
| 0.2902 | 0.53 | 70 | 0.2334 |
| 0.288 | 0.64 | 84 | 0.2271 |
| 0.2856 | 0.74 | 98 | 0.2233 |
| 0.3035 | 0.85 | 112 | 0.2182 |
| 0.2829 | 0.96 | 126 | 0.2161 |
| 0.2986 | 1.06 | 140 | 0.2219 |
| 0.2552 | 1.17 | 154 | 0.2269 |
| 0.2489 | 1.28 | 168 | 0.2223 |
| 0.2523 | 1.38 | 182 | 0.2248 |
| 0.2481 | 1.49 | 196 | 0.2220 |
| 0.235 | 1.59 | 210 | 0.2209 |
| 0.2661 | 1.7 | 224 | 0.2165 |
| 0.2522 | 1.81 | 238 | 0.2231 |
| 0.2775 | 1.91 | 252 | 0.2190 |
| 0.1825 | 2.02 | 266 | 0.2228 |
| 0.1836 | 2.13 | 280 | 0.2331 |
| 0.1655 | 2.23 | 294 | 0.2378 |
| 0.1604 | 2.34 | 308 | 0.2376 |
| 0.1766 | 2.44 | 322 | 0.2356 |
| 0.1897 | 2.55 | 336 | 0.2344 |
| 0.1756 | 2.66 | 350 | 0.2375 |
| 0.1616 | 2.76 | 364 | 0.2387 |
| 0.1436 | 2.87 | 378 | 0.2371 |
| 0.166 | 2.98 | 392 | 0.2341 |
| 0.0828 | 3.08 | 406 | 0.2602 |
| 0.0893 | 3.19 | 420 | 0.2747 |
| 0.079 | 3.29 | 434 | 0.2760 |
| 0.0843 | 3.4 | 448 | 0.2780 |
| 0.0815 | 3.51 | 462 | 0.2812 |
| 0.0948 | 3.61 | 476 | 0.2828 |
| 0.0845 | 3.72 | 490 | 0.2766 |
| 0.1025 | 3.83 | 504 | 0.2772 |
| 0.0763 | 3.93 | 518 | 0.2813 |
| 0.0322 | 4.04 | 532 | 0.3309 |
| 0.031 | 4.14 | 546 | 0.3221 |
| 0.028 | 4.25 | 560 | 0.3348 |
| 0.031 | 4.36 | 574 | 0.3374 |
| 0.0309 | 4.46 | 588 | 0.3355 |
| 0.0331 | 4.57 | 602 | 0.3344 |
| 0.034 | 4.68 | 616 | 0.3384 |
| 0.0324 | 4.78 | 630 | 0.3420 |
| 0.0301 | 4.89 | 644 | 0.3350 |
| 0.0327 | 4.99 | 658 | 0.3387 |
| 0.0111 | 5.1 | 672 | 0.4010 |
| 0.0089 | 5.21 | 686 | 0.3917 |
| 0.0075 | 5.31 | 700 | 0.3925 |
| 0.0106 | 5.42 | 714 | 0.3911 |
| 0.0091 | 5.53 | 728 | 0.3937 |
| 0.0109 | 5.63 | 742 | 0.3985 |
| 0.009 | 5.74 | 756 | 0.4044 |
| 0.0095 | 5.84 | 770 | 0.3949 |
| 0.0075 | 5.95 | 784 | 0.3984 |
| 0.0036 | 6.06 | 798 | 0.4133 |
| 0.0031 | 6.16 | 812 | 0.4424 |
| 0.0026 | 6.27 | 826 | 0.4525 |
| 0.0034 | 6.38 | 840 | 0.4519 |
| 0.0019 | 6.48 | 854 | 0.4513 |
| 0.0018 | 6.59 | 868 | 0.4517 |
| 0.0023 | 6.69 | 882 | 0.4520 |
| 0.0016 | 6.8 | 896 | 0.4534 |
| 0.0018 | 6.91 | 910 | 0.4528 |
| 0.001 | 7.01 | 924 | 0.4537 |
| 0.0011 | 7.12 | 938 | 0.4581 |
| 0.0009 | 7.23 | 952 | 0.4631 |
| 0.0009 | 7.33 | 966 | 0.4662 |
| 0.0013 | 7.44 | 980 | 0.4680 |
| 0.0008 | 7.54 | 994 | 0.4700 |
| 0.001 | 7.65 | 1008 | 0.4711 |
| 0.0009 | 7.76 | 1022 | 0.4720 |
| 0.0011 | 7.86 | 1036 | 0.4727 |
| 0.0009 | 7.97 | 1050 | 0.4731 |
| 0.0011 | 8.08 | 1064 | 0.4735 |
| 0.001 | 8.18 | 1078 | 0.4739 |
| 0.001 | 8.29 | 1092 | 0.4741 |
| 0.001 | 8.39 | 1106 | 0.4746 |
| 0.0011 | 8.5 | 1120 | 0.4744 |
| 0.0012 | 8.61 | 1134 | 0.4751 |
| 0.0011 | 8.71 | 1148 | 0.4748 |
| 0.001 | 8.82 | 1162 | 0.4747 |
| 0.0009 | 8.93 | 1176 | 0.4754 |
| 0.0011 | 9.03 | 1190 | 0.4752 |
| 0.0013 | 9.14 | 1204 | 0.4751 |
| 0.0009 | 9.24 | 1218 | 0.4749 |
| 0.001 | 9.35 | 1232 | 0.4750 |
| 0.0017 | 9.46 | 1246 | 0.4750 |
| 0.0012 | 9.56 | 1260 | 0.4749 |
| 0.0008 | 9.67 | 1274 | 0.4747 |
| 0.0008 | 9.78 | 1288 | 0.4749 |
| 0.0011 | 9.88 | 1302 | 0.4754 |

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.2
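### Inference prompt format

The training data was passed to axolotl with `type: alpaca`, so inference prompts should follow the same Alpaca-style template. Below is a hedged generation sketch that continues from the loading snippet near the top of this card (it reuses `model`, `tokenizer`, and `torch` from there); the instruction and input text are purely illustrative.

```python
# Alpaca-style prompt, mirroring axolotl's `type: alpaca` formatting.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nChoose the correct option for the following question.\n\n"
    "### Input:\nQuestion text and answer choices go here.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```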