---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- unsloth
- generated_from_trainer
base_model: unsloth/llama-3-8b-Instruct-bnb-4bit
model-index:
- name: llama3-chat_1M
  results: []
---

# llama3-chat_1M

This model is a fine-tuned version of [unsloth/llama-3-8b-Instruct-bnb-4bit](https://huggingface.co/unsloth/llama-3-8b-Instruct-bnb-4bit) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3835
- BLEU: 39.5 on the PhoMT en-vi test set, 34.4 on IWSLT15

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 16
- eval_batch_size: 16
- seed: 3407
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 5
- num_epochs: 3
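The exact training script is not recorded in this card. As a rough reconstruction, the sketch below maps the hyperparameters above onto an Unsloth + TRL `SFTTrainer` setup; the max sequence length, LoRA rank/alpha/target modules, `dataset_text_field`, and the `train_dataset`/`eval_dataset` objects are assumptions for illustration, not values taken from this card.

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the 4-bit base model (requires bitsandbytes).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length=2048,  # assumption: not recorded in this card
    load_in_4bit=True,
)

# Attach LoRA adapters; rank, alpha, and target modules are assumptions.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,  # assumed to be prepared elsewhere
    eval_dataset=eval_dataset,    # assumed to be prepared elsewhere
    dataset_text_field="text",    # assumption
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="llama3-chat_1M",
        learning_rate=2e-4,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        gradient_accumulation_steps=4,  # 16 * 4 = total train batch size 64
        num_train_epochs=3,
        lr_scheduler_type="cosine",
        warmup_steps=5,
        seed=3407,
        evaluation_strategy="steps",    # the table below evaluates every 500 steps
        eval_steps=500,
        # Default AdamW: betas=(0.9, 0.999), eps=1e-8, matching the card.
    ),
)
trainer.train()
```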
### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.6092        | 0.032 | 500   | 1.4727          |
| 1.539         | 0.064 | 1000  | 1.4609          |
| 1.5211        | 0.096 | 1500  | 1.4528          |
| 1.5228        | 0.128 | 2000  | 1.4453          |
| 1.5106        | 0.16  | 2500  | 1.4431          |
| 1.5023        | 0.192 | 3000  | 1.4393          |
| 1.506         | 0.224 | 3500  | 1.4377          |
| 1.4887        | 0.256 | 4000  | 1.4342          |
| 1.4942        | 0.288 | 4500  | 1.4334          |
| 1.4826        | 0.32  | 5000  | 1.4307          |
| 1.4895        | 0.352 | 5500  | 1.4269          |
| 1.4854        | 0.384 | 6000  | 1.4249          |
| 1.4799        | 0.416 | 6500  | 1.4246          |
| 1.4837        | 0.448 | 7000  | 1.4227          |
| 1.4766        | 0.48  | 7500  | 1.4223          |
| 1.4799        | 0.512 | 8000  | 1.4206          |
| 1.4728        | 0.544 | 8500  | 1.4177          |
| 1.4753        | 0.576 | 9000  | 1.4173          |
| 1.4705        | 0.608 | 9500  | 1.4153          |
| 1.4679        | 0.64  | 10000 | 1.4159          |
| 1.4646        | 0.672 | 10500 | 1.4163          |
| 1.4601        | 0.704 | 11000 | 1.4135          |
| 1.4648        | 0.736 | 11500 | 1.4113          |
| 1.4618        | 0.768 | 12000 | 1.4109          |
| 1.4644        | 0.8   | 12500 | 1.4096          |
| 1.4593        | 0.832 | 13000 | 1.4084          |
| 1.4629        | 0.864 | 13500 | 1.4080          |
| 1.4565        | 0.896 | 14000 | 1.4079          |
| 1.4502        | 0.928 | 14500 | 1.4043          |
| 1.4558        | 0.96  | 15000 | 1.4024          |
| 1.45          | 0.992 | 15500 | 1.4040          |
| 1.3885        | 1.024 | 16000 | 1.4058          |
| 1.3681        | 1.056 | 16500 | 1.4071          |
| 1.3719        | 1.088 | 17000 | 1.4074          |
| 1.3687        | 1.12  | 17500 | 1.4063          |
| 1.3736        | 1.152 | 18000 | 1.4067          |
| 1.3767        | 1.184 | 18500 | 1.4061          |
| 1.3764        | 1.216 | 19000 | 1.4036          |
| 1.3751        | 1.248 | 19500 | 1.4031          |
| 1.3698        | 1.28  | 20000 | 1.4031          |
| 1.3764        | 1.312 | 20500 | 1.4024          |
| 1.379         | 1.344 | 21000 | 1.4012          |
| 1.3758        | 1.376 | 21500 | 1.3990          |
| 1.3764        | 1.408 | 22000 | 1.3996          |
| 1.3715        | 1.44  | 22500 | 1.3982          |
| 1.3775        | 1.472 | 23000 | 1.3976          |
| 1.3719        | 1.504 | 23500 | 1.3974          |
| 1.3745        | 1.536 | 24000 | 1.3973          |
| 1.3704        | 1.568 | 24500 | 1.3961          |
| 1.3659        | 1.6   | 25000 | 1.3950          |
| 1.3665        | 1.632 | 25500 | 1.3947          |
| 1.3628        | 1.664 | 26000 | 1.3923          |
| 1.367         | 1.696 | 26500 | 1.3915          |
| 1.3616        | 1.728 | 27000 | 1.3899          |
| 1.3671        | 1.76  | 27500 | 1.3891          |
| 1.3651        | 1.792 | 28000 | 1.3884          |
| 1.3609        | 1.824 | 28500 | 1.3872          |
| 1.3647        | 1.856 | 29000 | 1.3871          |
| 1.3595        | 1.888 | 29500 | 1.3852          |
| 1.3579        | 1.92  | 30000 | 1.3845          |
| 1.3575        | 1.952 | 30500 | 1.3837          |
| 1.3576        | 1.984 | 31000 | 1.3835          |
| 1.3102        | 2.016 | 31500 | 1.3964          |
| 1.2595        | 2.048 | 32000 | 1.3966          |
| 1.2622        | 2.08  | 32500 | 1.3978          |
| 1.2606        | 2.112 | 33000 | 1.3967          |
| 1.2665        | 2.144 | 33500 | 1.3982          |
| 1.2658        | 2.176 | 34000 | 1.3974          |
| 1.2574        | 2.208 | 34500 | 1.3971          |
| 1.2584        | 2.24  | 35000 | 1.3963          |
| 1.2635        | 2.272 | 35500 | 1.3970          |
| 1.2579        | 2.304 | 36000 | 1.3956          |
| 1.2633        | 2.336 | 36500 | 1.3956          |
| 1.2602        | 2.368 | 37000 | 1.3952          |
| 1.2597        | 2.4   | 37500 | 1.3953          |
| 1.2635        | 2.432 | 38000 | 1.3948          |
| 1.2646        | 2.464 | 38500 | 1.3947          |
| 1.2609        | 2.496 | 39000 | 1.3946          |
| 1.2562        | 2.528 | 39500 | 1.3941          |
| 1.2586        | 2.56  | 40000 | 1.3943          |
| 1.2604        | 2.592 | 40500 | 1.3940          |
| 1.2636        | 2.624 | 41000 | 1.3940          |
| 1.2635        | 2.656 | 41500 | 1.3940          |
| 1.2587        | 2.688 | 42000 | 1.3938          |
| 1.2603        | 2.72  | 42500 | 1.3939          |
| 1.2592        | 2.752 | 43000 | 1.3937          |
| 1.2568        | 2.784 | 43500 | 1.3934          |
| 1.2595        | 2.816 | 44000 | 1.3936          |
| 1.2565        | 2.848 | 44500 | 1.3935          |
| 1.2585        | 2.88  | 45000 | 1.3936          |
| 1.2624        | 2.912 | 45500 | 1.3933          |
| 1.2581        | 2.944 | 46000 | 1.3934          |
| 1.2571        | 2.976 | 46500 | 1.3934          |

### Framework versions

- PEFT 0.10.0
- Transformers 4.40.2
- Pytorch 2.3.0
- Datasets 2.19.1
- Tokenizers 0.19.1
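For completeness, a minimal inference sketch with Transformers and PEFT. `adapter_id` is a placeholder, not a value recorded in this card; point it at the Hub id or local path where this adapter is stored.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/llama-3-8b-Instruct-bnb-4bit"
adapter_id = "llama3-chat_1M"  # placeholder: replace with the actual adapter repo id or path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")  # 4-bit base, needs bitsandbytes
model = PeftModel.from_pretrained(model, adapter_id)

# The BLEU figures above are for en-vi translation, so a translation-style prompt:
messages = [{"role": "user", "content": "Translate to Vietnamese: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```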