llama3-chat_1M

This model is a fine-tuned version of unsloth/llama-3-8b-Instruct-bnb-4bit; the fine-tuning dataset is not named in this card. It achieves the following results on the evaluation set:

  • Loss: 1.3835
  • BLEU: 39.5 on the PhoMT English-Vietnamese test set, 34.4 on IWSLT15 en-vi
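
Because the model is published as a PEFT (LoRA) adapter on top of the 4-bit base checkpoint, inference requires loading the base model and then attaching the adapter. The sketch below is a minimal example, assuming bitsandbytes is installed; the adapter path and the translation prompt are placeholders, since the card does not document the exact hub id or prompt template used in training.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "unsloth/llama-3-8b-Instruct-bnb-4bit"   # 4-bit base model named in this card
adapter_id = "path/to/llama3-chat_1M"              # placeholder: local path or hub id of this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Assumed prompt format: a plain chat-style translation request.
messages = [{"role": "user", "content": "Translate to Vietnamese: The weather is nice today."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(base.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```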

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 3407
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 5
  • num_epochs: 3
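
For reference, these settings map directly onto transformers TrainingArguments. The sketch below is a reconstruction assuming the standard Trainer / TRL SFTTrainer setup that unsloth builds on, not the author's actual training script; the output directory is a placeholder.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3-chat_1M",      # placeholder output directory
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,    # effective train batch size: 16 * 4 = 64
    seed=3407,
    lr_scheduler_type="cosine",
    warmup_steps=5,
    num_train_epochs=3,
    adam_beta1=0.9,                   # Adam betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```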

Training results

Training Loss Epoch Step Validation Loss
1.6092 0.032 500 1.4727
1.539 0.064 1000 1.4609
1.5211 0.096 1500 1.4528
1.5228 0.128 2000 1.4453
1.5106 0.16 2500 1.4431
1.5023 0.192 3000 1.4393
1.506 0.224 3500 1.4377
1.4887 0.256 4000 1.4342
1.4942 0.288 4500 1.4334
1.4826 0.32 5000 1.4307
1.4895 0.352 5500 1.4269
1.4854 0.384 6000 1.4249
1.4799 0.416 6500 1.4246
1.4837 0.448 7000 1.4227
1.4766 0.48 7500 1.4223
1.4799 0.512 8000 1.4206
1.4728 0.544 8500 1.4177
1.4753 0.576 9000 1.4173
1.4705 0.608 9500 1.4153
1.4679 0.64 10000 1.4159
1.4646 0.672 10500 1.4163
1.4601 0.704 11000 1.4135
1.4648 0.736 11500 1.4113
1.4618 0.768 12000 1.4109
1.4644 0.8 12500 1.4096
1.4593 0.832 13000 1.4084
1.4629 0.864 13500 1.4080
1.4565 0.896 14000 1.4079
1.4502 0.928 14500 1.4043
1.4558 0.96 15000 1.4024
1.45 0.992 15500 1.4040
1.3885 1.024 16000 1.4058
1.3681 1.056 16500 1.4071
1.3719 1.088 17000 1.4074
1.3687 1.12 17500 1.4063
1.3736 1.152 18000 1.4067
1.3767 1.184 18500 1.4061
1.3764 1.216 19000 1.4036
1.3751 1.248 19500 1.4031
1.3698 1.28 20000 1.4031
1.3764 1.312 20500 1.4024
1.379 1.3440 21000 1.4012
1.3758 1.376 21500 1.3990
1.3764 1.408 22000 1.3996
1.3715 1.44 22500 1.3982
1.3775 1.472 23000 1.3976
1.3719 1.504 23500 1.3974
1.3745 1.536 24000 1.3973
1.3704 1.568 24500 1.3961
1.3659 1.6 25000 1.3950
1.3665 1.6320 25500 1.3947
1.3628 1.6640 26000 1.3923
1.367 1.696 26500 1.3915
1.3616 1.728 27000 1.3899
1.3671 1.76 27500 1.3891
1.3651 1.792 28000 1.3884
1.3609 1.8240 28500 1.3872
1.3647 1.8560 29000 1.3871
1.3595 1.888 29500 1.3852
1.3579 1.92 30000 1.3845
1.3575 1.952 30500 1.3837
1.3576 1.984 31000 1.3835
1.3102 2.016 31500 1.3964
1.2595 2.048 32000 1.3966
1.2622 2.08 32500 1.3978
1.2606 2.112 33000 1.3967
1.2665 2.144 33500 1.3982
1.2658 2.176 34000 1.3974
1.2574 2.208 34500 1.3971
1.2584 2.24 35000 1.3963
1.2635 2.2720 35500 1.3970
1.2579 2.304 36000 1.3956
1.2633 2.336 36500 1.3956
1.2602 2.368 37000 1.3952
1.2597 2.4 37500 1.3953
1.2635 2.432 38000 1.3948
1.2646 2.464 38500 1.3947
1.2609 2.496 39000 1.3946
1.2562 2.528 39500 1.3941
1.2586 2.56 40000 1.3943
1.2604 2.592 40500 1.3940
1.2636 2.624 41000 1.3940
1.2635 2.656 41500 1.3940
1.2587 2.6880 42000 1.3938
1.2603 2.7200 42500 1.3939
1.2592 2.752 43000 1.3937
1.2568 2.784 43500 1.3934
1.2595 2.816 44000 1.3936
1.2565 2.848 44500 1.3935
1.2585 2.88 45000 1.3936
1.2624 2.912 45500 1.3933
1.2581 2.944 46000 1.3934
1.2571 2.976 46500 1.3934
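
The BLEU scores quoted at the top of the card could be computed with a corpus-level sacrebleu evaluation along the following lines; the hypothesis and reference lists are placeholders, since the card does not describe the exact evaluation script or decoding settings used for PhoMT and IWSLT15.

```python
import sacrebleu

# Placeholders: model translations and reference sentences for a test set
# such as PhoMT en-vi or IWSLT15 en-vi (data loading not shown).
hypotheses = ["Hôm nay trời đẹp."]
references = ["Hôm nay thời tiết đẹp."]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```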

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.2
  • PyTorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.19.1