Meta-Llama-3-8B-Instruct-DPO-QLoRA

This model is a fine-tuned version of data/Meta-Llama-3-8B-Instruct-Merged on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a short note on how these DPO metrics are defined follows the list):

  • Loss: 0.4785
  • Rewards/chosen: -2.3087
  • Rewards/rejected: -3.5097
  • Rewards/accuracies: 0.7760
  • Rewards/margins: 1.2010
  • Logps/rejected: -604.2300
  • Logps/chosen: -507.2661
  • Logits/rejected: -0.8568
  • Logits/chosen: -0.8381
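
These appear to be the standard preference-tuning metrics logged by TRL's DPOTrainer (an assumption based on the card; the β used is not stated). For a prompt x with chosen response y_w and rejected response y_l, the implicit DPO reward and the logged quantities are, in brief:

```latex
% Implicit DPO reward of response y to prompt x (policy \pi_\theta, frozen reference \pi_{\mathrm{ref}})
r_\theta(x, y) = \beta \bigl[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr]

% Logged metrics, averaged over evaluation pairs (x, y_w, y_l)
\text{Rewards/chosen}     = \mathbb{E}\bigl[ r_\theta(x, y_w) \bigr]
\text{Rewards/rejected}   = \mathbb{E}\bigl[ r_\theta(x, y_l) \bigr]
\text{Rewards/margins}    = \text{Rewards/chosen} - \text{Rewards/rejected}
\text{Rewards/accuracies} = \mathbb{E}\bigl[ \mathbf{1}\{ r_\theta(x, y_w) > r_\theta(x, y_l) \} \bigr]

% The reported Loss is the DPO objective
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}\bigl[ \log \sigma\bigl( r_\theta(x, y_w) - r_\theta(x, y_l) \bigr) \bigr]
```

Logps/chosen and Logps/rejected are sequence log-probabilities under the policy, and Logits/* are the corresponding mean token logits. The figures above are self-consistent: -2.3087 - (-3.5097) ≈ 1.2010, the reported margin.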

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
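
A minimal sketch of how these values map onto transformers.TrainingArguments; in a TRL-based DPO recipe these arguments would be handed to DPOTrainer together with a β and the QLoRA/LoRA adapter settings, neither of which is listed in this card, so they are omitted here. The output_dir and mixed-precision choice are assumptions for illustration.

```python
from transformers import TrainingArguments

# Per-device batch sizes are per GPU: with 2 GPUs and 4 accumulation steps
# this reproduces the reported totals (train 4 * 2 * 4 = 32, eval 8 * 2 = 16).
training_args = TrainingArguments(
    output_dir="Meta-Llama-3-8B-Instruct-DPO-QLoRA",  # hypothetical output path
    learning_rate=5e-06,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption: precision is not stated in the card
)
```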

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6865 | 0.0523 | 100  | 0.6857 | 0.0202  | 0.0036  | 0.6810 | 0.0166 | -252.9014 | -274.3707 | -0.6048 | -0.5953 |
| 0.5773 | 0.1047 | 200  | 0.5802 | -0.5398 | -0.9390 | 0.7080 | 0.3992 | -347.1614 | -330.3779 | -0.9408 | -0.9089 |
| 0.546  | 0.1570 | 300  | 0.5337 | -0.9951 | -1.7352 | 0.7370 | 0.7401 | -426.7812 | -375.9071 | -1.0937 | -1.0510 |
| 0.501  | 0.2094 | 400  | 0.5120 | -1.8215 | -2.7617 | 0.7530 | 0.9401 | -529.4277 | -458.5479 | -1.1011 | -1.0595 |
| 0.4525 | 0.2617 | 500  | 0.5090 | -1.9857 | -3.0848 | 0.7510 | 1.0991 | -561.7446 | -474.9624 | -0.9430 | -0.9134 |
| 0.508  | 0.3141 | 600  | 0.5005 | -2.2106 | -3.1511 | 0.7600 | 0.9405 | -568.3763 | -497.4550 | -0.9955 | -0.9626 |
| 0.4852 | 0.3664 | 700  | 0.5028 | -1.3971 | -2.4127 | 0.7770 | 1.0156 | -494.5317 | -416.1026 | -0.9794 | -0.9476 |
| 0.5474 | 0.4187 | 800  | 0.4966 | -1.7948 | -2.7637 | 0.7670 | 0.9689 | -529.6284 | -455.8714 | -0.9115 | -0.8851 |
| 0.5246 | 0.4711 | 900  | 0.4943 | -1.5285 | -2.5416 | 0.7660 | 1.0131 | -507.4219 | -429.2431 | -0.8138 | -0.7980 |
| 0.4635 | 0.5234 | 1000 | 0.4908 | -2.8177 | -4.0337 | 0.7630 | 1.2160 | -656.6334 | -558.1610 | -0.8713 | -0.8521 |
| 0.4856 | 0.5758 | 1100 | 0.4817 | -2.3661 | -3.4921 | 0.7720 | 1.1260 | -602.4694 | -512.9990 | -0.8044 | -0.7913 |
| 0.5013 | 0.6281 | 1200 | 0.4860 | -2.1162 | -3.2907 | 0.7720 | 1.1745 | -582.3287 | -488.0108 | -0.7890 | -0.7745 |
| 0.4497 | 0.6805 | 1300 | 0.4850 | -2.4840 | -3.7371 | 0.7730 | 1.2531 | -626.9694 | -524.7895 | -0.8096 | -0.7940 |
| 0.4734 | 0.7328 | 1400 | 0.4833 | -2.1466 | -3.3699 | 0.7740 | 1.2233 | -590.2520 | -491.0496 | -0.8148 | -0.7990 |
| 0.4482 | 0.7851 | 1500 | 0.4812 | -2.5061 | -3.7160 | 0.7760 | 1.2100 | -624.8656 | -527.0021 | -0.8423 | -0.8246 |
| 0.4982 | 0.8375 | 1600 | 0.4787 | -2.2293 | -3.3886 | 0.7770 | 1.1593 | -592.1224 | -499.3264 | -0.8377 | -0.8203 |
| 0.4594 | 0.8898 | 1700 | 0.4790 | -2.3679 | -3.5723 | 0.7730 | 1.2044 | -610.4911 | -513.1796 | -0.8566 | -0.8379 |
| 0.4551 | 0.9422 | 1800 | 0.4786 | -2.3275 | -3.5261 | 0.7730 | 1.1986 | -605.8722 | -509.1397 | -0.8587 | -0.8397 |
| 0.4605 | 0.9945 | 1900 | 0.4785 | -2.3086 | -3.5093 | 0.7740 | 1.2007 | -604.1885 | -507.2548 | -0.8544 | -0.8360 |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
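
With the versions above (or newer), the adapter can be loaded on top of the base instruct model for inference. A minimal sketch; the hub ids below are assumptions, since the card's own base_model points at a local merged checkpoint (data/Meta-Llama-3-8B-Instruct-Merged), and access to the Llama 3 weights requires accepting Meta's license:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"             # assumed stand-in for the merged local base
adapter_id = "statking/Meta-Llama-3-8B-Instruct-DPO-QLoRA"  # this repository

# Load the base model in 4-bit (QLoRA-style) and attach the DPO adapter.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Generate with the Llama 3 chat template.
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```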