llama_SFT_e1_DPO_e3

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf, trained on an unspecified dataset. At the end of training (step 750), it achieves the following results on the evaluation set:

  • Loss: 0.0745
  • Rewards/chosen: 0.4581
  • Rewards/rejected: -2.2850
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.7431
  • Logps/rejected: -231.3585
  • Logps/chosen: -178.1707
  • Logits/rejected: -1.0559
  • Logits/chosen: -0.8886
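The reward metrics above are internally consistent with the standard DPO objective. The following is a minimal sketch, assuming the usual sigmoid DPO loss with no label smoothing (the loss variant is not stated in this card). Under that assumption, each preference pair's loss is -log(sigmoid(r_chosen - r_rejected)), where the r terms are the implicit rewards, and Rewards/margins is the mean of r_chosen - r_rejected. The snippet checks the reported margin. Because the loss is convex in the margin, it also shows that the loss evaluated at the mean margin is a lower bound on the reported mean loss.

```python
import math

# Final evaluation metrics reported above (step 750).
rewards_chosen = 0.4581
rewards_rejected = -2.2850
reported_margin = 2.7431
reported_loss = 0.0745


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair sigmoid DPO loss, -log(sigmoid(margin)), computed as softplus(-margin)."""
    return math.log1p(math.exp(-margin))


# Rewards/margins is the mean of (chosen reward - rejected reward), so by
# linearity it equals the difference of the two mean rewards.
margin = rewards_chosen - rewards_rejected

# The loss is convex in the margin, so by Jensen's inequality the loss at the
# mean margin cannot exceed the mean per-pair loss reported as "Loss".
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"margin: {margin:.4f} (reported {reported_margin})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} <= reported loss {reported_loss}")
```

Here the margin reproduces 2.7431, and the loss at the mean margin comes out near 0.062, below the reported 0.0745. A gap of this kind is expected whenever the per-pair margins vary across the evaluation set.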

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
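The batch-size and scheduler entries above can be read as shown in the following minimal pure-Python sketch, which is not the training code. It makes two assumptions: a single device, which is consistent with total_train_batch_size = 1 × 8, and no warmup, since none is listed. It mirrors the shape of the "linear" schedule in transformers, where the learning rate decays linearly to zero over the run. total_steps is a hypothetical parameter chosen for illustration; the results table's last logged step is 750.

```python
# Values copied from the hyperparameter list above.
learning_rate = 7e-07
train_batch_size = 1              # per-device batch size
gradient_accumulation_steps = 8
num_devices = 1                   # assumption: implied by total_train_batch_size = 8

# Examples contributing to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices


def linear_lr(step: int, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return learning_rate * max(0.0, remaining)


# Hypothetical run length for illustration (the last logged step is 750).
total_steps = 750
for step in (0, 375, 750):
    print(step, linear_lr(step, total_steps))
```

With warmup_steps left at 0, the learning rate starts at 7e-07, halves by the midpoint, and reaches zero at the final step.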

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6769 | 0.1 | 25 | 0.6597 | 0.0250 | -0.0447 | 0.8267 | 0.0696 | -208.9552 | -182.5023 | -1.0564 | -0.8828 |
| 0.6046 | 0.2 | 50 | 0.5848 | 0.0656 | -0.1679 | 0.9767 | 0.2335 | -210.1874 | -182.0954 | -1.0566 | -0.8835 |
| 0.5278 | 0.3 | 75 | 0.4947 | 0.1260 | -0.3302 | 1.0 | 0.4561 | -211.8100 | -181.4921 | -1.0569 | -0.8842 |
| 0.4449 | 0.4 | 100 | 0.4127 | 0.1764 | -0.5163 | 1.0 | 0.6927 | -213.6715 | -180.9882 | -1.0574 | -0.8860 |
| 0.3778 | 0.5 | 125 | 0.3368 | 0.2273 | -0.7228 | 1.0 | 0.9502 | -215.7366 | -180.4786 | -1.0575 | -0.8860 |
| 0.2947 | 0.6 | 150 | 0.2704 | 0.2755 | -0.9495 | 1.0 | 1.2250 | -218.0034 | -179.9972 | -1.0567 | -0.8893 |
| 0.2451 | 0.7 | 175 | 0.2164 | 0.3191 | -1.1794 | 1.0 | 1.4985 | -220.3023 | -179.5607 | -1.0570 | -0.8899 |
| 0.1913 | 0.79 | 200 | 0.1737 | 0.3571 | -1.4023 | 1.0 | 1.7594 | -222.5310 | -179.1809 | -1.0573 | -0.8904 |
| 0.1553 | 0.89 | 225 | 0.1412 | 0.3875 | -1.6187 | 1.0 | 2.0062 | -224.6958 | -178.8769 | -1.0564 | -0.8903 |
| 0.1265 | 0.99 | 250 | 0.1173 | 0.4068 | -1.8122 | 1.0 | 2.2190 | -226.6304 | -178.6835 | -1.0564 | -0.8882 |
| 0.1174 | 1.09 | 275 | 0.1029 | 0.4201 | -1.9500 | 1.0 | 2.3701 | -228.0080 | -178.5508 | -1.0555 | -0.8881 |
| 0.0915 | 1.19 | 300 | 0.0938 | 0.4300 | -2.0411 | 1.0 | 2.4711 | -228.9190 | -178.4516 | -1.0555 | -0.8891 |
| 0.0819 | 1.29 | 325 | 0.0880 | 0.4386 | -2.1112 | 1.0 | 2.5497 | -229.6201 | -178.3662 | -1.0554 | -0.8884 |
| 0.09 | 1.39 | 350 | 0.0838 | 0.4485 | -2.1597 | 1.0 | 2.6082 | -230.1051 | -178.2668 | -1.0556 | -0.8886 |
| 0.0865 | 1.49 | 375 | 0.0803 | 0.4551 | -2.2012 | 1.0 | 2.6562 | -230.5201 | -178.2012 | -1.0556 | -0.8904 |
| 0.0731 | 1.59 | 400 | 0.0787 | 0.4544 | -2.2264 | 1.0 | 2.6807 | -230.7719 | -178.2080 | -1.0555 | -0.8884 |
| 0.0734 | 1.69 | 425 | 0.0769 | 0.4580 | -2.2470 | 1.0 | 2.7050 | -230.9783 | -178.1717 | -1.0552 | -0.8884 |
| 0.0827 | 1.79 | 450 | 0.0763 | 0.4591 | -2.2582 | 1.0 | 2.7173 | -231.0906 | -178.1612 | -1.0555 | -0.8884 |
| 0.0784 | 1.89 | 475 | 0.0756 | 0.4545 | -2.2709 | 1.0 | 2.7253 | -231.2172 | -178.2073 | -1.0556 | -0.8886 |
| 0.0768 | 1.99 | 500 | 0.0751 | 0.4564 | -2.2804 | 1.0 | 2.7368 | -231.3123 | -178.1877 | -1.0556 | -0.8883 |
| 0.0696 | 2.09 | 525 | 0.0746 | 0.4607 | -2.2824 | 1.0 | 2.7431 | -231.3322 | -178.1449 | -1.0554 | -0.8901 |
| 0.0691 | 2.19 | 550 | 0.0743 | 0.4597 | -2.2852 | 1.0 | 2.7449 | -231.3599 | -178.1548 | -1.0557 | -0.8886 |
| 0.07 | 2.29 | 575 | 0.0747 | 0.4597 | -2.2807 | 1.0 | 2.7404 | -231.3157 | -178.1549 | -1.0559 | -0.8889 |
| 0.0754 | 2.38 | 600 | 0.0744 | 0.4599 | -2.2865 | 1.0 | 2.7463 | -231.3729 | -178.1530 | -1.0555 | -0.8903 |
| 0.0833 | 2.48 | 625 | 0.0743 | 0.4588 | -2.2878 | 1.0 | 2.7466 | -231.3862 | -178.1637 | -1.0557 | -0.8887 |
| 0.0761 | 2.58 | 650 | 0.0746 | 0.4583 | -2.2856 | 1.0 | 2.7440 | -231.3646 | -178.1684 | -1.0557 | -0.8889 |
| 0.0716 | 2.68 | 675 | 0.0745 | 0.4597 | -2.2866 | 1.0 | 2.7463 | -231.3745 | -178.1546 | -1.0558 | -0.8905 |
| 0.0755 | 2.78 | 700 | 0.0746 | 0.4592 | -2.2835 | 1.0 | 2.7426 | -231.3430 | -178.1601 | -1.0560 | -0.8889 |
| 0.063 | 2.88 | 725 | 0.0740 | 0.4618 | -2.2897 | 1.0 | 2.7516 | -231.4058 | -178.1337 | -1.0556 | -0.8884 |
| 0.0743 | 2.98 | 750 | 0.0745 | 0.4581 | -2.2850 | 1.0 | 2.7431 | -231.3585 | -178.1707 | -1.0559 | -0.8886 |
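The card does not list the DPO temperature beta. The table still allows a rough cross-check, which rests on two assumptions: the logged rewards follow the usual DPO definition r = beta * (logp_policy - logp_ref), and the reference model is frozen. Under those assumptions, any change in a mean reward between two evaluations on the same evaluation set equals beta times the change in the corresponding mean Logps value, so two rows are enough to estimate beta. The following sketch uses the rows at steps 25 and 750. The result is an inference from the table, not a documented setting.

```python
# Two rows from the results table above: step 25 and step 750.
rows = {
    25: {"rewards_chosen": 0.0250, "rewards_rejected": -0.0447,
         "logps_chosen": -182.5023, "logps_rejected": -208.9552},
    750: {"rewards_chosen": 0.4581, "rewards_rejected": -2.2850,
          "logps_chosen": -178.1707, "logps_rejected": -231.3585},
}


def implied_beta(side: str, start: int = 25, end: int = 750) -> float:
    """Estimate beta as d(mean reward) / d(mean policy log-prob) for one side.

    Assumes reward = beta * (logp_policy - logp_ref) with a frozen reference
    model, so the reference term cancels when differencing two evaluations
    on the same evaluation set.
    """
    d_reward = rows[end][f"rewards_{side}"] - rows[start][f"rewards_{side}"]
    d_logp = rows[end][f"logps_{side}"] - rows[start][f"logps_{side}"]
    return d_reward / d_logp


for side in ("chosen", "rejected"):
    print(f"beta implied by {side} responses: {implied_beta(side):.4f}")
```

Both estimates come out at about 0.1 and agree to within the table's four-decimal rounding, which is consistent with a single fixed beta of 0.1.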

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • Pytorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2

Model tree

thorirhrafn/llama_SFT_e1_DPO_e3 is listed on the Hugging Face Hub as an adapter model.