llama_SFT_e1_DPO_e2

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1258
  • Rewards/chosen: 0.3605
  • Rewards/rejected: -1.7770
  • Rewards/accuracies: 1.0
  • Rewards/margins: 2.1375
  • Logps/rejected: -203.4181
  • Logps/chosen: -156.2596
  • Logits/rejected: -1.0532
  • Logits/chosen: -0.8665
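
The framework list below includes PEFT, so the published weights are an adapter rather than full model weights. A minimal loading sketch, assuming the adapter lives at thorirhrafn/llama_SFT_e1_DPO_e2 (the repo path on this card) and that you have access to the gated Llama-2 base model:

```python
# Minimal loading sketch: applies the PEFT adapter from this repo on top of
# the meta-llama/Llama-2-7b-hf base model named in the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "thorirhrafn/llama_SFT_e1_DPO_e2"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Hello"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```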

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 7e-07
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 2
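
The Rewards/* and Logps/* metrics above are the ones trl's DPOTrainer logs, so the run was presumably a DPO fine-tune on top of the SFT adapter. Below is a hedged sketch of a setup matching these hyperparameters; the use of trl, the LoRA settings, the DPO beta, and the dataset are assumptions not stated in the card.

```python
# Hedged reconstruction, not the author's script: trl's DPOTrainer, the LoRA
# config, beta, and the dataset are assumptions; learning rate, batch sizes,
# scheduler, epochs, and seed come from the hyperparameter list above.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: the card does not name the preference dataset. DPOTrainer
# expects "prompt", "chosen", and "rejected" columns.
dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="llama_SFT_e1_DPO_e2",
    learning_rate=7e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # 1 per device x 8 steps = total batch size 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with a peft_config, trl uses the frozen base as reference
    args=args,
    beta=0.1,  # assumption: the DPO beta is not listed in the card
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # assumed values
)
trainer.train()
```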

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6825 | 0.1 | 25 | 0.6596 | 0.0243 | -0.0451 | 0.8667 | 0.0694 | -186.0986 | -159.6209 | -1.0534 | -0.8570 |
| 0.6018 | 0.2 | 50 | 0.5820 | 0.0671 | -0.1728 | 0.9800 | 0.2399 | -187.3757 | -159.1936 | -1.0531 | -0.8568 |
| 0.5333 | 0.3 | 75 | 0.5021 | 0.1133 | -0.3236 | 1.0 | 0.4369 | -188.8834 | -158.7311 | -1.0544 | -0.8586 |
| 0.4522 | 0.4 | 100 | 0.4213 | 0.1615 | -0.5029 | 1.0 | 0.6644 | -190.6768 | -158.2497 | -1.0547 | -0.8596 |
| 0.3962 | 0.5 | 125 | 0.3555 | 0.1988 | -0.6844 | 1.0 | 0.8832 | -192.4913 | -157.8759 | -1.0548 | -0.8608 |
| 0.3164 | 0.6 | 150 | 0.2920 | 0.2416 | -0.8872 | 1.0 | 1.1288 | -194.5195 | -157.4483 | -1.0550 | -0.8660 |
| 0.2673 | 0.7 | 175 | 0.2400 | 0.2789 | -1.0936 | 1.0 | 1.3725 | -196.5838 | -157.0758 | -1.0540 | -0.8656 |
| 0.217 | 0.79 | 200 | 0.2008 | 0.3028 | -1.2873 | 1.0 | 1.5900 | -198.5201 | -156.8367 | -1.0540 | -0.8668 |
| 0.1822 | 0.89 | 225 | 0.1694 | 0.3294 | -1.4600 | 1.0 | 1.7894 | -200.2475 | -156.5703 | -1.0541 | -0.8674 |
| 0.1578 | 0.99 | 250 | 0.1483 | 0.3436 | -1.6056 | 1.0 | 1.9492 | -201.7036 | -156.4280 | -1.0538 | -0.8668 |
| 0.1509 | 1.09 | 275 | 0.1364 | 0.3512 | -1.6903 | 1.0 | 2.0414 | -202.5503 | -156.3527 | -1.0534 | -0.8666 |
| 0.1273 | 1.19 | 300 | 0.1322 | 0.3561 | -1.7242 | 1.0 | 2.0804 | -202.8900 | -156.3031 | -1.0532 | -0.8657 |
| 0.1208 | 1.29 | 325 | 0.1284 | 0.3561 | -1.7546 | 1.0 | 2.1106 | -203.1934 | -156.3038 | -1.0534 | -0.8668 |
| 0.1325 | 1.39 | 350 | 0.1270 | 0.3598 | -1.7654 | 1.0 | 2.1252 | -203.3020 | -156.2663 | -1.0532 | -0.8665 |
| 0.1287 | 1.49 | 375 | 0.1263 | 0.3618 | -1.7718 | 1.0 | 2.1336 | -203.3654 | -156.2462 | -1.0534 | -0.8666 |
| 0.1203 | 1.59 | 400 | 0.1252 | 0.3624 | -1.7783 | 1.0 | 2.1407 | -203.4305 | -156.2402 | -1.0532 | -0.8666 |
| 0.1188 | 1.69 | 425 | 0.1254 | 0.3610 | -1.7767 | 1.0 | 2.1377 | -203.4145 | -156.2542 | -1.0530 | -0.8664 |
| 0.1331 | 1.79 | 450 | 0.1253 | 0.3640 | -1.7760 | 1.0 | 2.1400 | -203.4073 | -156.2242 | -1.0531 | -0.8662 |
| 0.1301 | 1.89 | 475 | 0.1252 | 0.3641 | -1.7772 | 1.0 | 2.1413 | -203.4194 | -156.2230 | -1.0531 | -0.8667 |
| 0.1289 | 1.99 | 500 | 0.1258 | 0.3605 | -1.7770 | 1.0 | 2.1375 | -203.4181 | -156.2596 | -1.0532 | -0.8665 |
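
In these logs, Rewards/margins is simply Rewards/chosen minus Rewards/rejected (at the final step, 0.3605 - (-1.7770) = 2.1375), and a Rewards/accuracies of 1.0 means every chosen response in the evaluation set is scored above its rejected counterpart.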

Framework versions

  • PEFT 0.8.2
  • Transformers 4.38.1
  • PyTorch 2.2.0+cu118
  • Datasets 2.17.1
  • Tokenizers 0.15.2
