
openhermes-mistral-2.5-7b-dpo-test

This model is a fine-tuned version of teknium/OpenHermes-2.5-Mistral-7B, trained with DPO on an unspecified dataset. It achieves the following results on the evaluation set (a short note on how these DPO reward metrics relate follows the list):

  • Loss: 0.4487
  • Rewards/chosen: -0.2951
  • Rewards/rejected: -2.2421
  • Rewards/accuracies: 0.875
  • Rewards/margins: 1.9470
  • Logps/rejected: -257.4751
  • Logps/chosen: -204.3027
  • Logits/rejected: -3.0752
  • Logits/chosen: -3.0485
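
In DPO training, the reported rewards are the implicit rewards (scaled log-probability ratios between the policy and the reference model), and the margin is simply the chosen reward minus the rejected reward. The quick sanity check below is not part of the original card; it only verifies the arithmetic on the numbers above.

```python
# Standard DPO bookkeeping: Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = -0.2951
rewards_rejected = -2.2421
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 1.947, matching the reported Rewards/margins of 1.9470
```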

Model description

More information needed

Intended uses & limitations

More information needed
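
In the absence of documented usage, the sketch below shows one way to run inference. It assumes the model is loaded from a local or Hub path named after this repository and that it keeps the ChatML prompt format of the teknium/OpenHermes-2.5-Mistral-7B base model; both are assumptions, not facts stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openhermes-mistral-2.5-7b-dpo-test"  # placeholder local or Hub path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# ChatML-style prompt, assumed to be inherited from the base model.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```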

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 200
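
As referenced above, the listed hyperparameters map onto a transformers TrainingArguments object roughly as shown below. This is a hedged sketch: the output directory is a placeholder, and the DPO-specific settings (beta, reference model, dataset) are not reported in this card, so they are omitted.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the reported hyperparameters; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="openhermes-mistral-2.5-7b-dpo-test",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=200,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the default optimizer,
    # so no explicit optimizer overrides are needed.
)
```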

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1645 | 0.01 | 10 | 0.5339 | 0.3993 | -0.1483 | 0.6875 | 0.5476 | -236.5374 | -197.3593 | -3.1575 | -3.1872 |
| 0.0519 | 0.01 | 20 | 0.5521 | 0.2239 | -0.4486 | 0.625 | 0.6725 | -239.5405 | -199.1127 | -3.1969 | -3.2456 |
| 0.1618 | 0.01 | 30 | 0.5866 | -0.0538 | -0.8893 | 0.5625 | 0.8355 | -243.9472 | -201.8902 | -3.2286 | -3.2525 |
| 0.1752 | 0.02 | 40 | 0.5943 | -0.2184 | -1.2057 | 0.5 | 0.9873 | -247.1112 | -203.5360 | -3.2201 | -3.2477 |
| 0.3811 | 0.03 | 50 | 0.6973 | -0.6180 | -1.8146 | 0.5 | 1.1966 | -253.2001 | -207.5316 | -3.1943 | -3.2034 |
| 1.158 | 0.03 | 60 | 0.6347 | -0.4710 | -1.7363 | 0.5625 | 1.2653 | -252.4173 | -206.0622 | -3.1655 | -3.1197 |
| 0.8751 | 0.04 | 70 | 0.6103 | -0.4061 | -1.5966 | 0.5625 | 1.1905 | -251.0201 | -205.4132 | -3.1360 | -3.0544 |
| 0.7811 | 0.04 | 80 | 0.6405 | -0.4774 | -1.6574 | 0.5625 | 1.1799 | -251.6278 | -206.1260 | -3.1337 | -3.0492 |
| 1.4305 | 0.04 | 90 | 0.6257 | -0.4784 | -1.6184 | 0.5625 | 1.1399 | -251.2379 | -206.1361 | -3.1251 | -3.0489 |
| 0.5478 | 0.05 | 100 | 0.6191 | -0.5317 | -1.7067 | 0.5625 | 1.1750 | -252.1214 | -206.6691 | -3.1207 | -3.0753 |
| 0.6344 | 0.06 | 110 | 0.5691 | -0.4827 | -1.7734 | 0.5625 | 1.2907 | -252.7882 | -206.1789 | -3.1075 | -3.0806 |
| 0.5405 | 0.06 | 120 | 0.5337 | -0.4681 | -2.1739 | 0.8125 | 1.7058 | -256.7935 | -206.0332 | -3.1124 | -3.0733 |
| 0.7848 | 0.07 | 130 | 0.5390 | -0.5288 | -2.3789 | 0.8125 | 1.8501 | -258.8436 | -206.6404 | -3.1019 | -3.0628 |
| 1.3119 | 0.07 | 140 | 0.4753 | -0.3276 | -2.0907 | 0.875 | 1.7631 | -255.9614 | -204.6279 | -3.0904 | -3.0648 |
| 0.3636 | 0.07 | 150 | 0.4555 | -0.2566 | -2.0064 | 0.625 | 1.7498 | -255.1179 | -203.9175 | -3.0804 | -3.0640 |
| 0.427 | 0.08 | 160 | 0.4614 | -0.2900 | -2.0804 | 0.625 | 1.7904 | -255.8585 | -204.2518 | -3.0721 | -3.0518 |
| 0.8971 | 0.09 | 170 | 0.4629 | -0.3117 | -2.1791 | 0.875 | 1.8673 | -256.8448 | -204.4694 | -3.0711 | -3.0468 |
| 0.6219 | 0.09 | 180 | 0.4560 | -0.3042 | -2.2114 | 0.875 | 1.9073 | -257.1686 | -204.3934 | -3.0743 | -3.0485 |
| 0.7551 | 0.1 | 190 | 0.4520 | -0.3007 | -2.2400 | 0.875 | 1.9392 | -257.4540 | -204.3593 | -3.0755 | -3.0481 |
| 1.0917 | 0.1 | 200 | 0.4487 | -0.2951 | -2.2421 | 0.875 | 1.9470 | -257.4751 | -204.3027 | -3.0752 | -3.0485 |

Framework versions

  • Transformers 4.34.1
  • Pytorch 2.1.0+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1