
mistral-dpo

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ, trained with DPO as a PEFT adapter; the training dataset is not recorded in this card. It achieves the following results on the evaluation set (a loading sketch follows the results):

  • Loss: 0.6944
  • Rewards/chosen: 0.2782
  • Rewards/rejected: 0.0543
  • Rewards/accuracies: 0.5385
  • Rewards/margins: 0.2239
  • Logps/rejected: -187.8588
  • Logps/chosen: -166.3796
  • Logits/rejected: -2.4215
  • Logits/chosen: -2.4790
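
A minimal loading sketch, assuming this repo (abhiGOAT/mistral-dpo) hosts a PEFT/LoRA adapter for the GPTQ base model and that `peft`, `accelerate`, and `auto-gptq`/`optimum` are installed; treat it as a starting point rather than a verified recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "abhiGOAT/mistral-dpo"  # this repository (assumed adapter location)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-trained adapter

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```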

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 250
  • mixed_precision_training: Native AMP
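
One way these settings could be reproduced is with TRL's `DPOTrainer`; the training framework is not stated in this card, so the sketch below is an assumption. The preference dataset, LoRA configuration, and DPO `beta` are likewise placeholders.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Placeholder: any dataset with "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

# Assumed LoRA settings; the actual adapter config is not listed in this card.
peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM")

args = TrainingArguments(
    output_dir="mistral-dpo",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,
    fp16=True,  # Native AMP mixed precision
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    beta=0.1,            # assumed; not recorded in this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```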

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.7027 | 0.0 | 10 | 0.6989 | 0.0816 | 0.0881 | 0.5577 | -0.0065 | -187.5204 | -168.3459 | -2.4271 | -2.4774 |
| 0.6833 | 0.0 | 20 | 0.7017 | -0.0375 | -0.0327 | 0.5288 | -0.0048 | -188.7280 | -169.5362 | -2.4376 | -2.4828 |
| 0.867 | 0.0 | 30 | 0.7193 | -0.3147 | -0.3086 | 0.5385 | -0.0061 | -191.4871 | -172.3083 | -2.4532 | -2.4942 |
| 0.8962 | 0.0 | 40 | 0.7068 | -0.2076 | -0.2208 | 0.5577 | 0.0132 | -190.6093 | -171.2371 | -2.4597 | -2.5054 |
| 0.7467 | 0.0 | 50 | 0.7008 | 0.1918 | 0.1648 | 0.5577 | 0.0270 | -186.7531 | -167.2434 | -2.4630 | -2.5116 |
| 0.7335 | 0.0 | 60 | 0.6972 | 0.3949 | 0.3373 | 0.5385 | 0.0576 | -185.0280 | -165.2124 | -2.4666 | -2.5130 |
| 0.587 | 0.01 | 70 | 0.7116 | 0.6763 | 0.6193 | 0.4904 | 0.0570 | -182.2083 | -162.3980 | -2.4675 | -2.5126 |
| 0.675 | 0.01 | 80 | 0.7330 | 0.8676 | 0.8385 | 0.5096 | 0.0291 | -180.0161 | -160.4852 | -2.4726 | -2.5171 |
| 0.6117 | 0.01 | 90 | 0.7454 | 0.9576 | 0.9300 | 0.5192 | 0.0276 | -179.1016 | -159.5854 | -2.4757 | -2.5229 |
| 0.5697 | 0.01 | 100 | 0.7715 | 0.9933 | 0.9991 | 0.5 | -0.0059 | -178.4101 | -159.2286 | -2.4736 | -2.5233 |
| 1.1319 | 0.01 | 110 | 0.7652 | 0.9034 | 0.8862 | 0.4904 | 0.0172 | -179.5398 | -160.1275 | -2.4696 | -2.5215 |
| 0.5912 | 0.01 | 120 | 0.7476 | 0.7562 | 0.7007 | 0.5096 | 0.0555 | -181.3943 | -161.5994 | -2.4661 | -2.5186 |
| 0.702 | 0.01 | 130 | 0.7400 | 0.7400 | 0.6590 | 0.5192 | 0.0810 | -181.8113 | -161.7616 | -2.4642 | -2.5211 |
| 0.5566 | 0.01 | 140 | 0.7332 | 0.6338 | 0.5293 | 0.5288 | 0.1044 | -183.1082 | -162.8238 | -2.4650 | -2.5222 |
| 0.7823 | 0.01 | 150 | 0.7327 | 0.5429 | 0.4408 | 0.5385 | 0.1022 | -183.9939 | -163.7323 | -2.4645 | -2.5191 |
| 0.7549 | 0.01 | 160 | 0.7282 | 0.3954 | 0.2907 | 0.5481 | 0.1047 | -185.4949 | -165.2079 | -2.4612 | -2.5138 |
| 0.6506 | 0.01 | 170 | 0.7262 | 0.3748 | 0.2716 | 0.5192 | 0.1031 | -185.6850 | -165.4137 | -2.4579 | -2.5102 |
| 0.559 | 0.01 | 180 | 0.7320 | 0.4578 | 0.3604 | 0.5096 | 0.0974 | -184.7973 | -164.5831 | -2.4589 | -2.5109 |
| 0.9496 | 0.02 | 190 | 0.7150 | 0.4227 | 0.2889 | 0.5192 | 0.1339 | -185.5128 | -164.9340 | -2.4480 | -2.5007 |
| 0.7996 | 0.02 | 200 | 0.7034 | 0.4051 | 0.2378 | 0.5288 | 0.1673 | -186.0234 | -165.1101 | -2.4391 | -2.4926 |
| 0.5733 | 0.02 | 210 | 0.6977 | 0.3946 | 0.2110 | 0.5288 | 0.1836 | -186.2916 | -165.2155 | -2.4327 | -2.4875 |
| 0.5796 | 0.02 | 220 | 0.6981 | 0.3933 | 0.1983 | 0.5288 | 0.1949 | -186.4181 | -165.2286 | -2.4260 | -2.4824 |
| 0.6435 | 0.02 | 230 | 0.6976 | 0.3726 | 0.1714 | 0.5288 | 0.2012 | -186.6871 | -165.4354 | -2.4237 | -2.4807 |
| 0.5993 | 0.02 | 240 | 0.6958 | 0.3088 | 0.0929 | 0.5385 | 0.2159 | -187.4724 | -166.0730 | -2.4222 | -2.4799 |
| 0.9077 | 0.02 | 250 | 0.6944 | 0.2782 | 0.0543 | 0.5385 | 0.2239 | -187.8588 | -166.3796 | -2.4215 | -2.4790 |
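
For reference, and assuming the standard TRL DPO metric definitions: the reward columns are the implicit DPO rewards (scaled log-probability ratios against the reference model), rewards/margins is the mean gap between chosen and rejected rewards, and rewards/accuracies is the fraction of pairs where the chosen response receives the higher reward. A small illustration with made-up per-pair values:

```python
import torch

# Illustrative per-pair rewards only; not taken from the training run above.
chosen_rewards = torch.tensor([0.30, 0.25, 0.28])
rejected_rewards = torch.tensor([0.05, 0.10, 0.02])

margins = (chosen_rewards - rejected_rewards).mean()           # -> rewards/margins
accuracy = (chosen_rewards > rejected_rewards).float().mean()  # -> rewards/accuracies
print(margins.item(), accuracy.item())
```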

Framework versions

  • PEFT 0.8.2
  • Transformers 4.37.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.1