mistral-dpo

This model is a fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ. The training dataset was not recorded. It achieves the following results on the evaluation set, which are the values logged at the final training step (250):

  • Loss: 0.8911
  • Rewards/chosen: 0.5387
  • Rewards/rejected: 0.4878
  • Rewards/accuracies: 0.5096
  • Rewards/margins: 0.0509
  • Logps/rejected: -174.3804
  • Logps/chosen: -178.5185
  • Logits/rejected: -2.5028
  • Logits/chosen: -2.5350
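
As a quick consistency check on the numbers above, the sketch below recomputes the reward margin from the two reward values. It assumes the usual DPO logging convention, in which Rewards/margins is the chosen reward minus the rejected reward; the card itself does not state how the metrics were computed.

```python
# Final-step evaluation values copied from the list above.
rewards_chosen = 0.5387
rewards_rejected = 0.4878
reported_margin = 0.0509

# Assumption: margin = chosen reward - rejected reward (usual DPO convention).
margin = rewards_chosen - rewards_rejected

print(f"computed margin: {margin:.4f}")
print(f"matches reported value: {abs(margin - reported_margin) < 1e-4}")
```

Under that assumption the reported margin is consistent with the two rewards. With Rewards/accuracies at 0.5096, the model ranks the chosen response above the rejected one in only about half of the evaluation pairs.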

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2
  • training_steps: 250
  • mixed_precision_training: Native AMP
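
To make the schedule implied by these settings concrete, here is a minimal sketch of a linear learning-rate schedule with warmup. It assumes that lr_scheduler_type: linear means a linear ramp from 0 over the warmup steps followed by a linear decay to 0 at the last training step, which is the usual behaviour of a linear schedule with warmup. The function name `linear_lr` is hypothetical, and the exact step offset used by the training run may differ by one.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=250):
    """Learning rate at a given optimizer step (illustrative sketch).

    Defaults come from the hyperparameter list: learning_rate 0.0002,
    lr_scheduler_warmup_steps 2, training_steps 250.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 2, 126, 250):
    print(step, linear_lr(step))
```

With only 2 warmup steps, the run spends almost all of its 250 steps in the decay phase, starting from the full rate of 0.0002.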

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6703 | 0.0 | 10 | 0.6842 | -0.0001 | -0.0268 | 0.5865 | 0.0267 | -179.5257 | -183.9063 | -2.4290 | -2.4720 |
| 0.7119 | 0.0 | 20 | 0.6751 | 0.1584 | 0.0990 | 0.5769 | 0.0594 | -178.2678 | -182.3211 | -2.4542 | -2.4988 |
| 0.647 | 0.0 | 30 | 0.6702 | 0.3569 | 0.2540 | 0.5769 | 0.1029 | -176.7180 | -180.3367 | -2.4886 | -2.5306 |
| 0.6748 | 0.0 | 40 | 0.6712 | 0.3439 | 0.2229 | 0.5288 | 0.1210 | -177.0292 | -180.4664 | -2.5206 | -2.5581 |
| 0.6513 | 0.0 | 50 | 0.6707 | 0.4403 | 0.2838 | 0.5577 | 0.1565 | -176.4200 | -179.5021 | -2.5608 | -2.5853 |
| 0.6103 | 0.0 | 60 | 0.6695 | 0.6831 | 0.4769 | 0.5577 | 0.2063 | -174.4892 | -177.0740 | -2.5719 | -2.5933 |
| 1.0313 | 0.01 | 70 | 0.6724 | 0.7062 | 0.5084 | 0.5577 | 0.1978 | -174.1739 | -176.8436 | -2.5543 | -2.5843 |
| 0.6876 | 0.01 | 80 | 0.6804 | 0.6995 | 0.5144 | 0.5385 | 0.1850 | -174.1135 | -176.9104 | -2.5443 | -2.5829 |
| 0.9661 | 0.01 | 90 | 0.6828 | 0.7118 | 0.5376 | 0.5385 | 0.1742 | -173.8821 | -176.7873 | -2.5479 | -2.5846 |
| 0.7354 | 0.01 | 100 | 0.6757 | 0.6765 | 0.5039 | 0.5577 | 0.1726 | -174.2186 | -177.1401 | -2.5399 | -2.5758 |
| 1.0127 | 0.01 | 110 | 0.7129 | 0.6089 | 0.4855 | 0.5288 | 0.1234 | -174.4033 | -177.8165 | -2.5464 | -2.5760 |
| 1.0366 | 0.01 | 120 | 0.7440 | 0.6068 | 0.4946 | 0.5481 | 0.1122 | -174.3115 | -177.8369 | -2.5516 | -2.5804 |
| 1.2145 | 0.01 | 130 | 0.7564 | 0.6521 | 0.5396 | 0.5673 | 0.1125 | -173.8620 | -177.3846 | -2.5608 | -2.5878 |
| 0.8342 | 0.01 | 140 | 0.7649 | 0.6639 | 0.5519 | 0.5385 | 0.1119 | -173.7388 | -177.2668 | -2.5547 | -2.5828 |
| 0.7402 | 0.01 | 150 | 0.7991 | 0.5831 | 0.4883 | 0.5 | 0.0948 | -174.3747 | -178.0745 | -2.5498 | -2.5775 |
| 0.7162 | 0.01 | 160 | 0.8396 | 0.6134 | 0.5474 | 0.5096 | 0.0659 | -173.7835 | -177.7718 | -2.5445 | -2.5713 |
| 0.9396 | 0.01 | 170 | 0.8573 | 0.5700 | 0.5144 | 0.5288 | 0.0556 | -174.1144 | -178.2057 | -2.5326 | -2.5629 |
| 0.5958 | 0.01 | 180 | 0.8708 | 0.5526 | 0.5017 | 0.5288 | 0.0509 | -174.2406 | -178.3789 | -2.5227 | -2.5540 |
| 0.7588 | 0.02 | 190 | 0.8865 | 0.5428 | 0.4977 | 0.5288 | 0.0450 | -174.2806 | -178.4775 | -2.5207 | -2.5493 |
| 0.7811 | 0.02 | 200 | 0.8933 | 0.5797 | 0.5429 | 0.5192 | 0.0368 | -173.8286 | -178.1080 | -2.5171 | -2.5434 |
| 0.5735 | 0.02 | 210 | 0.8907 | 0.5577 | 0.5174 | 0.5288 | 0.0403 | -174.0838 | -178.3279 | -2.5069 | -2.5366 |
| 0.7709 | 0.02 | 220 | 0.8886 | 0.5602 | 0.5167 | 0.5192 | 0.0435 | -174.0907 | -178.3035 | -2.5041 | -2.5361 |
| 0.4914 | 0.02 | 230 | 0.8884 | 0.5237 | 0.4766 | 0.5192 | 0.0471 | -174.4924 | -178.6684 | -2.5050 | -2.5375 |
| 0.739 | 0.02 | 240 | 0.8910 | 0.5281 | 0.4796 | 0.5192 | 0.0485 | -174.4621 | -178.6240 | -2.5027 | -2.5351 |
| 0.5743 | 0.02 | 250 | 0.8911 | 0.5387 | 0.4878 | 0.5096 | 0.0509 | -174.3804 | -178.5185 | -2.5028 | -2.5350 |
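
The logged results can be scanned for the step with the lowest validation loss, which is one common way to pick a checkpoint. The sketch below copies the Step, Validation Loss, and Rewards/margins columns from the results above. The variable names are illustrative, and whether intermediate checkpoints were saved for this run is not stated in the card.

```python
# (step, validation_loss, rewards_margin) copied from the training results.
results = [
    (10, 0.6842, 0.0267), (20, 0.6751, 0.0594), (30, 0.6702, 0.1029),
    (40, 0.6712, 0.1210), (50, 0.6707, 0.1565), (60, 0.6695, 0.2063),
    (70, 0.6724, 0.1978), (80, 0.6804, 0.1850), (90, 0.6828, 0.1742),
    (100, 0.6757, 0.1726), (110, 0.7129, 0.1234), (120, 0.7440, 0.1122),
    (130, 0.7564, 0.1125), (140, 0.7649, 0.1119), (150, 0.7991, 0.0948),
    (160, 0.8396, 0.0659), (170, 0.8573, 0.0556), (180, 0.8708, 0.0509),
    (190, 0.8865, 0.0450), (200, 0.8933, 0.0368), (210, 0.8907, 0.0403),
    (220, 0.8886, 0.0435), (230, 0.8884, 0.0471), (240, 0.8910, 0.0485),
    (250, 0.8911, 0.0509),
]

# Step with the lowest validation loss and step with the largest margin.
best_loss = min(results, key=lambda row: row[1])
best_margin = max(results, key=lambda row: row[2])
final = results[-1]

print(f"lowest validation loss: {best_loss[1]} at step {best_loss[0]}")
print(f"largest reward margin:  {best_margin[2]} at step {best_margin[0]}")
print(f"final step {final[0]}: loss {final[1]}, margin {final[2]}")
```

Both criteria point to step 60 (validation loss 0.6695, margin 0.2063). After that step the validation loss rises and the margin shrinks through step 250, so the final-step metrics reported at the top of the card are worse than the best values seen during training.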

Framework versions

  • PEFT 0.7.1
  • Transformers 4.36.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.0
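
When trying to reproduce the run, it can help to compare a local environment against the versions listed above. The following is a minimal sketch using only the standard library. The helper name `compare_versions` is hypothetical, and the distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the package names under which these libraries are installed.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed distribution name.
PINNED = {
    "peft": "0.7.1",
    "transformers": "4.36.0",
    "torch": "2.0.1+cu117",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def compare_versions(pinned, lookup=metadata.version):
    """Return {name: (expected, installed_or_None, matches)} for each package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


for name, (expected, installed, ok) in compare_versions(PINNED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: expected {expected}, found {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags where the environment departs from the one recorded in the card.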