mistral-dpo
This model is a DPO fine-tuned version of TheBloke/OpenHermes-2-Mistral-7B-GPTQ on an unspecified preference dataset (a minimal loading sketch follows the metrics below). It achieves the following results on the evaluation set:
- Loss: 0.6944
- Rewards/chosen: 0.2782
- Rewards/rejected: 0.0543
- Rewards/accuracies: 0.5385
- Rewards/margins: 0.2239
- Logps/rejected: -187.8588
- Logps/chosen: -166.3796
- Logits/rejected: -2.4215
- Logits/chosen: -2.4790
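The snippet below is a minimal loading sketch rather than an official usage example: it assumes this repository holds a PEFT (LoRA) adapter trained on top of the GPTQ base model, and that `optimum` and `auto-gptq` are installed so the quantized base can be loaded.

```python
# Minimal loading sketch (assumptions: this repo holds a PEFT adapter for the GPTQ base,
# and optimum/auto-gptq are installed so the quantized weights can be loaded).
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
adapter_id = "abhiGOAT/mistral-dpo"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the DPO-tuned adapter
model.eval()

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```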
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training; a sketch of how they map onto a `DPOTrainer` call follows the list:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 250
- mixed_precision_training: Native AMP
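These settings correspond roughly to the TRL `DPOTrainer` run sketched below. This is a reconstruction from the hyperparameters above, not the original training script: the preference dataset, the LoRA configuration, and the DPO `beta` are not documented in this card and appear here as placeholders.

```python
# Rough reconstruction of the training setup from the hyperparameters listed above.
# Placeholders (not documented in this card): the preference dataset, the LoRA config,
# and the DPO beta.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TheBloke/OpenHermes-2-Mistral-7B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

peft_config = LoraConfig(          # placeholder LoRA settings, not taken from the card
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

# Any dataset with "prompt", "chosen" and "rejected" columns works with DPOTrainer;
# the card does not say which one was used.
dataset = load_dataset("your_preference_dataset")          # placeholder name
train_dataset, eval_dataset = dataset["train"], dataset["test"]

training_args = TrainingArguments(
    output_dir="mistral-dpo",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=250,
    fp16=True,                     # "Native AMP" mixed precision
    seed=42,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model=None,                # with a PEFT adapter, the frozen base acts as the reference
    args=training_args,
    beta=0.1,                      # assumption: beta is not reported in the card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```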
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.7027 | 0.0 | 10 | 0.6989 | 0.0816 | 0.0881 | 0.5577 | -0.0065 | -187.5204 | -168.3459 | -2.4271 | -2.4774 |
0.6833 | 0.0 | 20 | 0.7017 | -0.0375 | -0.0327 | 0.5288 | -0.0048 | -188.7280 | -169.5362 | -2.4376 | -2.4828 |
0.867 | 0.0 | 30 | 0.7193 | -0.3147 | -0.3086 | 0.5385 | -0.0061 | -191.4871 | -172.3083 | -2.4532 | -2.4942 |
0.8962 | 0.0 | 40 | 0.7068 | -0.2076 | -0.2208 | 0.5577 | 0.0132 | -190.6093 | -171.2371 | -2.4597 | -2.5054 |
0.7467 | 0.0 | 50 | 0.7008 | 0.1918 | 0.1648 | 0.5577 | 0.0270 | -186.7531 | -167.2434 | -2.4630 | -2.5116 |
0.7335 | 0.0 | 60 | 0.6972 | 0.3949 | 0.3373 | 0.5385 | 0.0576 | -185.0280 | -165.2124 | -2.4666 | -2.5130 |
0.587 | 0.01 | 70 | 0.7116 | 0.6763 | 0.6193 | 0.4904 | 0.0570 | -182.2083 | -162.3980 | -2.4675 | -2.5126 |
0.675 | 0.01 | 80 | 0.7330 | 0.8676 | 0.8385 | 0.5096 | 0.0291 | -180.0161 | -160.4852 | -2.4726 | -2.5171 |
0.6117 | 0.01 | 90 | 0.7454 | 0.9576 | 0.9300 | 0.5192 | 0.0276 | -179.1016 | -159.5854 | -2.4757 | -2.5229 |
0.5697 | 0.01 | 100 | 0.7715 | 0.9933 | 0.9991 | 0.5 | -0.0059 | -178.4101 | -159.2286 | -2.4736 | -2.5233 |
1.1319 | 0.01 | 110 | 0.7652 | 0.9034 | 0.8862 | 0.4904 | 0.0172 | -179.5398 | -160.1275 | -2.4696 | -2.5215 |
0.5912 | 0.01 | 120 | 0.7476 | 0.7562 | 0.7007 | 0.5096 | 0.0555 | -181.3943 | -161.5994 | -2.4661 | -2.5186 |
0.702 | 0.01 | 130 | 0.7400 | 0.7400 | 0.6590 | 0.5192 | 0.0810 | -181.8113 | -161.7616 | -2.4642 | -2.5211 |
0.5566 | 0.01 | 140 | 0.7332 | 0.6338 | 0.5293 | 0.5288 | 0.1044 | -183.1082 | -162.8238 | -2.4650 | -2.5222 |
0.7823 | 0.01 | 150 | 0.7327 | 0.5429 | 0.4408 | 0.5385 | 0.1022 | -183.9939 | -163.7323 | -2.4645 | -2.5191 |
0.7549 | 0.01 | 160 | 0.7282 | 0.3954 | 0.2907 | 0.5481 | 0.1047 | -185.4949 | -165.2079 | -2.4612 | -2.5138 |
0.6506 | 0.01 | 170 | 0.7262 | 0.3748 | 0.2716 | 0.5192 | 0.1031 | -185.6850 | -165.4137 | -2.4579 | -2.5102 |
0.559 | 0.01 | 180 | 0.7320 | 0.4578 | 0.3604 | 0.5096 | 0.0974 | -184.7973 | -164.5831 | -2.4589 | -2.5109 |
0.9496 | 0.02 | 190 | 0.7150 | 0.4227 | 0.2889 | 0.5192 | 0.1339 | -185.5128 | -164.9340 | -2.4480 | -2.5007 |
0.7996 | 0.02 | 200 | 0.7034 | 0.4051 | 0.2378 | 0.5288 | 0.1673 | -186.0234 | -165.1101 | -2.4391 | -2.4926 |
0.5733 | 0.02 | 210 | 0.6977 | 0.3946 | 0.2110 | 0.5288 | 0.1836 | -186.2916 | -165.2155 | -2.4327 | -2.4875 |
0.5796 | 0.02 | 220 | 0.6981 | 0.3933 | 0.1983 | 0.5288 | 0.1949 | -186.4181 | -165.2286 | -2.4260 | -2.4824 |
0.6435 | 0.02 | 230 | 0.6976 | 0.3726 | 0.1714 | 0.5288 | 0.2012 | -186.6871 | -165.4354 | -2.4237 | -2.4807 |
0.5993 | 0.02 | 240 | 0.6958 | 0.3088 | 0.0929 | 0.5385 | 0.2159 | -187.4724 | -166.0730 | -2.4222 | -2.4799 |
0.9077 | 0.02 | 250 | 0.6944 | 0.2782 | 0.0543 | 0.5385 | 0.2239 | -187.8588 | -166.3796 | -2.4215 | -2.4790 |
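Assuming the run used TRL's `DPOTrainer` (the card does not state this explicitly), the reward columns follow the standard DPO definition: for a prompt x and completion y, the logged reward is

r(x, y) = β · (log π_θ(y | x) - log π_ref(y | x)),

so Rewards/margins is Rewards/chosen minus Rewards/rejected, and Rewards/accuracies is the fraction of evaluation pairs where the chosen completion receives the higher reward than the rejected one.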
Framework versions
- PEFT 0.8.2
- Transformers 4.37.0
- Pytorch 2.0.1+cu117
- Datasets 2.15.0
- Tokenizers 0.15.1
Model tree for abhiGOAT/mistral-dpo
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned: teknium/OpenHermes-2-Mistral-7B
- Quantized: TheBloke/OpenHermes-2-Mistral-7B-GPTQ