# mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO
This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO reward metrics are derived follows the list):
- Loss: 0.5956
- Rewards/chosen: -1.5853
- Rewards/rejected: -2.2163
- Rewards/accuracies: 0.6308
- Rewards/margins: 0.6310
- Logps/rejected: -50.7358
- Logps/chosen: -39.2390
- Logits/rejected: -2.7784
- Logits/chosen: -2.7790
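
The reward columns above are the implicit DPO rewards. The card does not include the evaluation code, but as a minimal sketch (assuming standard DPO and the beta = 0.1 implied by the model name), these metrics are typically derived from per-sequence log-probabilities like this:

```python
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Compute DPO-style reward metrics from summed per-sequence log-probs.

    All inputs are 1-D tensors of log p(completion | prompt), one entry per
    evaluation pair. beta=0.1 is an assumption taken from the model name.
    """
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected

    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        # Accuracy: fraction of pairs where the chosen completion gets the higher reward.
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "logps/chosen": policy_chosen_logps.mean().item(),
        "logps/rejected": policy_rejected_logps.mean().item(),
    }
```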
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
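
The card does not state which trainer was used. As a rough reconstruction, a TRL `DPOTrainer` setup matching these hyperparameters could look like the sketch below; the DPO beta of 0.1 is assumed from the model name, and the dataset is a placeholder since the actual training data is not stated:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs; the real training set is not given in the card.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred completion"],
    "rejected": ["Dispreferred completion"],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO",
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                        # assumed from the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```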
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6877 | 0.1 | 50 | 0.6853 | -0.0549 | -0.0714 | 0.5846 | 0.0165 | -29.2866 | -23.9352 | -2.8632 | -2.8635 |
| 0.6422 | 0.2 | 100 | 0.6445 | -0.6597 | -0.8190 | 0.5758 | 0.1593 | -36.7627 | -29.9831 | -2.8277 | -2.8281 |
| 0.5395 | 0.29 | 150 | 0.6276 | -1.2920 | -1.6093 | 0.6000 | 0.3173 | -44.6654 | -36.3059 | -2.8059 | -2.8065 |
| 0.5656 | 0.39 | 200 | 0.6108 | -1.0994 | -1.4528 | 0.6000 | 0.3533 | -43.1002 | -34.3802 | -2.8123 | -2.8129 |
| 0.6317 | 0.49 | 250 | 0.5945 | -0.9164 | -1.3078 | 0.6176 | 0.3914 | -41.6506 | -32.5496 | -2.8115 | -2.8120 |
| 0.5648 | 0.59 | 300 | 0.6008 | -1.5057 | -1.9938 | 0.6198 | 0.4882 | -48.5106 | -38.4425 | -2.8042 | -2.8048 |
| 0.5332 | 0.68 | 350 | 0.6081 | -1.6363 | -2.1476 | 0.6154 | 0.5113 | -50.0487 | -39.7490 | -2.7837 | -2.7843 |
| 0.5852 | 0.78 | 400 | 0.5973 | -1.4834 | -2.0381 | 0.6330 | 0.5547 | -48.9534 | -38.2196 | -2.7903 | -2.7909 |
| 0.6266 | 0.88 | 450 | 0.5981 | -1.4246 | -1.9676 | 0.6374 | 0.5430 | -48.2480 | -37.6317 | -2.7910 | -2.7916 |
| 0.5184 | 0.98 | 500 | 0.5916 | -1.2806 | -1.8253 | 0.6286 | 0.5447 | -46.8255 | -36.1919 | -2.7922 | -2.7928 |
| 0.4736 | 1.07 | 550 | 0.5909 | -1.3628 | -1.9386 | 0.6330 | 0.5758 | -47.9585 | -37.0137 | -2.7867 | -2.7874 |
| 0.4708 | 1.17 | 600 | 0.5950 | -1.4680 | -2.0567 | 0.6330 | 0.5887 | -49.1391 | -38.0658 | -2.7836 | -2.7842 |
| 0.5232 | 1.27 | 650 | 0.5965 | -1.5438 | -2.1546 | 0.6308 | 0.6108 | -50.1188 | -38.8241 | -2.7804 | -2.7811 |
| 0.455 | 1.37 | 700 | 0.5976 | -1.5823 | -2.2032 | 0.6308 | 0.6209 | -50.6042 | -39.2085 | -2.7793 | -2.7799 |
| 0.4032 | 1.46 | 750 | 0.5958 | -1.5721 | -2.1999 | 0.6352 | 0.6278 | -50.5717 | -39.1070 | -2.7788 | -2.7795 |
| 0.4487 | 1.56 | 800 | 0.5957 | -1.5857 | -2.2157 | 0.6308 | 0.6300 | -50.7295 | -39.2429 | -2.7785 | -2.7791 |
| 0.5015 | 1.66 | 850 | 0.5972 | -1.5836 | -2.2125 | 0.6308 | 0.6289 | -50.6972 | -39.2220 | -2.7785 | -2.7791 |
| 0.419 | 1.76 | 900 | 0.5966 | -1.5861 | -2.2157 | 0.6308 | 0.6296 | -50.7298 | -39.2470 | -2.7783 | -2.7790 |
| 0.4581 | 1.86 | 950 | 0.5956 | -1.5849 | -2.2158 | 0.6308 | 0.6309 | -50.7304 | -39.2349 | -2.7784 | -2.7790 |
| 0.381 | 1.95 | 1000 | 0.5956 | -1.5853 | -2.2163 | 0.6308 | 0.6310 | -50.7358 | -39.2390 | -2.7784 | -2.7790 |
### Framework versions
- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2
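
As a usage note (not part of the original card), a minimal inference sketch with the Transformers version listed above; the repository id is the one this card describes, and the chat template is inherited from the Mistral-Instruct base model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/mistralit2_1000_STEPS_1e7_rate_0.1_beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

# Build a prompt with the Mistral-Instruct chat template.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```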