# Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT
This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5644
- Rewards/chosen: -0.8921
- Rewards/rejected: -1.9046
- Rewards/accuracies: 0.6330
- Rewards/margins: 1.0126
- Logps/rejected: -45.6030
- Logps/chosen: -32.5922
- Logits/rejected: -2.0662
- Logits/chosen: -2.0658
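
The card does not yet include usage instructions, so here is a minimal inference sketch with the standard `transformers` API. The repository id below is an assumption (the card title placed under the same `tsavage68` namespace as the SFT base model); adjust it if the model lives elsewhere.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# ASSUMPTION: hypothetical repo id, built from the card title and the
# tsavage68 namespace of the base model; not confirmed by the card.
model_id = "tsavage68/Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-Instruct-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what DPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```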
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
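
The "CDPO" and "01beta" parts of the model name suggest a conservative-DPO run with beta = 0.1 on top of the SFT checkpoint. As a rough illustration, here is a minimal training sketch using TRL's `DPOTrainer` (TRL ~0.8-era API, contemporary with Transformers 4.40.1), with the hyperparameters above plugged in. The beta value, the label-smoothing setting, the sequence-length limits, and the placeholder preference dataset are all assumptions; the card does not document them.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Base policy: the SFT checkpoint this card says the model was fine-tuned from.
base_id = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
ref_model = AutoModelForCausalLM.from_pretrained(base_id)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base_id)

# PLACEHOLDER preference data; the card does not say which dataset was used.
preference_dataset = Dataset.from_dict({
    "prompt":   ["..."],
    "chosen":   ["..."],
    "rejected": ["..."],
})

# Hyperparameters copied from the list above.
args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT",
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = effective batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,              # ASSUMED from "01beta" in the model name
    label_smoothing=0.1,   # ASSUMED: "CDPO" suggests conservative DPO (smoothing > 0)
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
    max_length=512,        # ASSUMED truncation limits
    max_prompt_length=256,
)
trainer.train()
```

In newer TRL releases these loss settings moved from `DPOTrainer` keyword arguments into a `DPOConfig`, so the exact call depends on the TRL version used.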
### Training results
| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6891        | 0.0977 | 50   | 0.6864          | 0.0309         | 0.0171           | 0.6220             | 0.0138          | -26.3858       | -23.3627     | -2.3067         | -2.3063       |
| 0.6478        | 0.1953 | 100  | 0.6468          | 0.0940         | -0.0114          | 0.6418             | 0.1054          | -26.6706       | -22.7312     | -2.2655         | -2.2651       |
| 0.5634        | 0.2930 | 150  | 0.5932          | 0.1716         | -0.1066          | 0.6593             | 0.2783          | -27.6232       | -21.9553     | -2.2109         | -2.2105       |
| 0.4312        | 0.3906 | 200  | 0.5617          | -0.0095        | -0.5043          | 0.6396             | 0.4948          | -31.6002       | -23.7667     | -2.1590         | -2.1586       |
| 0.4711        | 0.4883 | 250  | 0.5499          | -0.0116        | -0.6823          | 0.6462             | 0.6707          | -33.3801       | -23.7878     | -2.1283         | -2.1279       |
| 0.4014        | 0.5859 | 300  | 0.5465          | -0.4289        | -1.1546          | 0.6484             | 0.7257          | -38.1031       | -27.9606     | -2.1330         | -2.1326       |
| 0.4439        | 0.6836 | 350  | 0.5634          | -0.7104        | -1.5436          | 0.6462             | 0.8331          | -41.9925       | -30.7762     | -2.1022         | -2.1018       |
| 0.4768        | 0.7812 | 400  | 0.5594          | -0.6950        | -1.5434          | 0.6571             | 0.8484          | -41.9907       | -30.6215     | -2.1034         | -2.1030       |
| 0.4891        | 0.8789 | 450  | 0.5525          | -0.7222        | -1.5890          | 0.6505             | 0.8668          | -42.4472       | -30.8936     | -2.0946         | -2.0942       |
| 0.4048        | 0.9766 | 500  | 0.5463          | -0.4609        | -1.3226          | 0.6571             | 0.8617          | -39.7828       | -28.2802     | -2.1066         | -2.1062       |
| 0.3051        | 1.0742 | 550  | 0.5533          | -0.7106        | -1.6492          | 0.6440             | 0.9386          | -43.0491       | -30.7781     | -2.0836         | -2.0832       |
| 0.3145        | 1.1719 | 600  | 0.5586          | -0.8155        | -1.7726          | 0.6330             | 0.9571          | -44.2825       | -31.8265     | -2.0777         | -2.0774       |
| 0.4126        | 1.2695 | 650  | 0.5618          | -0.8660        | -1.8549          | 0.6374             | 0.9889          | -45.1055       | -32.3315     | -2.0720         | -2.0716       |
| 0.3106        | 1.3672 | 700  | 0.5631          | -0.8991        | -1.8960          | 0.6308             | 0.9969          | -45.5172       | -32.6628     | -2.0686         | -2.0682       |
| 0.3095        | 1.4648 | 750  | 0.5638          | -0.8960        | -1.9056          | 0.6308             | 1.0096          | -45.6128       | -32.6320     | -2.0670         | -2.0666       |
| 0.3638        | 1.5625 | 800  | 0.5660          | -0.8946        | -1.9044          | 0.6352             | 1.0098          | -45.6007       | -32.6176     | -2.0663         | -2.0659       |
| 0.348         | 1.6602 | 850  | 0.5645          | -0.8960        | -1.9094          | 0.6374             | 1.0134          | -45.6511       | -32.6320     | -2.0665         | -2.0661       |
| 0.3272        | 1.7578 | 900  | 0.5653          | -0.8971        | -1.9081          | 0.6352             | 1.0110          | -45.6377       | -32.6428     | -2.0662         | -2.0658       |
| 0.3261        | 1.8555 | 950  | 0.5644          | -0.8920        | -1.9045          | 0.6374             | 1.0124          | -45.6014       | -32.5920     | -2.0662         | -2.0659       |
| 0.2913        | 1.9531 | 1000 | 0.5644          | -0.8921        | -1.9046          | 0.6330             | 1.0126          | -45.6030       | -32.5922     | -2.0662         | -2.0658       |
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.0
- Tokenizers 0.19.1