# Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT
This model is a fine-tuned version of [tsavage68/mistralit2_1000_STEPS_5e7_SFT](https://huggingface.co/tsavage68/mistralit2_1000_STEPS_5e7_SFT) on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):
- Loss: 0.6952
- Rewards/chosen: -2.6191
- Rewards/rejected: -4.8703
- Rewards/accuracies: 0.6747
- Rewards/margins: 2.2512
- Logps/rejected: -75.2597
- Logps/chosen: -49.8627
- Logits/rejected: -1.6322
- Logits/chosen: -1.6328
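
Below is a minimal sketch of loading the model for inference with `transformers`. The repo id is assumed to match the card title under the same namespace as the base model; this card does not confirm it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title; adjust if the model lives elsewhere.
model_id = "tsavage68/Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Mistral-Instruct-style formatting via the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```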
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto a trainer configuration follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
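
The sketch below maps these hyperparameters onto `trl`'s `DPOTrainer` (assuming trl ~0.8, contemporary with Transformers 4.40.1). Several values are assumptions, flagged in comments: `beta=0.1` is inferred from "01beta" in the model name, cDPO is taken to be trl's DPO loss with label smoothing (the smoothing value is not stated in this card), and the preference dataset is a stand-in since the card lists it as unknown.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Mistral ships without a pad token

# Stand-in preference data; the actual dataset is listed as "unknown" above.
train_dataset = Dataset.from_dict({
    "prompt": ["Example prompt"],
    "chosen": ["Preferred response"],
    "rejected": ["Dispreferred response"],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_5e7rate_CDPOSFT",
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=2,   # total train batch size: 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # DPOTrainer tokenizes the raw columns itself
)

trainer = DPOTrainer(
    model,
    ref_model=None,                  # trl clones the policy as the frozen reference
    args=args,
    beta=0.1,                        # assumed from "01beta" in the model name
    label_smoothing=0.1,             # cDPO = DPO with noisy-label smoothing; value assumed
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=256,
)
trainer.train()
```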
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.6269 | 0.0977 | 50 | 0.6140 | -0.3343 | -0.5430 | 0.6374 | 0.2087 | -31.9869 | -27.0148 | -2.2268 | -2.2264 |
| 0.5268 | 0.1953 | 100 | 0.5976 | -0.2761 | -1.1386 | 0.6286 | 0.8625 | -37.9425 | -26.4326 | -1.9212 | -1.9210 |
| 0.5506 | 0.2930 | 150 | 0.6146 | -1.6096 | -2.4758 | 0.6396 | 0.8663 | -51.3151 | -39.7672 | -1.9395 | -1.9394 |
| 0.5432 | 0.3906 | 200 | 0.5698 | -0.4241 | -1.1918 | 0.6505 | 0.7677 | -38.4747 | -27.9128 | -1.9285 | -1.9282 |
| 0.6505 | 0.4883 | 250 | 0.5601 | -0.1750 | -0.8692 | 0.6374 | 0.6942 | -35.2489 | -25.4218 | -2.0767 | -2.0768 |
| 0.4523 | 0.5859 | 300 | 0.5954 | -1.5615 | -2.5773 | 0.6659 | 1.0158 | -52.3301 | -39.2871 | -2.0915 | -2.0915 |
| 0.3741 | 0.6836 | 350 | 0.6019 | -1.3620 | -2.6572 | 0.6637 | 1.2953 | -53.1292 | -37.2912 | -1.9338 | -1.9339 |
| 0.4935 | 0.7812 | 400 | 0.5268 | -0.4724 | -1.6244 | 0.6725 | 1.1520 | -42.8010 | -28.3961 | -2.0924 | -2.0925 |
| 0.4814 | 0.8789 | 450 | 0.5435 | -0.9406 | -2.1449 | 0.6571 | 1.2043 | -48.0061 | -33.0774 | -1.7794 | -1.7797 |
| 0.4074 | 0.9766 | 500 | 0.5508 | -0.8357 | -2.0709 | 0.6659 | 1.2353 | -47.2661 | -32.0283 | -1.7302 | -1.7306 |
| 0.0931 | 1.0742 | 550 | 0.6341 | -1.8551 | -3.6519 | 0.6791 | 1.7969 | -63.0763 | -42.2222 | -1.4768 | -1.4775 |
| 0.0882 | 1.1719 | 600 | 0.6913 | -2.2849 | -4.2536 | 0.6659 | 1.9687 | -69.0926 | -46.5205 | -1.5867 | -1.5878 |
| 0.2295 | 1.2695 | 650 | 0.6905 | -2.8706 | -4.8698 | 0.6681 | 1.9992 | -75.2545 | -52.3774 | -1.6659 | -1.6665 |
| 0.1165 | 1.3672 | 700 | 0.6912 | -2.2721 | -4.4682 | 0.6703 | 2.1961 | -71.2390 | -46.3925 | -1.6307 | -1.6316 |
| 0.0517 | 1.4648 | 750 | 0.6863 | -2.3558 | -4.5939 | 0.6769 | 2.2380 | -72.4955 | -47.2299 | -1.6312 | -1.6318 |
| 0.1634 | 1.5625 | 800 | 0.6916 | -2.5241 | -4.7785 | 0.6747 | 2.2545 | -74.3421 | -48.9124 | -1.6318 | -1.6324 |
| 0.1488 | 1.6602 | 850 | 0.6950 | -2.5915 | -4.8400 | 0.6747 | 2.2486 | -74.9572 | -49.5864 | -1.6329 | -1.6335 |
| 0.1825 | 1.7578 | 900 | 0.6947 | -2.6155 | -4.8674 | 0.6703 | 2.2520 | -75.2313 | -49.8266 | -1.6327 | -1.6332 |
| 0.1616 | 1.8555 | 950 | 0.6952 | -2.6218 | -4.8678 | 0.6725 | 2.2460 | -75.2349 | -49.8902 | -1.6329 | -1.6335 |
| 0.1029 | 1.9531 | 1000 | 0.6952 | -2.6191 | -4.8703 | 0.6747 | 2.2512 | -75.2597 | -49.8627 | -1.6322 | -1.6328 |
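
For reference, the reward columns follow the usual DPO implicit-reward convention (stated here as background on the standard formulation, not taken from this card): each reward is the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the chosen-minus-rejected difference.

```latex
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

As a sanity check, the final row is consistent with this identity: -2.6191 - (-4.8703) = 2.2512.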
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.0
- Tokenizers 0.19.1
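
To pin a matching environment, one option is installing the versions listed above; note that the `trl` version used for the CDPO run is not recorded in this card.

```bash
pip install transformers==4.40.1 datasets==2.19.0 tokenizers==0.19.1
# The card records torch 2.0.0+cu117; install the wheel matching your CUDA setup.
pip install torch==2.0.0
```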