# chat_1000_STEPS_03beta_1e7rate_CDPOSFT
This model is a fine-tuned version of [tsavage68/chat_600STEPS_1e8rate_SFT](https://huggingface.co/tsavage68/chat_600STEPS_1e8rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):
- Loss: 0.6892
- Rewards/chosen: -0.0034
- Rewards/rejected: -0.0121
- Rewards/accuracies: 0.4725
- Rewards/margins: 0.0086
- Logps/rejected: -18.8422
- Logps/chosen: -16.7661
- Logits/rejected: -0.5986
- Logits/chosen: -0.5984
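Since the card has no usage section, here is a minimal inference sketch. Assumptions not confirmed by this card: the repo id below is inferred from the card title, and the tokenizer ships a chat template inherited from the SFT base model.

```python
# Minimal inference sketch; repo id and chat template are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000_STEPS_03beta_1e7rate_CDPOSFT"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain preference fine-tuning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding is used here for reproducibility; swap in sampling parameters as needed.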
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged TRL reproduction sketch follows the list):
- learning_rate: 1e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
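The reward-style metrics above match TRL's `DPOTrainer` logging, and the `03beta`/`CDPOSFT` fragments of the model name suggest DPO with `beta = 0.3` plus cDPO label smoothing; neither is stated in the card. Below is a reproduction sketch under those assumptions, written against a TRL release (~0.8.x, contemporary with Transformers 4.40) where `beta` and `label_smoothing` are `DPOTrainer` arguments (newer releases move them into `DPOConfig`). The inline dataset and the `label_smoothing` value are placeholders, since the real preference data is unknown.

```python
# Hedged reproduction sketch; beta, label_smoothing, and the dataset are assumptions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/chat_600STEPS_1e8rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs; the actual training data is unknown.
train_dataset = Dataset.from_dict({
    "prompt":   ["What does DPO optimize?"],
    "chosen":   ["DPO directly optimizes a preference objective over paired completions."],
    "rejected": ["No idea."],
})

args = TrainingArguments(
    output_dir="chat_1000_STEPS_03beta_1e7rate_CDPOSFT",
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = effective train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # TRL clones the policy as the frozen reference model
    args=args,
    beta=0.3,                # assumed from "03beta" in the model name
    label_smoothing=0.1,     # cDPO noise level; the actual value is not recorded
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```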
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6925 | 0.0977 | 50 | 0.6943 | 0.0013 | 0.0029 | 0.3846 | -0.0016 | -18.7922 | -16.7503 | -0.5979 | -0.5978 |
| 0.6919 | 0.1953 | 100 | 0.6932 | -0.0001 | -0.0007 | 0.4110 | 0.0005 | -18.8042 | -16.7551 | -0.5986 | -0.5985 |
| 0.6942 | 0.2930 | 150 | 0.6933 | -0.0039 | -0.0042 | 0.4176 | 0.0003 | -18.8160 | -16.7678 | -0.5979 | -0.5977 |
| 0.6964 | 0.3906 | 200 | 0.6932 | -0.0035 | -0.0040 | 0.4352 | 0.0005 | -18.8154 | -16.7662 | -0.5984 | -0.5983 |
| 0.6945 | 0.4883 | 250 | 0.6914 | -0.0028 | -0.0069 | 0.4505 | 0.0041 | -18.8249 | -16.7639 | -0.5977 | -0.5976 |
| 0.6906 | 0.5859 | 300 | 0.6920 | -0.0066 | -0.0096 | 0.4440 | 0.0031 | -18.8341 | -16.7765 | -0.5985 | -0.5984 |
| 0.6871 | 0.6836 | 350 | 0.6906 | -0.0055 | -0.0114 | 0.4440 | 0.0059 | -18.8400 | -16.7730 | -0.5982 | -0.5981 |
| 0.6889 | 0.7812 | 400 | 0.6897 | -0.0066 | -0.0143 | 0.4703 | 0.0076 | -18.8495 | -16.7768 | -0.5990 | -0.5989 |
| 0.689  | 0.8789 | 450 | 0.6905 | -0.0053 | -0.0115 | 0.4396 | 0.0063 | -18.8404 | -16.7722 | -0.5986 | -0.5984 |
| 0.6915 | 0.9766 | 500 | 0.6896 | -0.0031 | -0.0110 | 0.4681 | 0.0079 | -18.8388 | -16.7650 | -0.5990 | -0.5989 |
| 0.6834 | 1.0742 | 550 | 0.6906 | -0.0031 | -0.0091 | 0.4418 | 0.0060 | -18.8323 | -16.7650 | -0.5987 | -0.5986 |
| 0.683  | 1.1719 | 600 | 0.6894 | -0.0041 | -0.0125 | 0.4615 | 0.0084 | -18.8437 | -16.7683 | -0.5991 | -0.5990 |
| 0.6814 | 1.2695 | 650 | 0.6890 | -0.0031 | -0.0123 | 0.4681 | 0.0092 | -18.8430 | -16.7650 | -0.5992 | -0.5991 |
| 0.6811 | 1.3672 | 700 | 0.6895 | -0.0025 | -0.0108 | 0.4703 | 0.0083 | -18.8379 | -16.7630 | -0.5991 | -0.5989 |
| 0.6803 | 1.4648 | 750 | 0.6907 | -0.0024 | -0.0081 | 0.4242 | 0.0057 | -18.8289 | -16.7626 | -0.5983 | -0.5982 |
| 0.6836 | 1.5625 | 800 | 0.6911 | -0.0028 | -0.0078 | 0.4549 | 0.0050 | -18.8281 | -16.7640 | -0.5989 | -0.5987 |
| 0.6774 | 1.6602 | 850 | 0.6904 | -0.0039 | -0.0103 | 0.4484 | 0.0064 | -18.8363 | -16.7677 | -0.5988 | -0.5987 |
| 0.6866 | 1.7578 | 900 | 0.6875 | -0.0009 | -0.0130 | 0.4769 | 0.0121 | -18.8454 | -16.7576 | -0.5987 | -0.5986 |
| 0.6811 | 1.8555 | 950 | 0.6892 | -0.0034 | -0.0121 | 0.4725 | 0.0086 | -18.8422 | -16.7661 | -0.5986 | -0.5984 |
| 0.6812 | 1.9531 | 1000 | 0.6892 | -0.0034 | -0.0121 | 0.4725 | 0.0086 | -18.8422 | -16.7661 | -0.5986 | -0.5984 |
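For readers new to these columns: under TRL's DPO logging convention, each reward is the beta-scaled gap between the policy's and the frozen reference's sequence log-probability, the margin is the chosen reward minus the rejected reward, and accuracy is the fraction of pairs where the chosen reward wins. A small illustration with made-up log-probabilities (these are not values recomputed from this run, and `beta = 0.3` is again assumed from the model name):

```python
import torch

beta = 0.3  # assumed from "03beta" in the model name

# Made-up per-example sequence log-probs under the policy and the frozen reference.
policy_chosen   = torch.tensor([-16.77, -16.70])
ref_chosen      = torch.tensor([-16.76, -16.72])
policy_rejected = torch.tensor([-18.85, -18.80])
ref_rejected    = torch.tensor([-18.81, -18.82])

rewards_chosen   = beta * (policy_chosen - ref_chosen)
rewards_rejected = beta * (policy_rejected - ref_rejected)
margins  = rewards_chosen - rewards_rejected
accuracy = (rewards_chosen > rewards_rejected).float().mean()

print(rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy)
```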
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1