# chat_1000_STEPS_05beta_5e7rate_CDPOSFT

This model is a fine-tuned version of [tsavage68/chat_600STEPS_1e8rate_SFT](https://huggingface.co/tsavage68/chat_600STEPS_1e8rate_SFT) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.6616
- Rewards/chosen: -0.1436
- Rewards/rejected: -0.2746
- Rewards/accuracies: 0.5121
- Rewards/margins: 0.1310
- Logps/rejected: -19.3513
- Logps/chosen: -17.0419
- Logits/rejected: -0.6146
- Logits/chosen: -0.6144
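
The reward and log-probability metrics above are DPO-style evaluation statistics; Rewards/margins is simply Rewards/chosen minus Rewards/rejected. To try the model itself, here is a minimal inference sketch. It assumes the checkpoint is a standard causal LM hosted on the Hub under `tsavage68/chat_1000_STEPS_05beta_5e7rate_CDPOSFT` (the repo id is inferred from the card title) and the generation settings are illustrative, not the author's:

```python
# Minimal inference sketch; repo id and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/chat_1000_STEPS_05beta_5e7rate_CDPOSFT"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # requires `accelerate`
)

prompt = "What is direct preference optimization?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```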
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
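
The hyperparameters above can be wired into a `trl` DPO training script. The sketch below is a hedged reconstruction, not the author's exact script: `beta=0.5` is inferred from "05beta" in the model name, cDPO is typically enabled in `trl` via `label_smoothing > 0` but the exact value is undocumented here (the one below is a placeholder), the dataset path is a placeholder since the dataset is unknown, and the API targets the `trl` 0.8-era `DPOTrainer`. The Adam betas and epsilon in the list match the `TrainingArguments` defaults, so they need no explicit flags.

```python
# Hedged reproduction sketch; beta, label_smoothing, and the dataset path
# are assumptions, not documented values.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/chat_600STEPS_1e8rate_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is unknown (see above).
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("path/to/preference_dataset", split="train")

args = TrainingArguments(
    output_dir="chat_1000_STEPS_05beta_5e7rate_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # required by DPOTrainer
)

trainer = DPOTrainer(
    model,
    ref_model=None,        # trl clones the policy as the frozen reference
    args=args,
    beta=0.5,              # assumed from "05beta" in the model name
    label_smoothing=0.1,   # cDPO-style smoothing; placeholder value
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```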
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6903 | 0.0977 | 50 | 0.6936 | 0.0166 | 0.0155 | 0.4000 | 0.0011 | -18.7710 | -16.7214 | -0.5983 | -0.5982 |
| 0.6671 | 0.1953 | 100 | 0.6792 | -0.0508 | -0.0879 | 0.4835 | 0.0371 | -18.9777 | -16.8562 | -0.6007 | -0.6006 |
| 0.6959 | 0.2930 | 150 | 0.6832 | -0.1265 | -0.1680 | 0.4835 | 0.0414 | -19.1379 | -17.0077 | -0.6015 | -0.6014 |
| 0.6846 | 0.3906 | 200 | 0.6802 | -0.0532 | -0.1115 | 0.4945 | 0.0582 | -19.0249 | -16.8611 | -0.5963 | -0.5961 |
| 0.7093 | 0.4883 | 250 | 0.6785 | -0.0329 | -0.1015 | 0.5055 | 0.0686 | -19.0051 | -16.8204 | -0.5935 | -0.5934 |
| 0.6806 | 0.5859 | 300 | 0.6692 | -0.0525 | -0.1502 | 0.5319 | 0.0977 | -19.1024 | -16.8596 | -0.6013 | -0.6012 |
| 0.6602 | 0.6836 | 350 | 0.6687 | -0.1217 | -0.2201 | 0.5055 | 0.0984 | -19.2423 | -16.9981 | -0.5956 | -0.5955 |
| 0.6623 | 0.7812 | 400 | 0.6638 | -0.0882 | -0.2063 | 0.5187 | 0.1181 | -19.2146 | -16.9310 | -0.6041 | -0.6040 |
| 0.68 | 0.8789 | 450 | 0.6676 | -0.0466 | -0.1563 | 0.5033 | 0.1096 | -19.1145 | -16.8479 | -0.5958 | -0.5956 |
| 0.6566 | 0.9766 | 500 | 0.6673 | -0.0526 | -0.1670 | 0.5209 | 0.1143 | -19.1359 | -16.8599 | -0.6025 | -0.6024 |
| 0.4534 | 1.0742 | 550 | 0.6642 | -0.0606 | -0.1820 | 0.5165 | 0.1214 | -19.1661 | -16.8759 | -0.6045 | -0.6043 |
| 0.4636 | 1.1719 | 600 | 0.6618 | -0.1037 | -0.2295 | 0.5187 | 0.1259 | -19.2611 | -16.9619 | -0.6071 | -0.6070 |
| 0.4729 | 1.2695 | 650 | 0.6600 | -0.1190 | -0.2504 | 0.5231 | 0.1314 | -19.3028 | -16.9927 | -0.6106 | -0.6105 |
| 0.4057 | 1.3672 | 700 | 0.6601 | -0.1176 | -0.2495 | 0.5297 | 0.1320 | -19.3011 | -16.9898 | -0.6115 | -0.6114 |
| 0.3873 | 1.4648 | 750 | 0.6601 | -0.1335 | -0.2670 | 0.5187 | 0.1335 | -19.3359 | -17.0216 | -0.6135 | -0.6133 |
| 0.4769 | 1.5625 | 800 | 0.6603 | -0.1398 | -0.2738 | 0.5165 | 0.1339 | -19.3495 | -17.0343 | -0.6136 | -0.6134 |
| 0.4437 | 1.6602 | 850 | 0.6558 | -0.1370 | -0.2785 | 0.5187 | 0.1415 | -19.3589 | -17.0286 | -0.6142 | -0.6140 |
| 0.4781 | 1.7578 | 900 | 0.6587 | -0.1393 | -0.2752 | 0.5209 | 0.1359 | -19.3524 | -17.0332 | -0.6146 | -0.6145 |
| 0.4408 | 1.8555 | 950 | 0.6611 | -0.1424 | -0.2727 | 0.5121 | 0.1303 | -19.3474 | -17.0395 | -0.6146 | -0.6145 |
| 0.4387 | 1.9531 | 1000 | 0.6616 | -0.1436 | -0.2746 | 0.5121 | 0.1310 | -19.3513 | -17.0419 | -0.6146 | -0.6144 |
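
As a reading aid for the table: assuming the standard DPO formulation (consistent with the CDPO name, though the card does not state it), the reported rewards are the β-scaled log-probability ratio between the policy and the frozen reference model, and the margin is the difference between the two rewards:

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

For example, the final row gives a margin of $-0.1436 - (-0.2746) = 0.1310$, matching the reported Rewards/margins.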
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.0+cu117
- Datasets 2.19.1
- Tokenizers 0.19.1