# cosmosDPO_testV0.4

This model is a fine-tuned version of [ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1](https://huggingface.co/ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1), trained with Direct Preference Optimization (DPO) on an undocumented preference dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the note after this list):
- Loss: 0.5326
- Rewards/chosen: -1.6865
- Rewards/rejected: -3.8720
- Rewards/accuracies: 0.2621
- Rewards/margins: 2.1855
- Logps/rejected: -488.3978
- Logps/chosen: -246.6120
- Logits/rejected: -6.0396
- Logits/chosen: -5.3865
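
For context, these are the standard implicit-reward metrics logged during DPO training; a brief sketch of the definitions follows (the β used for this run is not reported in this card). For a prompt $x$ with chosen response $y_w$ and rejected response $y_l$, DPO defines an implicit reward from the log-probability ratio between the policy $\pi_\theta$ and the frozen reference model $\pi_{\mathrm{ref}}$:

$$r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}$$

and minimizes

$$\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big).$$

Rewards/chosen and Rewards/rejected are the mean $r_\theta$ over chosen and rejected completions, Rewards/margins is the mean difference between the two, and Rewards/accuracies is the fraction of pairs for which the chosen reward exceeds the rejected reward.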
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch of an equivalent trainer configuration follows the list):
- learning_rate: 5e-06
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
- mixed_precision_training: Native AMP
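
The reward metric names in this card match trl's `DPOTrainer` logging, so a trl-based DPO run is the likely setup. Below is a minimal sketch of an equivalent configuration, assuming trl's `DPOTrainer` (a release contemporary with Transformers 4.40, e.g. trl 0.8.x); the dataset path, `beta`, and LoRA settings are placeholders, not values recovered from this run:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Hyperparameters copied from the list above; the effective batch size is
# 16 (per device) * 4 (gradient accumulation steps) = 64.
args = TrainingArguments(
    output_dir="cosmosDPO_testV0.4",
    learning_rate=5e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                      # "Native AMP" mixed precision
    evaluation_strategy="steps",
    eval_steps=15,                  # matches the validation cadence in the results table
)

# Placeholder preference data with "prompt"/"chosen"/"rejected" columns;
# the actual dataset for this run is not documented.
dataset = load_dataset("json", data_files="preferences.json")["train"].train_test_split(test_size=0.1)

trainer = DPOTrainer(
    model,
    ref_model=None,        # with a PEFT adapter, trl uses the adapter-disabled base as reference
    args=args,
    beta=0.1,              # assumed trl default; the actual beta is not reported
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter settings are assumptions
)
trainer.train()
```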
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6924 | 0.0982 | 15 | 0.6904 | -0.0080 | -0.0136 | 0.2189 | 0.0056 | -102.5626 | -78.7603 | -3.1573 | -2.8913 |
| 0.6835 | 0.1964 | 30 | 0.6751 | -0.0672 | -0.1085 | 0.2125 | 0.0413 | -112.0512 | -84.6836 | -3.5075 | -3.2247 |
| 0.6499 | 0.2946 | 45 | 0.6457 | -0.5115 | -0.7205 | 0.2153 | 0.2090 | -173.2494 | -129.1067 | -5.1912 | -4.8499 |
| 0.6078 | 0.3928 | 60 | 0.6074 | -1.1388 | -1.7782 | 0.2309 | 0.6394 | -279.0219 | -191.8415 | -5.6286 | -5.1966 |
| 0.5726 | 0.4910 | 75 | 0.5695 | -1.1349 | -2.0506 | 0.2502 | 0.9156 | -306.2599 | -191.4553 | -5.3966 | -4.9327 |
| 0.5316 | 0.5892 | 90 | 0.5549 | -1.1663 | -2.3619 | 0.2548 | 1.1957 | -337.3947 | -194.5866 | -5.3112 | -4.8133 |
| 0.55 | 0.6874 | 105 | 0.5433 | -1.1068 | -2.4163 | 0.2621 | 1.3095 | -342.8334 | -188.6449 | -5.2923 | -4.7726 |
| 0.5189 | 0.7856 | 120 | 0.5389 | -1.3233 | -2.8925 | 0.2621 | 1.5692 | -390.4512 | -210.2947 | -5.5446 | -4.9930 |
| 0.4979 | 0.8838 | 135 | 0.5425 | -1.7807 | -3.7855 | 0.2603 | 2.0048 | -479.7492 | -256.0318 | -5.7688 | -5.1797 |
| 0.5419 | 0.9820 | 150 | 0.5415 | -1.7964 | -3.9426 | 0.2621 | 2.1462 | -495.4600 | -257.6046 | -5.7611 | -5.1532 |
| 0.5113 | 1.0802 | 165 | 0.5348 | -1.6167 | -3.6969 | 0.2621 | 2.0802 | -470.8911 | -239.6330 | -5.9599 | -5.3352 |
| 0.5003 | 1.1784 | 180 | 0.5428 | -2.1645 | -4.4068 | 0.2603 | 2.2423 | -541.8832 | -294.4119 | -6.0441 | -5.4280 |
| 0.5165 | 1.2766 | 195 | 0.5362 | -1.8903 | -4.1525 | 0.2612 | 2.2622 | -516.4461 | -266.9872 | -6.0827 | -5.4349 |
| 0.5267 | 1.3748 | 210 | 0.5359 | -1.8482 | -4.0699 | 0.2603 | 2.2216 | -508.1883 | -262.7859 | -6.0075 | -5.3648 |
| 0.501 | 1.4730 | 225 | 0.5358 | -1.9003 | -4.1818 | 0.2621 | 2.2815 | -519.3844 | -267.9934 | -6.1419 | -5.4825 |
| 0.515 | 1.5712 | 240 | 0.5340 | -1.8152 | -4.0625 | 0.2621 | 2.2473 | -507.4503 | -259.4838 | -6.1424 | -5.4824 |
| 0.5197 | 1.6694 | 255 | 0.5327 | -1.7026 | -3.9048 | 0.2621 | 2.2022 | -491.6818 | -248.2216 | -6.0817 | -5.4233 |
| 0.519 | 1.7676 | 270 | 0.5324 | -1.6766 | -3.8641 | 0.2621 | 2.1875 | -487.6087 | -245.6198 | -6.0513 | -5.3953 |
| 0.5331 | 1.8658 | 285 | 0.5325 | -1.6847 | -3.8703 | 0.2621 | 2.1856 | -488.2263 | -246.4283 | -6.0390 | -5.3858 |
| 0.5366 | 1.9640 | 300 | 0.5326 | -1.6865 | -3.8720 | 0.2621 | 2.1855 | -488.3978 | -246.6120 | -6.0396 | -5.3865 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.40.1
- Pytorch 2.2.1+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1
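
The framework list above includes PEFT, which suggests the published weights are a parameter-efficient adapter rather than full model weights. A minimal loading sketch under that assumption, using a placeholder repo id (`your-username/cosmosDPO_testV0.4` is not the real path):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Placeholder for this model's actual Hub repo id.
repo = "your-username/cosmosDPO_testV0.4"

# Loads the base model and applies the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16)

# The tokenizer comes from the base model the adapter was trained on.
tokenizer = AutoTokenizer.from_pretrained("ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1")

# Example Turkish prompt: "What is the capital of Türkiye?"
inputs = tokenizer("Türkiye'nin başkenti neresidir?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```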