
cosmosDPO_testV0.4

This model is a PEFT adapter fine-tuned with DPO (Direct Preference Optimization) from ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 on an unspecified preference dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.5326
  • Rewards/chosen: -1.6865
  • Rewards/rejected: -3.8720
  • Rewards/accuracies: 0.2621
  • Rewards/margins: 2.1855
  • Logps/rejected: -488.3978
  • Logps/chosen: -246.6120
  • Logits/rejected: -6.0396
  • Logits/chosen: -5.3865
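
Since this repository ships a PEFT adapter rather than full model weights (see Framework versions below), the adapter can be loaded on top of the base model. A minimal sketch, assuming a hypothetical adapter repo id; the base model is resolved automatically from the adapter config:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical repo id; substitute the actual adapter location.
adapter_id = "your-username/cosmosDPO_testV0.4"

# Loads ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1 and
# applies this adapter on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
tokenizer = AutoTokenizer.from_pretrained(
    "ytu-ce-cosmos/turkish-gpt2-large-750m-instruct-v0.1"
)

prompt = "Türkiye'nin başkenti neresidir?"  # "What is the capital of Turkey?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```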

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
  • mixed_precision_training: Native AMP
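
As a reproduction aid, the list above maps onto transformers TrainingArguments roughly as follows. This is a sketch, not the authors' training script; the output_dir is hypothetical, and the DPO beta coefficient is not reported in this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="cosmosDPO_testV0.4",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=4,  # 16 x 4 = 64 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=2,
    fp16=True,  # Native AMP mixed precision
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-8 matches the
# TrainingArguments defaults, so no optimizer override is needed.
# These arguments would typically be passed to trl's DPOTrainer along with
# the base model, a frozen reference model, and a preference dataset.
```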

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6924 | 0.0982 | 15  | 0.6904 | -0.0080 | -0.0136 | 0.2189 | 0.0056 | -102.5626 | -78.7603  | -3.1573 | -2.8913 |
| 0.6835 | 0.1964 | 30  | 0.6751 | -0.0672 | -0.1085 | 0.2125 | 0.0413 | -112.0512 | -84.6836  | -3.5075 | -3.2247 |
| 0.6499 | 0.2946 | 45  | 0.6457 | -0.5115 | -0.7205 | 0.2153 | 0.2090 | -173.2494 | -129.1067 | -5.1912 | -4.8499 |
| 0.6078 | 0.3928 | 60  | 0.6074 | -1.1388 | -1.7782 | 0.2309 | 0.6394 | -279.0219 | -191.8415 | -5.6286 | -5.1966 |
| 0.5726 | 0.4910 | 75  | 0.5695 | -1.1349 | -2.0506 | 0.2502 | 0.9156 | -306.2599 | -191.4553 | -5.3966 | -4.9327 |
| 0.5316 | 0.5892 | 90  | 0.5549 | -1.1663 | -2.3619 | 0.2548 | 1.1957 | -337.3947 | -194.5866 | -5.3112 | -4.8133 |
| 0.55   | 0.6874 | 105 | 0.5433 | -1.1068 | -2.4163 | 0.2621 | 1.3095 | -342.8334 | -188.6449 | -5.2923 | -4.7726 |
| 0.5189 | 0.7856 | 120 | 0.5389 | -1.3233 | -2.8925 | 0.2621 | 1.5692 | -390.4512 | -210.2947 | -5.5446 | -4.9930 |
| 0.4979 | 0.8838 | 135 | 0.5425 | -1.7807 | -3.7855 | 0.2603 | 2.0048 | -479.7492 | -256.0318 | -5.7688 | -5.1797 |
| 0.5419 | 0.9820 | 150 | 0.5415 | -1.7964 | -3.9426 | 0.2621 | 2.1462 | -495.4600 | -257.6046 | -5.7611 | -5.1532 |
| 0.5113 | 1.0802 | 165 | 0.5348 | -1.6167 | -3.6969 | 0.2621 | 2.0802 | -470.8911 | -239.6330 | -5.9599 | -5.3352 |
| 0.5003 | 1.1784 | 180 | 0.5428 | -2.1645 | -4.4068 | 0.2603 | 2.2423 | -541.8832 | -294.4119 | -6.0441 | -5.4280 |
| 0.5165 | 1.2766 | 195 | 0.5362 | -1.8903 | -4.1525 | 0.2612 | 2.2622 | -516.4461 | -266.9872 | -6.0827 | -5.4349 |
| 0.5267 | 1.3748 | 210 | 0.5359 | -1.8482 | -4.0699 | 0.2603 | 2.2216 | -508.1883 | -262.7859 | -6.0075 | -5.3648 |
| 0.501  | 1.4730 | 225 | 0.5358 | -1.9003 | -4.1818 | 0.2621 | 2.2815 | -519.3844 | -267.9934 | -6.1419 | -5.4825 |
| 0.515  | 1.5712 | 240 | 0.5340 | -1.8152 | -4.0625 | 0.2621 | 2.2473 | -507.4503 | -259.4838 | -6.1424 | -5.4824 |
| 0.5197 | 1.6694 | 255 | 0.5327 | -1.7026 | -3.9048 | 0.2621 | 2.2022 | -491.6818 | -248.2216 | -6.0817 | -5.4233 |
| 0.519  | 1.7676 | 270 | 0.5324 | -1.6766 | -3.8641 | 0.2621 | 2.1875 | -487.6087 | -245.6198 | -6.0513 | -5.3953 |
| 0.5331 | 1.8658 | 285 | 0.5325 | -1.6847 | -3.8703 | 0.2621 | 2.1856 | -488.2263 | -246.4283 | -6.0390 | -5.3858 |
| 0.5366 | 1.9640 | 300 | 0.5326 | -1.6865 | -3.8720 | 0.2621 | 2.1855 | -488.3978 | -246.6120 | -6.0396 | -5.3865 |
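
The reward columns follow TRL's DPOTrainer logging conventions (consistent with the metric names above). Assuming that convention, the implicit DPO reward of a completion is the beta-scaled log-probability ratio between the policy and the frozen reference model, where beta is not reported in this card:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

Under this convention, Rewards/margins is the mean difference between the chosen and rejected rewards, and Rewards/accuracies is the fraction of evaluation pairs whose chosen reward exceeds the rejected reward.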

Framework versions

  • PEFT 0.10.0
  • Transformers 4.40.1
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1