dpo

This model is a fine-tuned version of unsloth/llama-3-8b-Instruct-bnb-4bit, trained with Direct Preference Optimization (DPO) on an unspecified dataset. It achieves the following results on the evaluation set (the reward columns are explained in the sketch after this list):

  • Loss: 0.6257
  • Rewards/chosen: 0.8141
  • Rewards/rejected: 0.4945
  • Rewards/accuracies: 0.6431
  • Rewards/margins: 0.3196
  • Logps/rejected: -229.7856
  • Logps/chosen: -249.2073
  • Logits/rejected: -0.6789
  • Logits/chosen: -0.6135
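
These reward columns follow the standard DPO convention: the implicit reward of a response is β times the gap between the policy's and the frozen reference model's log-probability of that response, rewards/margins is the chosen-minus-rejected reward gap, and rewards/accuracies is the fraction of pairs where the chosen response gets the higher reward. A minimal sketch of that bookkeeping (TRL-style; the β value and variable names are illustrative assumptions, not taken from this run):

```python
import torch
import torch.nn.functional as F

beta = 0.1  # assumed; the actual beta for this run is not reported

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps):
    """All inputs are per-example summed log-probs, shape (batch,)."""
    # Implicit DPO rewards: beta * (log pi_theta - log pi_ref)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected

    margins = chosen_rewards - rejected_rewards                # rewards/margins
    accuracies = (chosen_rewards > rejected_rewards).float()   # rewards/accuracies

    # DPO loss: negative log-sigmoid of the reward margin
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), \
           margins.mean(), accuracies.mean()
```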

Model description

This is a LoRA adapter (trained with PEFT) for unsloth/llama-3-8b-Instruct-bnb-4bit, aligned with Direct Preference Optimization on pairwise preference data. Further details about the adapter have not been provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 0
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 750
  • mixed_precision_training: Native AMP
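
As a rough reconstruction, these settings map onto transformers.TrainingArguments as sketched below (output_dir is a placeholder and the surrounding DPO trainer wiring is assumed; the listed Adam betas and epsilon match the library's AdamW defaults, so they need no explicit flags):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="dpo",                  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=4,     # train_batch_size
    per_device_eval_batch_size=4,      # eval_batch_size
    seed=0,
    gradient_accumulation_steps=8,     # effective batch: 4 * 8 = 32
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=750,                     # training_steps
    fp16=True,                         # "Native AMP" mixed precision
)
```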

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6904        | 0.0372 | 28   | 0.6811          | 0.2766         | 0.2476           | 0.5770             | 0.0290          | -232.2545      | -254.5816    | -0.5471         | -0.5010       |
| 0.6591        | 0.0745 | 56   | 0.6623          | 0.9939         | 0.8694           | 0.5927             | 0.1245          | -226.0365      | -247.4085    | -0.5351         | -0.4798       |
| 0.6297        | 0.1117 | 84   | 0.6542          | 1.1966         | 0.9862           | 0.6136             | 0.2104          | -224.8689      | -245.3818    | -0.4689         | -0.4120       |
| 0.5985        | 0.1489 | 112  | 0.6540          | 1.5211         | 1.2525           | 0.6087             | 0.2687          | -222.2059      | -242.1367    | -0.4989         | -0.4262       |
| 0.6603        | 0.1862 | 140  | 0.6459          | 0.7737         | 0.5130           | 0.6304             | 0.2607          | -229.6009      | -249.6110    | -0.5779         | -0.5054       |
| 0.619         | 0.2234 | 168  | 0.6411          | 0.9352         | 0.6917           | 0.6222             | 0.2435          | -227.8137      | -247.9963    | -0.5842         | -0.5261       |
| 0.6497        | 0.2606 | 196  | 0.6427          | 0.8696         | 0.6404           | 0.6282             | 0.2292          | -228.3268      | -248.6518    | -0.5798         | -0.5255       |
| 0.6014        | 0.2979 | 224  | 0.6397          | 0.8941         | 0.6357           | 0.6263             | 0.2583          | -228.3730      | -248.4069    | -0.6397         | -0.5816       |
| 0.594         | 0.3351 | 252  | 0.6361          | 0.7069         | 0.4027           | 0.6319             | 0.3043          | -230.7038      | -250.2785    | -0.6434         | -0.5848       |
| 0.5898        | 0.3723 | 280  | 0.6356          | 1.0373         | 0.7462           | 0.6278             | 0.2911          | -227.2686      | -246.9745    | -0.6340         | -0.5714       |
| 0.639         | 0.4096 | 308  | 0.6342          | 0.7199         | 0.4321           | 0.6342             | 0.2878          | -230.4095      | -250.1490    | -0.6956         | -0.6293       |
| 0.6289        | 0.4468 | 336  | 0.6363          | 0.4299         | 0.1879           | 0.6248             | 0.2420          | -232.8515      | -253.0488    | -0.6705         | -0.6155       |
| 0.6304        | 0.4840 | 364  | 0.6321          | 0.7719         | 0.5053           | 0.6435             | 0.2667          | -229.6779      | -249.6284    | -0.6279         | -0.5652       |
| 0.6126        | 0.5213 | 392  | 0.6325          | 0.5194         | 0.2033           | 0.6375             | 0.3161          | -232.6973      | -252.1539    | -0.6785         | -0.6117       |
| 0.5974        | 0.5585 | 420  | 0.6254          | 0.7418         | 0.4269           | 0.6428             | 0.3149          | -230.4618      | -249.9303    | -0.6823         | -0.6170       |
| 0.6185        | 0.5957 | 448  | 0.6267          | 0.9534         | 0.6106           | 0.6409             | 0.3428          | -228.6247      | -247.8141    | -0.6532         | -0.5866       |
| 0.604         | 0.6330 | 476  | 0.6284          | 0.8011         | 0.4691           | 0.6394             | 0.3320          | -230.0398      | -249.3374    | -0.6842         | -0.6177       |
| 0.6154        | 0.6702 | 504  | 0.6269          | 0.8353         | 0.5307           | 0.6431             | 0.3046          | -229.4234      | -248.9947    | -0.6705         | -0.6051       |
| 0.5936        | 0.7074 | 532  | 0.6277          | 0.7287         | 0.4206           | 0.6469             | 0.3082          | -230.5248      | -250.0604    | -0.6887         | -0.6226       |
| 0.6291        | 0.7447 | 560  | 0.6260          | 0.8539         | 0.5327           | 0.6439             | 0.3211          | -229.4030      | -248.8091    | -0.6758         | -0.6096       |
| 0.6169        | 0.7819 | 588  | 0.6255          | 0.8797         | 0.5669           | 0.6461             | 0.3127          | -229.0613      | -248.5513    | -0.6690         | -0.6041       |
| 0.5934        | 0.8191 | 616  | 0.6256          | 0.8582         | 0.5399           | 0.6461             | 0.3183          | -229.3312      | -248.7658    | -0.6753         | -0.6095       |
| 0.6004        | 0.8564 | 644  | 0.6257          | 0.8263         | 0.5074           | 0.6450             | 0.3189          | -229.6564      | -249.0845    | -0.6761         | -0.6110       |
| 0.6282        | 0.8936 | 672  | 0.6256          | 0.8133         | 0.4949           | 0.6442             | 0.3184          | -229.7819      | -249.2152    | -0.6748         | -0.6101       |
| 0.5572        | 0.9309 | 700  | 0.6258          | 0.8122         | 0.4938           | 0.6442             | 0.3184          | -229.7925      | -249.2255    | -0.6781         | -0.6129       |
| 0.595         | 0.9681 | 728  | 0.6256          | 0.8140         | 0.4943           | 0.6428             | 0.3197          | -229.7873      | -249.2078    | -0.6788         | -0.6134       |

Framework versions

  • PEFT 0.11.1
  • Transformers 4.41.2
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.2
  • Tokenizers 0.19.1
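
Usage

The weights in this repo are a PEFT (LoRA) adapter, so they are loaded on top of the 4-bit base model. A minimal sketch, assuming the adapter lives at narekvslife/quantized (the repo id shown on this page) and that bitsandbytes and accelerate are installed:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/llama-3-8b-Instruct-bnb-4bit"
adapter_id = "narekvslife/quantized"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```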