dpo
This model is a DPO fine-tuned version of unsloth/llama-3-8b-Instruct-bnb-4bit; the training dataset is not specified in this card. It achieves the following results on the evaluation set (how these metrics relate is sketched after the list):
- Loss: 0.6257
- Rewards/chosen: 0.8141
- Rewards/rejected: 0.4945
- Rewards/accuracies: 0.6431
- Rewards/margins: 0.3196
- Logps/rejected: -229.7856
- Logps/chosen: -249.2073
- Logits/rejected: -0.6789
- Logits/chosen: -0.6135
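The reward metrics above follow the conventions of TRL's DPOTrainer, where the implicit rewards are β-scaled log-probability ratios between the policy and the frozen reference model. The sketch below only illustrates how the reported quantities relate; the β value and the log-probability tensors are assumptions, not values taken from this card:

```python
import torch

def dpo_eval_metrics(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.1) -> dict:
    """Relate the DPO reward metrics reported above (TRL-style definitions)."""
    # Implicit rewards: beta-scaled log-prob ratio of policy vs. reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # Margin: how much more the policy prefers the chosen completion.
    margins = rewards_chosen - rewards_rejected
    # Accuracy: fraction of pairs where the chosen reward beats the rejected one.
    accuracies = (rewards_chosen > rewards_rejected).float().mean()
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": accuracies.item(),
    }
```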
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 0
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 750
- mixed_precision_training: Native AMP
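The training script itself is not included in this card. The sketch below only maps the hyperparameters above onto TRL's DPOTrainer as one plausible setup; the dataset path, the TRL version, and the default β are assumptions:

```python
# Hypothetical reproduction sketch, not the author's actual training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer  # assumes a TRL release that ships DPOConfig

base = "unsloth/llama-3-8b-Instruct-bnb-4bit"
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")  # base is already bnb-4bit
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference dataset with "prompt"/"chosen"/"rejected" columns.
train_dataset = load_dataset("json", data_files="preference_pairs.jsonl")["train"]

args = DPOConfig(
    output_dir="dpo",
    learning_rate=5e-5,
    per_device_train_batch_size=4,    # train_batch_size: 4
    per_device_eval_batch_size=4,     # eval_batch_size: 4
    gradient_accumulation_steps=8,    # 4 * 8 = total_train_batch_size 32
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=750,                    # training_steps: 750
    seed=0,
    fp16=True,                        # "Native AMP" mixed precision
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,              # newer TRL versions take processing_class instead
)
trainer.train()
```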
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.6904 | 0.0372 | 28 | 0.6811 | 0.2766 | 0.2476 | 0.5770 | 0.0290 | -232.2545 | -254.5816 | -0.5471 | -0.5010 |
0.6591 | 0.0745 | 56 | 0.6623 | 0.9939 | 0.8694 | 0.5927 | 0.1245 | -226.0365 | -247.4085 | -0.5351 | -0.4798 |
0.6297 | 0.1117 | 84 | 0.6542 | 1.1966 | 0.9862 | 0.6136 | 0.2104 | -224.8689 | -245.3818 | -0.4689 | -0.4120 |
0.5985 | 0.1489 | 112 | 0.6540 | 1.5211 | 1.2525 | 0.6087 | 0.2687 | -222.2059 | -242.1367 | -0.4989 | -0.4262 |
0.6603 | 0.1862 | 140 | 0.6459 | 0.7737 | 0.5130 | 0.6304 | 0.2607 | -229.6009 | -249.6110 | -0.5779 | -0.5054 |
0.619 | 0.2234 | 168 | 0.6411 | 0.9352 | 0.6917 | 0.6222 | 0.2435 | -227.8137 | -247.9963 | -0.5842 | -0.5261 |
0.6497 | 0.2606 | 196 | 0.6427 | 0.8696 | 0.6404 | 0.6282 | 0.2292 | -228.3268 | -248.6518 | -0.5798 | -0.5255 |
0.6014 | 0.2979 | 224 | 0.6397 | 0.8941 | 0.6357 | 0.6263 | 0.2583 | -228.3730 | -248.4069 | -0.6397 | -0.5816 |
0.594 | 0.3351 | 252 | 0.6361 | 0.7069 | 0.4027 | 0.6319 | 0.3043 | -230.7038 | -250.2785 | -0.6434 | -0.5848 |
0.5898 | 0.3723 | 280 | 0.6356 | 1.0373 | 0.7462 | 0.6278 | 0.2911 | -227.2686 | -246.9745 | -0.6340 | -0.5714 |
0.639 | 0.4096 | 308 | 0.6342 | 0.7199 | 0.4321 | 0.6342 | 0.2878 | -230.4095 | -250.1490 | -0.6956 | -0.6293 |
0.6289 | 0.4468 | 336 | 0.6363 | 0.4299 | 0.1879 | 0.6248 | 0.2420 | -232.8515 | -253.0488 | -0.6705 | -0.6155 |
0.6304 | 0.4840 | 364 | 0.6321 | 0.7719 | 0.5053 | 0.6435 | 0.2667 | -229.6779 | -249.6284 | -0.6279 | -0.5652 |
0.6126 | 0.5213 | 392 | 0.6325 | 0.5194 | 0.2033 | 0.6375 | 0.3161 | -232.6973 | -252.1539 | -0.6785 | -0.6117 |
0.5974 | 0.5585 | 420 | 0.6254 | 0.7418 | 0.4269 | 0.6428 | 0.3149 | -230.4618 | -249.9303 | -0.6823 | -0.6170 |
0.6185 | 0.5957 | 448 | 0.6267 | 0.9534 | 0.6106 | 0.6409 | 0.3428 | -228.6247 | -247.8141 | -0.6532 | -0.5866 |
0.604 | 0.6330 | 476 | 0.6284 | 0.8011 | 0.4691 | 0.6394 | 0.3320 | -230.0398 | -249.3374 | -0.6842 | -0.6177 |
0.6154 | 0.6702 | 504 | 0.6269 | 0.8353 | 0.5307 | 0.6431 | 0.3046 | -229.4234 | -248.9947 | -0.6705 | -0.6051 |
0.5936 | 0.7074 | 532 | 0.6277 | 0.7287 | 0.4206 | 0.6469 | 0.3082 | -230.5248 | -250.0604 | -0.6887 | -0.6226 |
0.6291 | 0.7447 | 560 | 0.6260 | 0.8539 | 0.5327 | 0.6439 | 0.3211 | -229.4030 | -248.8091 | -0.6758 | -0.6096 |
0.6169 | 0.7819 | 588 | 0.6255 | 0.8797 | 0.5669 | 0.6461 | 0.3127 | -229.0613 | -248.5513 | -0.6690 | -0.6041 |
0.5934 | 0.8191 | 616 | 0.6256 | 0.8582 | 0.5399 | 0.6461 | 0.3183 | -229.3312 | -248.7658 | -0.6753 | -0.6095 |
0.6004 | 0.8564 | 644 | 0.6257 | 0.8263 | 0.5074 | 0.6450 | 0.3189 | -229.6564 | -249.0845 | -0.6761 | -0.6110 |
0.6282 | 0.8936 | 672 | 0.6256 | 0.8133 | 0.4949 | 0.6442 | 0.3184 | -229.7819 | -249.2152 | -0.6748 | -0.6101 |
0.5572 | 0.9309 | 700 | 0.6258 | 0.8122 | 0.4938 | 0.6442 | 0.3184 | -229.7925 | -249.2255 | -0.6781 | -0.6129 |
0.595 | 0.9681 | 728 | 0.6256 | 0.8140 | 0.4943 | 0.6428 | 0.3197 | -229.7873 | -249.2078 | -0.6788 | -0.6134 |
Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1
Model tree for narekvslife/quantized
- Base model: unsloth/llama-3-8b-Instruct-bnb-4bit
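A hedged loading sketch, assuming narekvslife/quantized hosts the PEFT adapter produced by this run on top of the 4-bit base model:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Load the base model with the adapter applied (repo id assumed from the model tree above).
model = AutoPeftModelForCausalLM.from_pretrained(
    "narekvslife/quantized",
    device_map="auto",
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct-bnb-4bit")

prompt = "Explain direct preference optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```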