# qwen2.5-0.5b-expo-DPO-noES2-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 0.8210
- Logps: -136.0435
- Logits: -1.9089
- Objective: 0.8326
- Dpo Loss: 0.8326
- Regularize: 0.8326
- Ranking Simple: 0.5631
- Wo Beta: 10.0450
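
No usage example is included in the card; the snippet below is a minimal inference sketch assuming the standard `transformers` causal-LM API. The prompt and generation settings are illustrative placeholders, not values taken from this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-noES2-0.1"

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example prompt; the prompt format used during training is not documented here.
prompt = "Summarize the latest developments in renewable energy."
inputs = tokenizer(prompt, return_tensors="pt")

# Generation settings are illustrative, not tuned for this model.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```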
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
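
The effective train batch size follows from the values above: 4 per device × 3 devices × 12 gradient-accumulation steps = 144. The training script itself is not shown in the card; the sketch below assumes TRL's `DPOTrainer`/`DPOConfig` (TRL is not named in the card, and API details vary by version) and maps only the hyperparameters listed above. The DPO `beta`, any dataset preprocessing, and the evaluation split are not documented and are left out.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base SFT checkpoint named in the card.
base_id = "hZzy/qwen2.5-0.5b-sft-news-IFT"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Pairwise preference data named in the card; columns and splits are assumed
# to already match what DPOTrainer expects (prompt / chosen / rejected).
dataset = load_dataset("hZzy/train_pairwise_weighted")

# Hyperparameters copied from the list above; everything else stays at defaults.
config = DPOConfig(
    output_dir="qwen2.5-0.5b-expo-DPO-noES2-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 x 3 GPUs x 12 = 144 effective batch
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset["train"],
    tokenizer=tokenizer,
)
trainer.train()
```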
### Training results
Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
---|---|---|---|---|---|---|---|---|---|---|
0.644 | 0.1417 | 50 | 0.6810 | -88.9389 | -1.5667 | 0.6836 | 0.6836 | 0.6836 | 0.5316 | 7.8652 |
0.5968 | 0.2834 | 100 | 0.6815 | -101.2053 | -1.7411 | 0.6847 | 0.6847 | 0.6847 | 0.5393 | 7.7021 |
0.5144 | 0.4251 | 150 | 0.6850 | -94.4351 | -1.7346 | 0.6782 | 0.6782 | 0.6782 | 0.5549 | 7.3585 |
0.4808 | 0.5668 | 200 | 0.7037 | -103.6052 | -1.8068 | 0.7036 | 0.7036 | 0.7036 | 0.5569 | 7.7326 |
0.4863 | 0.7085 | 250 | 0.7026 | -91.8159 | -1.9767 | 0.6984 | 0.6984 | 0.6984 | 0.5476 | 7.8312 |
0.4389 | 0.8503 | 300 | 0.6993 | -105.6110 | -2.0810 | 0.6947 | 0.6947 | 0.6947 | 0.5600 | 7.5894 |
0.3851 | 0.9920 | 350 | 0.7227 | -103.2476 | -2.0184 | 0.7155 | 0.7155 | 0.7155 | 0.5492 | 7.9656 |
0.2556 | 1.1337 | 400 | 0.7344 | -109.3563 | -1.9806 | 0.7314 | 0.7314 | 0.7314 | 0.5445 | 8.6228 |
0.264 | 1.2754 | 450 | 0.7229 | -110.4481 | -1.8432 | 0.7204 | 0.7204 | 0.7204 | 0.5580 | 8.5473 |
0.2767 | 1.4171 | 500 | 0.7313 | -111.4522 | -1.9699 | 0.7300 | 0.7300 | 0.7300 | 0.5497 | 8.5441 |
0.2273 | 1.5588 | 550 | 0.7207 | -116.6543 | -1.7731 | 0.7313 | 0.7313 | 0.7313 | 0.5575 | 8.5606 |
0.2232 | 1.7005 | 600 | 0.7356 | -115.8618 | -1.7360 | 0.7399 | 0.7399 | 0.7399 | 0.5719 | 8.7758 |
0.2623 | 1.8422 | 650 | 0.7370 | -117.7434 | -2.0182 | 0.7381 | 0.7381 | 0.7381 | 0.5745 | 8.7274 |
0.2194 | 1.9839 | 700 | 0.7433 | -121.1650 | -1.9499 | 0.7528 | 0.7528 | 0.7528 | 0.5657 | 9.0270 |
0.1094 | 2.1256 | 750 | 0.8255 | -134.3582 | -1.8660 | 0.8363 | 0.8363 | 0.8363 | 0.5611 | 10.1139 |
0.1222 | 2.2674 | 800 | 0.8124 | -133.2139 | -1.9092 | 0.8237 | 0.8237 | 0.8237 | 0.5652 | 9.8993 |
0.1161 | 2.4091 | 850 | 0.8204 | -134.1696 | -1.8946 | 0.8314 | 0.8314 | 0.8314 | 0.5642 | 9.9670 |
0.1268 | 2.5508 | 900 | 0.8157 | -135.2029 | -1.8941 | 0.8271 | 0.8271 | 0.8271 | 0.5642 | 9.9596 |
0.1263 | 2.6925 | 950 | 0.8189 | -135.8437 | -1.9048 | 0.8305 | 0.8305 | 0.8305 | 0.5642 | 10.0013 |
0.1197 | 2.8342 | 1000 | 0.8205 | -135.9884 | -1.9072 | 0.8320 | 0.8320 | 0.8320 | 0.5626 | 10.0373 |
0.1192 | 2.9759 | 1050 | 0.8210 | -136.0435 | -1.9089 | 0.8326 | 0.8326 | 0.8326 | 0.5631 | 10.0450 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1