# qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:
- Loss: 0.1180
- Logps: -80.3490
- Logits: -0.5109
- Objective: 0.1149
- Dpo Loss: 0.7081
- Regularize: 0.6066
- Ranking Simple: 0.5362
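
The `Dpo Loss` reported above presumably tracks (a weighted variant of) the standard DPO objective, with `Regularize` reporting the additional L2EXPO term; the exact combination used in this run is not documented here. For reference, the standard DPO loss for a chosen/rejected pair (y_w, y_l) given a prompt x, with reference policy π_ref and temperature β, is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```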
## Model description
More information needed
## Intended uses & limitations
More information needed
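
A minimal inference sketch is given below, assuming the standard `transformers` causal-LM API and the Hub repo id shown in the title; the prompt format used during SFT/DPO training is not documented, so plain text is used as a placeholder.

```python
# Minimal inference sketch (assumption: standard transformers causal-LM API;
# the training-time prompt template is not documented in this card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Summarize the following news article:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```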
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative mapping to `TrainingArguments` follows the list):
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 12
- total_train_batch_size: 144
- total_eval_batch_size: 12
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
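
The effective train batch size follows from these values: 4 (per device) × 3 GPUs × 12 gradient-accumulation steps = 144. As an illustration only, the listed settings could be expressed with a standard `transformers.TrainingArguments` as sketched below; the actual training script, including the L2EXPO/DPO objective and the `W2` weighting, is not reproduced here.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Assumption: a standard Trainer-style setup; the custom L2EXPO-weighted DPO
# objective used for this run is not part of this sketch.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # x 3 devices x 12 accumulation steps = 144
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```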
### Training results
Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple |
---|---|---|---|---|---|---|---|---|---|
0.0974 | 0.1417 | 50 | 0.1013 | -88.6215 | -1.4569 | 0.1014 | 0.6831 | 0.4279 | 0.5285 |
0.0968 | 0.2834 | 100 | 0.1060 | -85.6897 | -1.5103 | 0.1055 | 0.6828 | 0.4821 | 0.5300 |
0.0947 | 0.4251 | 150 | 0.1114 | -83.2094 | -1.5092 | 0.1107 | 0.6962 | 0.5376 | 0.5326 |
0.0964 | 0.5668 | 200 | 0.1115 | -77.7950 | -1.1358 | 0.1114 | 0.6928 | 0.5439 | 0.5316 |
0.0994 | 0.7085 | 250 | 0.1146 | -78.1513 | -1.0086 | 0.1126 | 0.7003 | 0.5531 | 0.5373 |
0.1059 | 0.8503 | 300 | 0.1161 | -81.8632 | -0.8100 | 0.1133 | 0.7048 | 0.5859 | 0.5450 |
0.0992 | 0.9920 | 350 | 0.1190 | -79.1597 | -0.8348 | 0.1167 | 0.7056 | 0.6018 | 0.5461 |
0.0992 | 1.1337 | 400 | 0.1205 | -80.4108 | -0.6239 | 0.1171 | 0.7107 | 0.6249 | 0.5409 |
0.0817 | 1.2754 | 450 | 0.1204 | -80.6768 | -0.6133 | 0.1169 | 0.7151 | 0.6378 | 0.5378 |
0.097 | 1.4171 | 500 | 0.1191 | -79.8564 | -0.6515 | 0.1164 | 0.7078 | 0.6180 | 0.5383 |
0.0865 | 1.5588 | 550 | 0.1196 | -81.0992 | -0.5267 | 0.1169 | 0.7048 | 0.6207 | 0.5367 |
0.0763 | 1.7005 | 600 | 0.1193 | -81.0751 | -0.4880 | 0.1161 | 0.7045 | 0.6161 | 0.5399 |
0.0772 | 1.8422 | 650 | 0.1186 | -80.2917 | -0.4325 | 0.1154 | 0.7078 | 0.6067 | 0.5367 |
0.0706 | 1.9839 | 700 | 0.1187 | -79.8271 | -0.5150 | 0.1151 | 0.7099 | 0.6083 | 0.5383 |
0.0684 | 2.1256 | 750 | 0.1184 | -80.5600 | -0.4615 | 0.1149 | 0.7091 | 0.6088 | 0.5383 |
0.0655 | 2.2674 | 800 | 0.1189 | -80.4247 | -0.4822 | 0.1158 | 0.7088 | 0.6121 | 0.5383 |
0.0722 | 2.4091 | 850 | 0.1181 | -80.5667 | -0.4638 | 0.1149 | 0.7078 | 0.6067 | 0.5388 |
0.0787 | 2.5508 | 900 | 0.1179 | -80.4408 | -0.5063 | 0.1148 | 0.7076 | 0.6056 | 0.5373 |
0.0657 | 2.6925 | 950 | 0.1180 | -80.4005 | -0.5067 | 0.1149 | 0.7081 | 0.6065 | 0.5373 |
0.0583 | 2.8342 | 1000 | 0.1180 | -80.3490 | -0.5108 | 0.1149 | 0.7081 | 0.6066 | 0.5367 |
0.0679 | 2.9759 | 1050 | 0.1180 | -80.3490 | -0.5109 | 0.1149 | 0.7081 | 0.6066 | 0.5362 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.3.0+cu121
- Datasets 3.2.0
- Tokenizers 0.19.1