# qwen2.5-0.5b-expo-L2EXPO-W1-25-1
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-1 on the hZzy/train_pairwise_all_new3 dataset. It achieves the following results on the evaluation set:
- Loss: 0.9978
- Objective: 1.0385
- Reward Accuracy: 0.6225
- Logp Accuracy: 0.5257
- Log Diff Policy: 1.5308
- Chosen Logps: -92.4974
- Rejected Logps: -94.0283
- Chosen Rewards: -0.1162
- Rejected Rewards: -0.1695
- Logits: -1.3443
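Since this is a causal language model fine-tuned from a Qwen2.5-0.5B SFT checkpoint, it should load through the standard `transformers` auto classes. The snippet below is only an illustrative sketch: the Hub id is assumed to match the model name above, and the prompt is arbitrary.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id, taken from the model name above; adjust if the repo lives elsewhere.
model_id = "hZzy/qwen2.5-0.5b-expo-L2EXPO-W1-25-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Arbitrary example prompt; the card does not document a preferred prompt format.
prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```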
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
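The card does not describe the data beyond naming it in the summary above. As a starting point, the pairwise preference dataset can be inspected with the `datasets` library, assuming it is hosted on the Hub under that id; the split layout and column names are not documented here.

```python
from datasets import load_dataset

# Dataset id taken from the model summary above; inspect the splits and
# columns before relying on any particular schema.
dataset = load_dataset("hZzy/train_pairwise_all_new3")
print(dataset)
```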
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an illustrative `TrainingArguments` sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
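The training script itself is not included in this card. As a rough, hypothetical reconstruction, the hyperparameters above correspond to a `transformers.TrainingArguments` configuration along these lines; the preference-optimization trainer and loss are not shown, and the output directory is an assumption.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above.
# 4 per-device batch x 12 gradient-accumulation steps x 6 GPUs = 288 effective train batch.
training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-L2EXPO-W1-25-1",  # assumed output directory
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```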
### Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1.0153 | 0.1579 | 50 | 1.1586 | 1.1718 | 0.5464 | 0.5207 | 1.1966 | -91.8008 | -92.9973 | -0.0466 | -0.0664 | -1.3046 |
1.0057 | 0.3157 | 100 | 1.0674 | 1.0765 | 0.5856 | 0.5263 | 1.5173 | -94.7445 | -96.2618 | -0.3409 | -0.3928 | -1.4112 |
0.8108 | 0.4736 | 150 | 1.0307 | 1.0602 | 0.6051 | 0.5207 | 1.4865 | -92.5835 | -94.0700 | -0.1248 | -0.1736 | -1.4021 |
0.8145 | 0.6314 | 200 | 1.0184 | 1.0653 | 0.6119 | 0.5252 | 1.5125 | -94.9213 | -96.4338 | -0.3586 | -0.4100 | -1.3903 |
0.7189 | 0.7893 | 250 | 1.0102 | 1.0502 | 0.6096 | 0.5246 | 1.5439 | -94.0147 | -95.5586 | -0.2680 | -0.3225 | -1.3744 |
0.7554 | 0.9471 | 300 | 0.9940 | 1.0419 | 0.6152 | 0.5263 | 1.6331 | -94.6551 | -96.2883 | -0.3320 | -0.3955 | -1.4065 |
0.5469 | 1.1050 | 350 | 0.9994 | 1.0501 | 0.6180 | 0.5274 | 1.5541 | -93.9724 | -95.5266 | -0.2637 | -0.3193 | -1.3515 |
0.512 | 1.2628 | 400 | 0.9966 | 1.0408 | 0.6191 | 0.5274 | 1.5421 | -92.4428 | -93.9849 | -0.1108 | -0.1651 | -1.3312 |
0.5342 | 1.4207 | 450 | 1.0068 | 1.0521 | 0.6096 | 0.5285 | 1.5128 | -93.2476 | -94.7605 | -0.1913 | -0.2427 | -1.3427 |
0.4719 | 1.5785 | 500 | 1.0019 | 1.0448 | 0.6152 | 0.5263 | 1.5245 | -92.7874 | -94.3119 | -0.1452 | -0.1978 | -1.3491 |
0.4754 | 1.7364 | 550 | 1.0023 | 1.0418 | 0.6219 | 0.5268 | 1.5303 | -92.6206 | -94.1509 | -0.1285 | -0.1817 | -1.3355 |
0.492 | 1.8942 | 600 | 0.9977 | 1.0330 | 0.6208 | 0.5246 | 1.5432 | -92.4528 | -93.9960 | -0.1118 | -0.1662 | -1.3587 |
0.3514 | 2.0521 | 650 | 0.9945 | 1.0385 | 0.6214 | 0.5257 | 1.5486 | -92.2665 | -93.8151 | -0.0931 | -0.1482 | -1.3449 |
0.3632 | 2.2099 | 700 | 0.9986 | 1.0383 | 0.6208 | 0.5268 | 1.5230 | -91.8966 | -93.4196 | -0.0562 | -0.1086 | -1.3538 |
0.359 | 2.3678 | 750 | 0.9991 | 1.0389 | 0.6169 | 0.5268 | 1.5244 | -92.6741 | -94.1985 | -0.1339 | -0.1865 | -1.3483 |
0.3518 | 2.5257 | 800 | 0.9971 | 1.0368 | 0.6219 | 0.5268 | 1.5332 | -92.5075 | -94.0407 | -0.1172 | -0.1707 | -1.3449 |
0.3661 | 2.6835 | 850 | 0.9980 | 1.0383 | 0.6225 | 0.5263 | 1.5309 | -92.4734 | -94.0043 | -0.1138 | -0.1671 | -1.3435 |
0.335 | 2.8414 | 900 | 0.9976 | 1.0383 | 0.6236 | 0.5257 | 1.5313 | -92.4988 | -94.0302 | -0.1164 | -0.1697 | -1.3442 |
### Framework versions
- Transformers 4.42.0
- Pytorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1