qwen2.5-0.5b-expo-IPO-25-2
This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-1 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set (a short note on the metric definitions follows the list):
- Loss: 451.5444
- Objective: 444.5488
- Reward Accuracy: 0.6001
- Logp Accuracy: 0.5990
- Log Diff Policy: 65.9131
- Chosen Logps: -536.9709
- Rejected Logps: -602.8841
- Chosen Rewards: -0.4449
- Rejected Rewards: -0.5107
- Logits: -2.5508
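These metrics follow the usual conventions of pairwise preference (IPO-style) training: "Log Diff Policy" is the margin between the chosen and rejected sequence log-probabilities (here −536.97 − (−602.88) ≈ 65.91), and the chosen/rejected "rewards" are the implicit rewards the trainer derives from the policy-to-reference log-ratio. As a hedged reminder of the objective (the regularization strength τ used for this run is not reported in this card), the IPO loss for a prompt x with chosen response y_w and rejected response y_l is

$$
\mathcal{L}_{\mathrm{IPO}} = \left( \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} - \frac{1}{2\tau} \right)^2
$$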
Model description
More information needed
Intended uses & limitations
More information needed
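Although intended uses are not documented, the model can be loaded with the standard `transformers` text-generation API. A minimal sketch follows; the repository id is assumed from the model name and the `hZzy` namespace of the base model, and the prompt is only illustrative.

```python
# Minimal inference sketch (repo id assumed from the card's naming; verify on the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-IPO-25-2"  # assumption: same namespace as the SFT base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain in one sentence what preference optimization does."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```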
Training and evaluation data
More information needed
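The card names `hZzy/train_pairwise_all_new4` as the training and evaluation data but gives no further detail. A short sketch for inspecting it is below; the split name and column layout (e.g. chosen/rejected pairs) are assumptions typical of pairwise preference datasets, not confirmed here.

```python
# Sketch: inspect the pairwise preference dataset named in the card.
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise_all_new4")  # dataset id taken from the card
print(ds)              # lists the available splits and column names
print(ds["train"][0])  # assumption: a "train" split exists
```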
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 12
- total_train_batch_size: 288
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
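The total train batch size follows from per-device batch size × devices × gradient-accumulation steps (4 × 6 × 12 = 288). Below is a hedged sketch of how the listed values map onto `transformers.TrainingArguments`; the actual training script (an IPO trainer built on top of these arguments) is not included in the card, so this only mirrors the reported settings.

```python
# Sketch: the listed hyperparameters expressed as transformers.TrainingArguments.
# Effective train batch size = 4 (per device) * 6 (GPUs) * 12 (grad accumulation) = 288.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-IPO-25-2",  # assumption: output directory name
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```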
Training results
Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
---|---|---|---|---|---|---|---|---|---|---|---|---|
499.3267 | 0.1577 | 50 | 499.4303 | 499.3252 | 0.5442 | 0.5213 | 0.8234 | -92.1292 | -92.9526 | -0.0001 | -0.0007 | -1.3886 |
498.3222 | 0.3154 | 100 | 498.5069 | 498.4890 | 0.5515 | 0.5157 | 1.6597 | -96.7072 | -98.3669 | -0.0046 | -0.0062 | -1.6546 |
495.863 | 0.4731 | 150 | 496.6061 | 496.4090 | 0.5772 | 0.5520 | 3.7397 | -122.5748 | -126.3145 | -0.0305 | -0.0341 | -1.8836 |
488.7172 | 0.6307 | 200 | 490.5273 | 489.1432 | 0.5721 | 0.5559 | 11.0055 | -219.0645 | -230.0700 | -0.1270 | -0.1379 | -2.2255 |
468.7791 | 0.7884 | 250 | 472.5557 | 467.4029 | 0.5694 | 0.5716 | 34.3250 | -460.7004 | -495.0254 | -0.3686 | -0.4028 | -2.3160 |
459.9667 | 0.9461 | 300 | 459.5488 | 452.7194 | 0.5895 | 0.5850 | 54.8692 | -507.5175 | -562.3867 | -0.4155 | -0.4702 | -2.4008 |
459.1184 | 1.1038 | 350 | 456.4864 | 449.5880 | 0.6012 | 0.5962 | 59.7600 | -526.2740 | -586.0341 | -0.4342 | -0.4938 | -2.4643 |
446.0909 | 1.2615 | 400 | 454.5478 | 448.1642 | 0.6023 | 0.5990 | 60.1008 | -473.1859 | -533.2867 | -0.3811 | -0.4411 | -2.4537 |
453.4068 | 1.4192 | 450 | 452.3718 | 444.9123 | 0.6040 | 0.6046 | 66.1148 | -551.7129 | -617.8278 | -0.4597 | -0.5256 | -2.5416 |
445.5799 | 1.5769 | 500 | 451.8890 | 444.8611 | 0.6029 | 0.5968 | 65.3791 | -544.8903 | -610.2694 | -0.4528 | -0.5181 | -2.5407 |
438.7807 | 1.7346 | 550 | 451.5486 | 444.8153 | 0.6001 | 0.6001 | 64.9840 | -515.4463 | -580.4302 | -0.4234 | -0.4882 | -2.5492 |
448.3851 | 1.8922 | 600 | 451.5706 | 444.5457 | 0.6007 | 0.5996 | 65.9475 | -538.3950 | -604.3425 | -0.4463 | -0.5121 | -2.5510 |
Framework versions
- Transformers 4.42.0
- PyTorch 2.6.0+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1