qwen2.5-0.5b-expo-IPO-25-2

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-1 on the hZzy/train_pairwise_all_new4 dataset. It achieves the following results on the evaluation set:

  • Loss: 451.5444
  • Objective: 444.5488
  • Reward Accuracy: 0.6001
  • Logp Accuracy: 0.5990
  • Log Diff Policy: 65.9131
  • Chosen Logps: -536.9709
  • Rejected Logps: -602.8841
  • Chosen Rewards: -0.4449
  • Rejected Rewards: -0.5107
  • Logits: -2.5508
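
The checkpoint should load with the standard transformers causal-LM API; the snippet below is a minimal, untested sketch (the prompt and generation settings are placeholders, not taken from this card):

```python
# Minimal usage sketch; assumes the standard AutoModelForCausalLM API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-IPO-25-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt; swap in your own text.
inputs = tokenizer("Write a short poem about the sea.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```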

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288 (see the arithmetic check after this list)
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
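
The effective train batch size above is the product of the per-device batch size, the number of GPUs, and the gradient-accumulation steps. The sketch below checks that arithmetic and shows one plausible way to build the stated optimizer and schedule with the standard transformers helper; the step count is illustrative (extrapolated from the training log), not taken from the card:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# 4 per-device batch × 6 GPUs × 12 accumulation steps = 288 effective batch.
assert 4 * 6 * 12 == 288

# Stand-in parameter; a real run would pass model.parameters().
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=5e-7, betas=(0.9, 0.999), eps=1e-8)

num_training_steps = 634  # illustrative: ~2 epochs at the logged step rate
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),  # lr_scheduler_warmup_ratio
    num_training_steps=num_training_steps,
)
```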

Training results

| Training Loss | Epoch | Step | Validation Loss | Objective | Reward Accuracy | Logp Accuracy | Log Diff Policy | Chosen Logps | Rejected Logps | Chosen Rewards | Rejected Rewards | Logits |
|---------------|-------|------|-----------------|-----------|-----------------|---------------|-----------------|--------------|----------------|----------------|------------------|--------|
| 499.3267 | 0.1577 | 50  | 499.4303 | 499.3252 | 0.5442 | 0.5213 | 0.8234  | -92.1292  | -92.9526  | -0.0001 | -0.0007 | -1.3886 |
| 498.3222 | 0.3154 | 100 | 498.5069 | 498.4890 | 0.5515 | 0.5157 | 1.6597  | -96.7072  | -98.3669  | -0.0046 | -0.0062 | -1.6546 |
| 495.863  | 0.4731 | 150 | 496.6061 | 496.4090 | 0.5772 | 0.5520 | 3.7397  | -122.5748 | -126.3145 | -0.0305 | -0.0341 | -1.8836 |
| 488.7172 | 0.6307 | 200 | 490.5273 | 489.1432 | 0.5721 | 0.5559 | 11.0055 | -219.0645 | -230.0700 | -0.1270 | -0.1379 | -2.2255 |
| 468.7791 | 0.7884 | 250 | 472.5557 | 467.4029 | 0.5694 | 0.5716 | 34.3250 | -460.7004 | -495.0254 | -0.3686 | -0.4028 | -2.3160 |
| 459.9667 | 0.9461 | 300 | 459.5488 | 452.7194 | 0.5895 | 0.5850 | 54.8692 | -507.5175 | -562.3867 | -0.4155 | -0.4702 | -2.4008 |
| 459.1184 | 1.1038 | 350 | 456.4864 | 449.5880 | 0.6012 | 0.5962 | 59.7600 | -526.2740 | -586.0341 | -0.4342 | -0.4938 | -2.4643 |
| 446.0909 | 1.2615 | 400 | 454.5478 | 448.1642 | 0.6023 | 0.5990 | 60.1008 | -473.1859 | -533.2867 | -0.3811 | -0.4411 | -2.4537 |
| 453.4068 | 1.4192 | 450 | 452.3718 | 444.9123 | 0.6040 | 0.6046 | 66.1148 | -551.7129 | -617.8278 | -0.4597 | -0.5256 | -2.5416 |
| 445.5799 | 1.5769 | 500 | 451.8890 | 444.8611 | 0.6029 | 0.5968 | 65.3791 | -544.8903 | -610.2694 | -0.4528 | -0.5181 | -2.5407 |
| 438.7807 | 1.7346 | 550 | 451.5486 | 444.8153 | 0.6001 | 0.6001 | 64.9840 | -515.4463 | -580.4302 | -0.4234 | -0.4882 | -2.5492 |
| 448.3851 | 1.8922 | 600 | 451.5706 | 444.5457 | 0.6007 | 0.5996 | 65.9475 | -538.3950 | -604.3425 | -0.4463 | -0.5121 | -2.5510 |
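
The "IPO" in the model name suggests the identity preference optimization objective of Azar et al. (2023), under which the columns above have natural readings: Chosen/Rejected Logps are the policy log-probabilities of the preferred and dispreferred responses, Log Diff Policy is their gap, and the Rewards are β-scaled log-ratios against the reference model. A minimal sketch of that standard loss, under those assumptions (β is illustrative; the card does not state it):

```python
import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.01):
    """Standard IPO objective: regress the reference-adjusted log-ratio
    margin toward 1/(2*beta). beta here is illustrative, not from the card."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps        # reward = beta * this
    rejected_logratio = policy_rejected_logps - ref_rejected_logps  # reward = beta * this
    margin = chosen_logratio - rejected_logratio
    return torch.mean((margin - 1.0 / (2.0 * beta)) ** 2)
```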

Framework versions

  • Transformers 4.42.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.2.0
  • Tokenizers 0.19.1
Model tree for hZzy/qwen2.5-0.5b-expo-IPO-25-2

  • Base model: Qwen/Qwen2.5-0.5B
  • Finetuned from: hZzy/qwen2.5-0.5b-sft3-25-1
  • Training dataset: hZzy/train_pairwise_all_new4