qwen2.5-0.5b-expo-L2EXPO-W1-25-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft3-25-1 on the hZzy/train_pairwise_all_new3 dataset. It achieves the following results on the evaluation set:

Loss: 0.9978
Objective: 1.0385
Reward Accuracy: 0.6225
Logp Accuracy: 0.5257
Log Diff Policy: 1.5308
Chosen Logps: -92.4974
Rejected Logps: -94.0283
Chosen Rewards: -0.1162
Rejected Rewards: -0.1695
Logits: -1.3443
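The card does not define these metrics, but two relations can be read off the numbers. First, Log Diff Policy matches Chosen Logps minus Rejected Logps: -92.4974 - (-94.0283) = 1.5309, against the reported 1.5308. Second, between any two rows of the results table, the change in a reward column is one tenth of the change in the matching logp column. For example, from step 50 to step 100, chosen logps drop by 2.9437 and chosen rewards drop by 0.2943. That pattern is consistent with DPO-style implicit rewards, beta * (log pi_policy - log pi_ref) with beta = 0.1, but this is an inference; the card does not state it.

The following is a minimal sketch of how the pairwise metrics could be computed under those assumptions. `PairLogps`, `preference_metrics`, and `BETA` are hypothetical names, and the EXPO-specific Objective column is not reproduced.

```python
from dataclasses import dataclass
from statistics import fmean

# Assumption: inferred from the ratio of reward to logp changes in the table,
# not stated anywhere in the card.
BETA = 0.1


@dataclass
class PairLogps:
    """Summed log-probabilities for one preference pair (hypothetical container)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def preference_metrics(pairs, beta=BETA):
    """Mean pairwise metrics, assuming DPO-style implicit rewards."""
    chosen_rewards = [beta * (p.policy_chosen - p.ref_chosen) for p in pairs]
    rejected_rewards = [beta * (p.policy_rejected - p.ref_rejected) for p in pairs]
    return {
        "chosen_logps": fmean(p.policy_chosen for p in pairs),
        "rejected_logps": fmean(p.policy_rejected for p in pairs),
        # Mean of per-pair differences, which equals the difference of the means.
        "log_diff_policy": fmean(p.policy_chosen - p.policy_rejected for p in pairs),
        "chosen_rewards": fmean(chosen_rewards),
        "rejected_rewards": fmean(rejected_rewards),
        # Fraction of pairs whose chosen reward beats the rejected reward.
        "reward_accuracy": fmean(float(c > r) for c, r in zip(chosen_rewards, rejected_rewards)),
        # Fraction of pairs where the policy assigns the chosen response a higher logp.
        "logp_accuracy": fmean(float(p.policy_chosen > p.policy_rejected) for p in pairs),
    }
```

Under these definitions, reward accuracy and logp accuracy can diverge, as they do above (0.6225 vs 0.5257). A pair counts toward reward accuracy when the policy raised the chosen response relative to the reference more than it raised the rejected one. This holds even if the chosen response still has the lower absolute log-probability.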

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
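The total batch sizes above are per-device size × number of devices × gradient accumulation steps. For training that gives 4 × 6 × 12 = 288. For evaluation it gives 4 × 6 = 24, since evaluation does not accumulate.

The learning-rate schedule is only named here. The sketch below assumes the run used the Hugging Face Trainer's default cosine schedule. Under that schedule, the learning rate warms up linearly over ceil(warmup_ratio × total_steps) steps, then decays along a half-cosine from 5e-07 to zero. The total step count is not logged. The Epoch column implies roughly 317 optimizer steps per epoch, or about 950 over 3 epochs, but treat that figure as an estimate. Function and constant names are illustrative.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_BATCH = 4
NUM_DEVICES = 6
GRAD_ACCUM_STEPS = 12
PEAK_LR = 5e-7
WARMUP_RATIO = 0.1


def effective_batch_size(per_device: int, devices: int, accum_steps: int = 1) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * devices * accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          peak_lr: float = PEAK_LR,
                          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 (assumed Trainer default)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("train:", effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 288
    print("eval:", effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES))  # 24
    total = 950  # rough estimate from the Epoch column, not a logged value
    for step in (0, 48, 95, 300, 600, 900, total):
        print(step, f"{cosine_lr_with_warmup(step, total):.3e}")
```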

Training results

Training Loss	Epoch	Step	Validation Loss	Objective	Reward Accuracy	Logp Accuracy	Log Diff Policy	Chosen Logps	Rejected Logps	Chosen Rewards	Rejected Rewards	Logits
1.0153	0.1579	50	1.1586	1.1718	0.5464	0.5207	1.1966	-91.8008	-92.9973	-0.0466	-0.0664	-1.3046
1.0057	0.3157	100	1.0674	1.0765	0.5856	0.5263	1.5173	-94.7445	-96.2618	-0.3409	-0.3928	-1.4112
0.8108	0.4736	150	1.0307	1.0602	0.6051	0.5207	1.4865	-92.5835	-94.0700	-0.1248	-0.1736	-1.4021
0.8145	0.6314	200	1.0184	1.0653	0.6119	0.5252	1.5125	-94.9213	-96.4338	-0.3586	-0.4100	-1.3903
0.7189	0.7893	250	1.0102	1.0502	0.6096	0.5246	1.5439	-94.0147	-95.5586	-0.2680	-0.3225	-1.3744
0.7554	0.9471	300	0.9940	1.0419	0.6152	0.5263	1.6331	-94.6551	-96.2883	-0.3320	-0.3955	-1.4065
0.5469	1.1050	350	0.9994	1.0501	0.6180	0.5274	1.5541	-93.9724	-95.5266	-0.2637	-0.3193	-1.3515
0.512	1.2628	400	0.9966	1.0408	0.6191	0.5274	1.5421	-92.4428	-93.9849	-0.1108	-0.1651	-1.3312
0.5342	1.4207	450	1.0068	1.0521	0.6096	0.5285	1.5128	-93.2476	-94.7605	-0.1913	-0.2427	-1.3427
0.4719	1.5785	500	1.0019	1.0448	0.6152	0.5263	1.5245	-92.7874	-94.3119	-0.1452	-0.1978	-1.3491
0.4754	1.7364	550	1.0023	1.0418	0.6219	0.5268	1.5303	-92.6206	-94.1509	-0.1285	-0.1817	-1.3355
0.492	1.8942	600	0.9977	1.0330	0.6208	0.5246	1.5432	-92.4528	-93.9960	-0.1118	-0.1662	-1.3587
0.3514	2.0521	650	0.9945	1.0385	0.6214	0.5257	1.5486	-92.2665	-93.8151	-0.0931	-0.1482	-1.3449
0.3632	2.2099	700	0.9986	1.0383	0.6208	0.5268	1.5230	-91.8966	-93.4196	-0.0562	-0.1086	-1.3538
0.359	2.3678	750	0.9991	1.0389	0.6169	0.5268	1.5244	-92.6741	-94.1985	-0.1339	-0.1865	-1.3483
0.3518	2.5257	800	0.9971	1.0368	0.6219	0.5268	1.5332	-92.5075	-94.0407	-0.1172	-0.1707	-1.3449
0.3661	2.6835	850	0.9980	1.0383	0.6225	0.5263	1.5309	-92.4734	-94.0043	-0.1138	-0.1671	-1.3435
0.335	2.8414	900	0.9976	1.0383	0.6236	0.5257	1.5313	-92.4988	-94.0302	-0.1164	-0.1697	-1.3442
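The results table is tab-separated, so it can be loaded directly for checkpoint selection. Read this way, the lowest validation loss (0.9940) occurs at step 300, near the end of the first epoch. Reward accuracy peaks at the last logged step (0.6236 at step 900). The helpers below are a hypothetical sketch. The excerpt holds three rows and a subset of columns copied from the table. Pasting the full table works the same way, because every cell is numeric.

```python
import csv
import io


def load_results(tsv_text):
    """Parse the tab-separated 'Training results' table into a list of dicts of floats."""
    reader = csv.DictReader(io.StringIO(tsv_text.strip()), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def best_row(rows, column, maximize=False):
    """Row with the smallest (or, if maximize=True, largest) value in `column`."""
    pick = max if maximize else min
    return pick(rows, key=lambda row: row[column])


# Excerpt copied from the table above (selected rows and columns only).
EXCERPT = """Training Loss\tEpoch\tStep\tValidation Loss\tReward Accuracy
0.7554\t0.9471\t300\t0.9940\t0.6152
0.3632\t2.2099\t700\t0.9986\t0.6208
0.335\t2.8414\t900\t0.9976\t0.6236"""

if __name__ == "__main__":
    rows = load_results(EXCERPT)
    print("min validation loss at step", best_row(rows, "Validation Loss")["Step"])
    print("max reward accuracy at step", best_row(rows, "Reward Accuracy", maximize=True)["Step"])
```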

Framework versions

Transformers 4.42.0
PyTorch 2.6.0+cu124
Datasets 3.2.0
Tokenizers 0.19.1
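A local environment can be compared with the versions above using the standard library's `importlib.metadata`. This sketch assumes the PyPI distribution names `transformers`, `torch`, `datasets`, and `tokenizers` correspond to the listed frameworks. The helper name is hypothetical. The PyTorch entry carries the `+cu124` local build tag, so a CPU-only or differently built 2.6.0 wheel will also be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; distribution names are assumed.
EXPECTED = {
    "transformers": "4.42.0",
    "torch": "2.6.0+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for packages that differ."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {installed or 'not installed'}")
```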
