qwen2.5-0.5b-expo-DPO-EXPERIMENT-10-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

Loss: 15.2566
Logps: -80.3981
Logits: -1.0046
Objective: 15.1445
Dpo Loss: 15.1445
Regularize: 15.1445
Ranking Simple: 0.5134
Ranking Idealized: 0.5093
Ranking Idealized Expo: 0.5093

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-06
train_batch_size: 4
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 6
gradient_accumulation_steps: 12
total_train_batch_size: 288
total_eval_batch_size: 24
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 2
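
The derived batch sizes in the list follow from the per-device values, and the scheduler settings imply a warmup followed by cosine decay. The sketch below recomputes the totals and illustrates the learning-rate curve. The function `lr_at_step` and the `total_steps` value of 100 are hypothetical and for illustration only: the card does not state the total number of optimizer steps, and the linear-warmup-then-cosine-to-zero shape is an assumption about how `cosine` with `warmup_ratio` is commonly implemented.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4
eval_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 12
learning_rate = 5e-06
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices


def lr_at_step(step, total_steps, peak_lr=learning_rate, warmup=warmup_ratio):
    """Hypothetical helper: linear warmup, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)  # 288, matching total_train_batch_size above
print(total_eval_batch_size)   # 24, matching total_eval_batch_size above

# Illustrative only: total_steps=100 is not taken from the card.
for step in (0, 5, 10, 55, 100):
    print(step, lr_at_step(step, total_steps=100))
```

With these assumptions the rate rises linearly to 5e-06 over the first 10% of steps and then decays along a half cosine.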

Training results

Training Loss	Epoch	Step	Validation Loss	Logps	Logits	Objective	Dpo Loss	Regularize	Ranking Simple	Ranking Idealized	Ranking Idealized Expo
9.5723	0.2834	50	9.2586	-89.6862	-1.4979	9.6501	9.6501	9.6501	0.5134	0.5093	0.5093
9.8364	0.5668	100	15.5453	-79.4201	-1.3475	15.5409	15.5409	15.5409	0.5176	0.5093	0.5093
8.8451	0.8503	150	16.6626	-82.1459	-1.1122	16.5626	16.5626	16.5626	0.5145	0.5093	0.5093
3.8083	1.1337	200	16.0519	-81.6751	-1.0874	16.3240	16.3240	16.3240	0.5186	0.5093	0.5093
3.6019	1.4171	250	15.8144	-81.5609	-0.9933	15.7679	15.7679	15.7679	0.5176	0.5093	0.5093
2.1682	1.7005	300	15.3824	-80.3329	-1.0036	15.2004	15.2004	15.2004	0.5114	0.5093	0.5093
2.703	1.9839	350	15.2566	-80.3981	-1.0046	15.1445	15.1445	15.1445	0.5134	0.5093	0.5093
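
As a reading aid for the table above, the following minimal sketch picks out the evaluation step with the lowest validation loss and compares it with the final row. The `eval_log` list is a hypothetical name; its values are copied from the Step and Validation Loss columns. The card does not say which checkpoint was published, so this only summarizes the logged numbers.

```python
# (step, validation_loss) pairs copied from the table above.
eval_log = [
    (50, 9.2586),
    (100, 15.5453),
    (150, 16.6626),
    (200, 16.0519),
    (250, 15.8144),
    (300, 15.3824),
    (350, 15.2566),
]

# Row with the smallest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])
# Last logged row, which matches the headline evaluation results.
final_step, final_loss = eval_log[-1]

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"final validation loss:  {final_loss} at step {final_step}")
```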

Framework versions

Transformers 4.42.0
Pytorch 2.3.0+cu121
Datasets 2.19.1
Tokenizers 0.19.1
