
qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set (the DPO loss term is written out after the list):

  • Loss: 0.1180
  • Logps: -80.3490
  • Logits: -0.5109
  • Objective: 0.1149
  • Dpo Loss: 0.7081
  • Regularize: 0.6066
  • Ranking Simple: 0.5362
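
For context, the "Dpo Loss" column corresponds to the standard DPO objective shown below. This is the usual formulation and a reasonable assumption for this run, but the β value and the extra L2EXPO regularization term (reported as "Regularize" and folded into "Objective") are not documented in this card.

```latex
% Standard DPO loss (assumed formulation; beta is not reported in this card)
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```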

Model description

More information needed

Intended uses & limitations

More information needed
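
No usage guidance is given in the card. Below is a minimal inference sketch with the Transformers API; the prompt text and generation settings are illustrative, not taken from this card.

```python
# Minimal inference sketch (assumed usage; prompt and generation settings are illustrative)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "Summarize the following news article:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```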

Training and evaluation data

More information needed
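
The card names hZzy/train_pairwise_weighted as the training data. A minimal sketch for inspecting it with the datasets library follows; the split name and field names are assumptions, since they are not documented here.

```python
# Sketch for inspecting the pairwise preference data (split name "train" is an assumption)
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise_weighted", split="train")
print(ds)       # column names and number of examples
print(ds[0])    # one pairwise example; field names are not documented in this card
```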

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
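
The effective batch size follows from the values above: 4 examples per device × 12 accumulation steps × 3 GPUs = 144. Below is a minimal configuration sketch using transformers.TrainingArguments that mirrors these values; the actual trainer that computes the L2EXPO objective is not part of this card, and the output directory name is illustrative.

```python
# Configuration sketch only: mirrors the reported hyperparameters, not the exact training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-W2-noES-0.1",  # illustrative name
    learning_rate=5e-6,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,   # with 3 GPUs: 4 * 12 * 3 = 144 effective train batch size
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```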

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple |
|---------------|--------|------|-----------------|----------|---------|-----------|----------|------------|----------------|
| 0.0974        | 0.1417 | 50   | 0.1013          | -88.6215 | -1.4569 | 0.1014    | 0.6831   | 0.4279     | 0.5285         |
| 0.0968        | 0.2834 | 100  | 0.1060          | -85.6897 | -1.5103 | 0.1055    | 0.6828   | 0.4821     | 0.5300         |
| 0.0947        | 0.4251 | 150  | 0.1114          | -83.2094 | -1.5092 | 0.1107    | 0.6962   | 0.5376     | 0.5326         |
| 0.0964        | 0.5668 | 200  | 0.1115          | -77.7950 | -1.1358 | 0.1114    | 0.6928   | 0.5439     | 0.5316         |
| 0.0994        | 0.7085 | 250  | 0.1146          | -78.1513 | -1.0086 | 0.1126    | 0.7003   | 0.5531     | 0.5373         |
| 0.1059        | 0.8503 | 300  | 0.1161          | -81.8632 | -0.8100 | 0.1133    | 0.7048   | 0.5859     | 0.5450         |
| 0.0992        | 0.9920 | 350  | 0.1190          | -79.1597 | -0.8348 | 0.1167    | 0.7056   | 0.6018     | 0.5461         |
| 0.0992        | 1.1337 | 400  | 0.1205          | -80.4108 | -0.6239 | 0.1171    | 0.7107   | 0.6249     | 0.5409         |
| 0.0817        | 1.2754 | 450  | 0.1204          | -80.6768 | -0.6133 | 0.1169    | 0.7151   | 0.6378     | 0.5378         |
| 0.0970        | 1.4171 | 500  | 0.1191          | -79.8564 | -0.6515 | 0.1164    | 0.7078   | 0.6180     | 0.5383         |
| 0.0865        | 1.5588 | 550  | 0.1196          | -81.0992 | -0.5267 | 0.1169    | 0.7048   | 0.6207     | 0.5367         |
| 0.0763        | 1.7005 | 600  | 0.1193          | -81.0751 | -0.4880 | 0.1161    | 0.7045   | 0.6161     | 0.5399         |
| 0.0772        | 1.8422 | 650  | 0.1186          | -80.2917 | -0.4325 | 0.1154    | 0.7078   | 0.6067     | 0.5367         |
| 0.0706        | 1.9839 | 700  | 0.1187          | -79.8271 | -0.5150 | 0.1151    | 0.7099   | 0.6083     | 0.5383         |
| 0.0684        | 2.1256 | 750  | 0.1184          | -80.5600 | -0.4615 | 0.1149    | 0.7091   | 0.6088     | 0.5383         |
| 0.0655        | 2.2674 | 800  | 0.1189          | -80.4247 | -0.4822 | 0.1158    | 0.7088   | 0.6121     | 0.5383         |
| 0.0722        | 2.4091 | 850  | 0.1181          | -80.5667 | -0.4638 | 0.1149    | 0.7078   | 0.6067     | 0.5388         |
| 0.0787        | 2.5508 | 900  | 0.1179          | -80.4408 | -0.5063 | 0.1148    | 0.7076   | 0.6056     | 0.5373         |
| 0.0657        | 2.6925 | 950  | 0.1180          | -80.4005 | -0.5067 | 0.1149    | 0.7081   | 0.6065     | 0.5373         |
| 0.0583        | 2.8342 | 1000 | 0.1180          | -80.3490 | -0.5108 | 0.1149    | 0.7081   | 0.6066     | 0.5367         |
| 0.0679        | 2.9759 | 1050 | 0.1180          | -80.3490 | -0.5109 | 0.1149    | 0.7081   | 0.6066     | 0.5362         |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1