qwen2.5-0.5b-expo-DPO-noES4-1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0126
  • Logps: -79.3398
  • Logits: -0.6948
  • Objective: 1.9384
  • Dpo Loss: 1.9384 (see the note after this list)
  • Regularize: 1.9384
  • Ranking Simple: 0.5430
  • Wo Beta: 6.8782
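The "Dpo Loss" and "Objective" metrics track the preference-optimization objective. The card does not spell out the exact formulation used here (the additional "Regularize" and "Wo Beta" metrics suggest a modified variant), but for reference the standard DPO loss on preference pairs $(x, y_w, y_l)$ is:

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \right) \right]
$$

where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ controls the strength of the implicit KL regularization.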

Model description

More information needed

Intended uses & limitations

More information needed
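No usage guidance is provided yet. As a minimal sketch, the checkpoint should load like any Qwen2.5-based causal language model with Transformers; the prompt format below is illustrative, not documented, and may need to match the news-summarization SFT/preference data.

```python
# Minimal inference sketch; the prompt format is an assumption, not documented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-DPO-noES4-1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "Summarize the following news article:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```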

Training and evaluation data

More information needed
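The card does not describe the data beyond naming hZzy/train_pairwise_weighted as the training set. A quick way to inspect it with the Datasets library is sketched below; the split name and column layout are assumptions.

```python
# Inspect the pairwise preference dataset named above; split/columns are assumptions.
from datasets import load_dataset

ds = load_dataset("hZzy/train_pairwise_weighted", split="train")
print(ds)        # features and row count
print(ds[0])     # one pairwise preference example
```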

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
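The training script itself is not published. As a minimal sketch, the reported values map onto standard Transformers `TrainingArguments` fields as shown below; the `output_dir` name is illustrative, and the actual run used a DPO-style trainer on top of a setup like this.

```python
# Hedged reconstruction of the reported hyperparameters; not the original script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-noES4-1",  # illustrative output directory
    learning_rate=5e-6,
    per_device_train_batch_size=4,       # train_batch_size
    per_device_eval_batch_size=4,        # eval_batch_size
    gradient_accumulation_steps=12,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    # Effective batch size: 4 per device x 3 GPUs x 12 accumulation steps = 144.
)
```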

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Regularize | Ranking Simple | Wo Beta |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:----------:|:--------------:|:-------:|
| 0.9504        | 0.1417 | 50   | 0.9836          | -92.0961 | -1.4075 | 0.9764    | 0.9764   | 0.9764     | 0.5259         | 7.7454  |
| 1.2453        | 0.2834 | 100  | 1.2755          | -89.9832 | -1.3843 | 1.2122    | 1.2122   | 1.2122     | 0.5336         | 7.3870  |
| 1.4306        | 0.4251 | 150  | 1.7989          | -76.2439 | -1.2411 | 1.7433    | 1.7433   | 1.7433     | 0.5357         | 7.2217  |
| 1.3076        | 0.5668 | 200  | 1.9202          | -74.0915 | -1.1859 | 1.8276    | 1.8276   | 1.8276     | 0.5388         | 7.0633  |
| 1.3919        | 0.7085 | 250  | 2.0724          | -78.0833 | -1.1719 | 2.0197    | 2.0197   | 2.0197     | 0.5331         | 7.1561  |
| 1.1295        | 0.8503 | 300  | 2.0572          | -82.8187 | -0.9338 | 1.9944    | 1.9944   | 1.9944     | 0.5404         | 6.9967  |
| 0.9469        | 0.9920 | 350  | 2.1175          | -81.5331 | -0.9147 | 1.9927    | 1.9927   | 1.9927     | 0.5383         | 6.8160  |
| 0.6125        | 1.1337 | 400  | 2.2348          | -81.9750 | -0.7963 | 2.1620    | 2.1620   | 2.1620     | 0.5342         | 6.9279  |
| 0.5932        | 1.2754 | 450  | 2.0919          | -81.2039 | -0.8203 | 2.0152    | 2.0152   | 2.0152     | 0.5378         | 6.8450  |
| 0.6283        | 1.4171 | 500  | 2.1655          | -82.7384 | -0.6531 | 2.0804    | 2.0804   | 2.0804     | 0.5393         | 6.7975  |
| 0.5148        | 1.5588 | 550  | 2.0636          | -83.5332 | -0.7097 | 1.9877    | 1.9877   | 1.9877     | 0.5352         | 6.7990  |
| 0.3647        | 1.7005 | 600  | 2.0351          | -80.1348 | -0.6773 | 1.9649    | 1.9649   | 1.9649     | 0.5373         | 6.8458  |
| 0.5325        | 1.8422 | 650  | 2.0596          | -80.1766 | -0.6632 | 1.9878    | 1.9878   | 1.9878     | 0.5419         | 6.8518  |
| 0.3442        | 1.9839 | 700  | 2.0875          | -80.5005 | -0.6144 | 1.9977    | 1.9977   | 1.9977     | 0.5435         | 6.7884  |
| 0.119         | 2.1256 | 750  | 2.0463          | -80.5714 | -0.6875 | 1.9523    | 1.9523   | 1.9523     | 0.5445         | 6.8488  |
| 0.1373        | 2.2674 | 800  | 2.0092          | -80.3873 | -0.6842 | 1.9315    | 1.9315   | 1.9315     | 0.5461         | 6.8647  |
| 0.131         | 2.4091 | 850  | 2.0060          | -80.0760 | -0.6729 | 1.9353    | 1.9353   | 1.9353     | 0.5450         | 6.8928  |
| 0.1325        | 2.5508 | 900  | 2.0058          | -79.8102 | -0.6821 | 1.9324    | 1.9324   | 1.9324     | 0.5450         | 6.8844  |
| 0.138         | 2.6925 | 950  | 2.0122          | -79.3969 | -0.6917 | 1.9363    | 1.9363   | 1.9363     | 0.5435         | 6.8712  |
| 0.1283        | 2.8342 | 1000 | 2.0122          | -79.3139 | -0.6959 | 1.9376    | 1.9376   | 1.9376     | 0.5430         | 6.8769  |
| 0.0938        | 2.9759 | 1050 | 2.0126          | -79.3398 | -0.6948 | 1.9384    | 1.9384   | 1.9384     | 0.5430         | 6.8782  |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1