license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-ES-100
    results: []


qwen2.5-0.5b-expo-L2EXPO-ES-100

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 486.1626
  • Logps: -82.8268
  • Logits: -0.5435
  • Objective: 489.7928
  • Dpo Loss: 245.8756
  • Regularize: 489.7928
  • Ranking Simple: 0.5254
  • Ranking Idealized: 0.5212
  • Ranking Idealized Expo: 0.5212
  • Wo Beta: 14.0464

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5

Training results

| Training Loss | Epoch | Step | Dpo Loss | Logits | Logps | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize | Wo Beta |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 43.2587 | 0.1417 | 50 | 26.4475 | -1.4448 | -90.5292 | 52.6622 | 53.6977 | 0.5212 | 0.5212 | 0.5264 | 53.6977 | 16.1700 |
| 169.8852 | 0.2834 | 100 | 85.7639 | -1.3621 | -85.2787 | 173.9861 | 172.1891 | 0.5212 | 0.5212 | 0.5243 | 172.1891 | 15.4391 |
| 285.0432 | 0.4251 | 150 | 143.0300 | -1.1694 | -83.2181 | 291.4834 | 293.4404 | 0.5212 | 0.5212 | 0.5280 | 293.4404 | 15.2225 |
| 355.4066 | 0.5668 | 200 | 189.8469 | -0.9274 | -84.0320 | 372.7906 | 365.2124 | 0.5212 | 0.5212 | 0.5233 | 365.2124 | 14.8684 |
| 368.9811 | 0.7085 | 250 | 216.4584 | -0.7746 | -81.5050 | 446.6966 | 442.3321 | 0.5212 | 0.5212 | 0.5259 | 442.3321 | 14.4790 |
| 360.5868 | 0.8503 | 300 | 222.8840 | -0.5984 | -82.2011 | 448.9506 | 443.9051 | 0.5212 | 0.5212 | 0.5248 | 443.9051 | 14.3930 |
| 338.3987 | 0.9920 | 350 | 232.9365 | -0.7855 | -84.1638 | 462.1923 | 461.2073 | 0.5212 | 0.5212 | 0.5269 | 461.2073 | 14.2979 |
| 309.1712 | 1.1337 | 400 | 248.0718 | -0.6414 | -82.4934 | 480.5965 | 478.7404 | 0.5212 | 0.5212 | 0.5254 | 478.7404 | 14.3872 |
| 298.1424 | 1.2754 | 450 | 247.8722 | -0.7014 | -82.1465 | 480.3256 | 482.1766 | 0.5212 | 0.5212 | 0.5238 | 482.1766 | 14.3695 |
| 282.4504 | 1.4171 | 500 | 252.2093 | -0.4578 | -83.4101 | 493.7484 | 495.7639 | 0.5212 | 0.5212 | 0.5248 | 495.7639 | 14.1743 |
| 261.1027 | 1.5588 | 550 | 245.8756 | -0.5435 | -82.8268 | 486.1626 | 489.7928 | 0.5212 | 0.5212 | 0.5254 | 489.7928 | 14.0464 |
| 255.9288 | 1.7005 | 600 | 251.2934 | -0.5347 | -82.1768 | 500.3801 | 502.1727 | 0.5212 | 0.5212 | 0.5269 | 502.1727 | 14.2436 |
| 248.6787 | 1.8422 | 650 | 254.5959 | -0.5140 | -81.4923 | 502.3153 | 504.1582 | 0.5212 | 0.5212 | 0.5248 | 504.1582 | 14.3320 |
| 226.4676 | 1.9839 | 700 | 264.1660 | -0.4816 | -83.4216 | 512.6990 | 516.7103 | 0.5212 | 0.5212 | 0.5254 | 516.7103 | 14.0834 |
| 207.1551 | 2.1256 | 750 | 259.2528 | -0.5410 | -83.4589 | 506.4237 | 510.6129 | 0.5212 | 0.5212 | 0.5238 | 510.6129 | 14.1295 |
| 197.3545 | 2.2674 | 800 | 262.3102 | -0.5659 | -84.8747 | 513.3979 | 514.3120 | 0.5212 | 0.5212 | 0.5228 | 514.3120 | 14.0704 |
| 182.3796 | 2.4138 | 850 | 254.1251 | -0.5510 | -82.8624 | 501.8831 | 504.8523 | 0.5212 | 0.5212 | 0.5274 | 504.8523 | 14.1707 |
| 176.042 | 2.5555 | 900 | 263.2800 | -0.5039 | -85.0710 | 518.1983 | 519.5008 | 0.5212 | 0.5212 | 0.5238 | 519.5008 | 14.1123 |
| 164.8281 | 2.6972 | 950 | 262.8074 | -0.5200 | -84.5843 | 512.1844 | 512.7651 | 0.5212 | 0.5212 | 0.5238 | 512.7651 | 14.1643 |
| 150.0401 | 2.8389 | 1000 | 263.6169 | -0.5219 | -83.7343 | 514.7036 | 516.5959 | 0.5212 | 0.5212 | 0.5259 | 516.5959 | 14.1800 |
| 141.0317 | 2.9806 | 1050 | 266.9453 | -0.4953 | -84.2676 | 519.2467 | 521.8153 | 0.5212 | 0.5212 | 0.5264 | 521.8153 | 14.2577 |
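The evaluation results listed near the top of this card are identical to the step 550 row (epoch 1.5588). Scanning the table gives three observations:

- Step 550 has the lowest Wo Beta of any logged evaluation (14.0464).
- The lowest validation loss occurs at step 50.
- The log ends at step 1050 (epoch 2.9806), ten evaluations after step 550, even though num_epochs was set to 5.

The sketch below reproduces that scan from a hand-copied subset of the columns. The card does not say which metric selected the reported checkpoint or why training stopped. Treat the Wo Beta reading as an observation about the logged numbers, not documented configuration.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    validation_loss: float
    wo_beta: float


# (Step, Validation Loss, Wo Beta) copied from the training results table above.
ROWS: List[EvalRow] = [
    EvalRow(50, 52.6622, 16.1700),
    EvalRow(100, 173.9861, 15.4391),
    EvalRow(150, 291.4834, 15.2225),
    EvalRow(200, 372.7906, 14.8684),
    EvalRow(250, 446.6966, 14.4790),
    EvalRow(300, 448.9506, 14.3930),
    EvalRow(350, 462.1923, 14.2979),
    EvalRow(400, 480.5965, 14.3872),
    EvalRow(450, 480.3256, 14.3695),
    EvalRow(500, 493.7484, 14.1743),
    EvalRow(550, 486.1626, 14.0464),
    EvalRow(600, 500.3801, 14.2436),
    EvalRow(650, 502.3153, 14.3320),
    EvalRow(700, 512.6990, 14.0834),
    EvalRow(750, 506.4237, 14.1295),
    EvalRow(800, 513.3979, 14.0704),
    EvalRow(850, 501.8831, 14.1707),
    EvalRow(900, 518.1983, 14.1123),
    EvalRow(950, 512.1844, 14.1643),
    EvalRow(1000, 514.7036, 14.1800),
    EvalRow(1050, 519.2467, 14.2577),
]


def best_step(rows: List[EvalRow], metric: str) -> int:
    """Step with the smallest value of `metric` (a field name of EvalRow)."""
    return min(rows, key=lambda row: getattr(row, metric)).step


def evals_after(rows: List[EvalRow], step: int) -> int:
    """Number of logged evaluations that come after `step`."""
    return sum(1 for row in rows if row.step > step)


if __name__ == "__main__":
    print(best_step(ROWS, "wo_beta"))          # 550
    print(best_step(ROWS, "validation_loss"))  # 50
    print(evals_after(ROWS, 550))              # 10
```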

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
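A small helper can check whether a local environment matches the versions listed above. It compares them against installed package metadata using the standard-library `importlib.metadata` module. This is a hypothetical convenience sketch, not part of the original training setup. It makes two assumptions:

- The card's "Pytorch" entry corresponds to the `torch` pip distribution.
- Local build labels such as `+cu121` can be ignored when comparing versions.

```python
from importlib import metadata
from typing import Callable, Dict

# Versions from the "Framework versions" list above, keyed by pip distribution name.
# Assumption: "Pytorch" in the card corresponds to the `torch` distribution.
EXPECTED: Dict[str, str] = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def public_version(version: str) -> str:
    """Drop a local version label such as '+cu121'."""
    return version.split("+", 1)[0]


def version_mismatches(expected: Dict[str, str],
                       get_version: Callable[[str], str] = metadata.version) -> Dict[str, str]:
    """Map each package whose installed public version differs to what was found."""
    mismatches: Dict[str, str] = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = "not installed"
            continue
        if public_version(found) != public_version(wanted):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    for name, found in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {EXPECTED[name]}, found {found}")
```

The `get_version` parameter exists so the comparison can be exercised without the packages installed.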