
qwen2.5-0.5b-expo-EXDPO-WEIGHT-BETA0.2

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3417
  • Logps: -93.8085
  • Logits: -1.1885
  • Objective: 0.3403
  • Dpo Loss: 0.7083 (a hedged sketch of this loss follows the list)
  • Regularize: 0.2695
  • Ranking Simple: 0.5186
  • Ranking Idealized: 0.5399
  • Ranking Idealized Expo: 0.5243
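
For context, the "Dpo Loss" metric above is consistent with the standard Direct Preference Optimization objective, and the model name encodes BETA0.2. Below is a minimal sketch of that loss assuming the standard DPO formulation with beta = 0.2; the function name and the way it combines with the "Regularize" and "Objective" terms are assumptions, as the exact training objective is not documented in this card.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.2) -> torch.Tensor:
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()
```

When the policy and reference models assign nearly identical log-ratios to chosen versus rejected responses, this loss sits near ln 2 ≈ 0.693, which matches the reported Dpo Loss values of roughly 0.69 to 0.71.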

Model description

More information needed

Intended uses & limitations

More information needed
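
In the absence of documented usage, a hedged starting point is to load the checkpoint with the standard transformers causal-LM classes; the repo id is taken from this card, and the prompt and generation settings below are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hZzy/qwen2.5-0.5b-expo-EXDPO-WEIGHT-BETA0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; the base model was SFT-tuned on news-style data.
inputs = tokenizer("Write a short news headline about renewable energy.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```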

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 96
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
  • mixed_precision_training: Native AMP
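
A minimal sketch of how these values map onto a Hugging Face TrainingArguments configuration; the argument names assume the standard transformers trainer, and the actual training script is not included in this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-EXDPO-WEIGHT-BETA0.2",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # train_batch_size per device
    per_device_eval_batch_size=4,    # eval_batch_size per device
    gradient_accumulation_steps=8,   # 4 per device x 3 GPUs x 8 steps = 96 total train batch
    num_train_epochs=10,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                       # Native AMP mixed precision
    # adam_beta1 / adam_beta2 / adam_epsilon left at their defaults (0.9, 0.999, 1e-8),
    # matching the optimizer settings listed above.
)
```

Multi-GPU execution across the 3 devices is handled by the launcher (for example torchrun or accelerate), not by these arguments.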

Training results

Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo
0.0798 0.0945 50 0.0807 -98.5040 -1.3072 0.0808 0.6932 0.0115 0.5238 0.5399 0.5243
0.081 0.1889 100 0.0819 -98.4932 -1.3084 0.0819 0.6934 0.0125 0.5238 0.5399 0.5243
0.0839 0.2834 150 0.0823 -98.5417 -1.3079 0.0823 0.6931 0.0130 0.5233 0.5399 0.5243
0.0891 0.3779 200 0.0840 -98.6517 -1.3063 0.0839 0.6927 0.0147 0.5238 0.5399 0.5243
0.1019 0.4724 250 0.0865 -98.6753 -1.3058 0.0864 0.6929 0.0171 0.5233 0.5399 0.5243
0.1094 0.5668 300 0.0928 -98.3448 -1.3087 0.0930 0.6926 0.0238 0.5238 0.5399 0.5243
0.1267 0.6613 350 0.0995 -98.4803 -1.3097 0.1004 0.6942 0.0310 0.5243 0.5399 0.5243
0.1414 0.7558 400 0.1020 -98.6999 -1.3138 0.1027 0.6920 0.0335 0.5248 0.5399 0.5243
0.156 0.8503 450 0.1102 -98.6961 -1.3001 0.1107 0.6917 0.0415 0.5228 0.5399 0.5243
0.1843 0.9447 500 0.1425 -98.3685 -1.2985 0.1416 0.6934 0.0723 0.5217 0.5399 0.5243
0.1954 1.0392 550 0.1388 -98.2336 -1.3154 0.1383 0.6946 0.0689 0.5228 0.5399 0.5243
0.2073 1.1337 600 0.1374 -98.7215 -1.3071 0.1369 0.6936 0.0676 0.5228 0.5399 0.5243
0.2165 1.2282 650 0.1478 -97.9261 -1.2926 0.1487 0.6916 0.0796 0.5233 0.5399 0.5243
0.2333 1.3226 700 0.1470 -97.1071 -1.2930 0.1450 0.6915 0.0758 0.5243 0.5399 0.5243
0.229 1.4171 750 0.1718 -97.0923 -1.2689 0.1725 0.6929 0.1032 0.5238 0.5399 0.5243
0.2565 1.5116 800 0.1817 -97.2621 -1.2540 0.1830 0.6944 0.1136 0.5243 0.5399 0.5243
0.2479 1.6060 850 0.1864 -96.3423 -1.2708 0.1853 0.6946 0.1159 0.5243 0.5399 0.5243
0.2586 1.7005 900 0.1839 -97.2157 -1.2623 0.1825 0.6944 0.1131 0.5223 0.5399 0.5243
0.2347 1.7950 950 0.1995 -94.8402 -1.2678 0.1989 0.6945 0.1295 0.5238 0.5399 0.5243
0.2414 1.8895 1000 0.1895 -95.8793 -1.2579 0.1901 0.6924 0.1209 0.5254 0.5399 0.5243
0.2433 1.9839 1050 0.2097 -95.7970 -1.2552 0.2068 0.6923 0.1376 0.5259 0.5399 0.5243
0.2393 2.0784 1100 0.2156 -96.9313 -1.2422 0.2149 0.6962 0.1452 0.5264 0.5399 0.5243
0.2476 2.1729 1150 0.2195 -95.8618 -1.2485 0.2191 0.6958 0.1495 0.5238 0.5399 0.5243
0.2443 2.2674 1200 0.2318 -97.1362 -1.2241 0.2317 0.6998 0.1617 0.5259 0.5399 0.5243
0.2337 2.3618 1250 0.2494 -96.2629 -1.2313 0.2515 0.6950 0.1820 0.5269 0.5399 0.5243
0.2264 2.4563 1300 0.2473 -94.4504 -1.2535 0.2456 0.6981 0.1758 0.5223 0.5399 0.5243
0.2398 2.5508 1350 0.2467 -96.2065 -1.2349 0.2462 0.7027 0.1760 0.5197 0.5399 0.5243
0.2346 2.6453 1400 0.2565 -94.6591 -1.2562 0.2567 0.7002 0.1867 0.5212 0.5399 0.5243
0.242 2.7397 1450 0.2640 -94.6555 -1.2141 0.2641 0.7015 0.1939 0.5243 0.5399 0.5243
0.2372 2.8342 1500 0.2747 -94.9289 -1.2472 0.2726 0.7027 0.2024 0.5202 0.5399 0.5243
0.2133 2.9287 1550 0.2529 -95.1991 -1.2345 0.2512 0.7006 0.1811 0.5243 0.5399 0.5243
0.2292 3.0231 1600 0.2840 -93.6334 -1.2437 0.2861 0.7038 0.2157 0.5197 0.5399 0.5243
0.2227 3.1176 1650 0.2854 -93.4763 -1.2332 0.2851 0.7025 0.2149 0.5217 0.5399 0.5243
0.2123 3.2121 1700 0.2752 -95.6906 -1.2311 0.2756 0.7008 0.2055 0.5233 0.5399 0.5243
0.218 3.3066 1750 0.2800 -95.9042 -1.2167 0.2783 0.7037 0.2079 0.5238 0.5399 0.5243
0.2086 3.4010 1800 0.2945 -95.6983 -1.2183 0.2932 0.7027 0.2230 0.5233 0.5399 0.5243
0.216 3.4955 1850 0.2895 -93.0784 -1.2235 0.2873 0.7028 0.2171 0.5212 0.5399 0.5243
0.2182 3.5900 1900 0.2973 -95.2384 -1.2138 0.2977 0.7019 0.2275 0.5207 0.5399 0.5243
0.2097 3.6845 1950 0.3023 -93.4940 -1.2111 0.3000 0.7046 0.2295 0.5217 0.5399 0.5243
0.2076 3.7789 2000 0.3084 -93.0939 -1.2337 0.3067 0.7034 0.2364 0.5243 0.5399 0.5243
0.2099 3.8734 2050 0.2962 -93.1727 -1.2280 0.2954 0.7044 0.2249 0.5212 0.5399 0.5243
0.2001 3.9679 2100 0.3139 -93.9210 -1.2079 0.3123 0.7063 0.2417 0.5186 0.5399 0.5243
0.2082 4.0624 2150 0.3119 -93.6768 -1.2148 0.3124 0.7037 0.2420 0.5217 0.5399 0.5243
0.1914 4.1568 2200 0.3139 -94.5737 -1.2179 0.3138 0.7032 0.2434 0.5197 0.5399 0.5243
0.2026 4.2513 2250 0.3179 -93.2220 -1.2044 0.3177 0.7035 0.2473 0.5202 0.5399 0.5243
0.1908 4.3458 2300 0.3067 -94.3151 -1.2117 0.3085 0.7022 0.2383 0.5233 0.5399 0.5243
0.1931 4.4402 2350 0.3241 -93.4124 -1.2066 0.3236 0.7058 0.2530 0.5223 0.5399 0.5243
0.195 4.5347 2400 0.3111 -94.2419 -1.2062 0.3113 0.7035 0.2410 0.5217 0.5399 0.5243
0.1947 4.6292 2450 0.3312 -93.6715 -1.1956 0.3317 0.7067 0.2610 0.5228 0.5399 0.5243
0.1837 4.7237 2500 0.3289 -93.6179 -1.2041 0.3304 0.7077 0.2596 0.5223 0.5399 0.5243
0.1751 4.8181 2550 0.3254 -93.4709 -1.1993 0.3247 0.7060 0.2541 0.5212 0.5399 0.5243
0.1717 4.9126 2600 0.3287 -94.2886 -1.2078 0.3292 0.7050 0.2587 0.5207 0.5399 0.5243
0.1761 5.0071 2650 0.3257 -93.6210 -1.2055 0.3239 0.7061 0.2533 0.5217 0.5399 0.5243
0.1692 5.1016 2700 0.3396 -93.0109 -1.2063 0.3378 0.7072 0.2670 0.5223 0.5399 0.5243
0.1676 5.1960 2750 0.3402 -93.9591 -1.1978 0.3384 0.7084 0.2675 0.5202 0.5399 0.5243
0.1743 5.2905 2800 0.3371 -93.9100 -1.1972 0.3351 0.7076 0.2643 0.5217 0.5399 0.5243
0.1715 5.3850 2850 0.3408 -93.6808 -1.1939 0.3405 0.7084 0.2696 0.5212 0.5399 0.5243
0.1643 5.4795 2900 0.3434 -93.0381 -1.1941 0.3434 0.7095 0.2724 0.5192 0.5399 0.5243
0.1569 5.5739 2950 0.3403 -94.4489 -1.1993 0.3406 0.7083 0.2698 0.5192 0.5399 0.5243
0.16 5.6684 3000 0.3337 -94.1339 -1.1952 0.3332 0.7068 0.2625 0.5233 0.5399 0.5243
0.1556 5.7629 3050 0.3379 -93.7011 -1.1943 0.3366 0.7075 0.2658 0.5197 0.5399 0.5243
0.1544 5.8573 3100 0.3407 -93.8059 -1.1896 0.3385 0.7082 0.2677 0.5212 0.5399 0.5243
0.1539 5.9518 3150 0.3377 -93.3647 -1.2013 0.3358 0.7079 0.2650 0.5207 0.5399 0.5243
0.1448 6.0463 3200 0.3418 -93.0674 -1.1912 0.3402 0.7086 0.2693 0.5181 0.5399 0.5243
0.1479 6.1408 3250 0.3437 -93.1651 -1.1883 0.3423 0.7079 0.2715 0.5217 0.5399 0.5243
0.1408 6.2352 3300 0.3427 -93.4029 -1.1821 0.3405 0.7074 0.2698 0.5197 0.5399 0.5243
0.1475 6.3297 3350 0.3401 -93.6032 -1.1856 0.3383 0.7078 0.2675 0.5192 0.5399 0.5243
0.1339 6.4242 3400 0.3415 -93.5229 -1.1891 0.3402 0.7082 0.2693 0.5212 0.5399 0.5243
0.1394 6.5187 3450 0.3398 -94.0518 -1.1959 0.3379 0.7083 0.2671 0.5186 0.5399 0.5243
0.1324 6.6131 3500 0.3401 -93.9466 -1.1836 0.3389 0.7075 0.2682 0.5192 0.5399 0.5243
0.1385 6.7076 3550 0.3449 -93.6245 -1.1866 0.3437 0.7080 0.2729 0.5202 0.5399 0.5243
0.1289 6.8021 3600 0.3433 -93.8482 -1.1858 0.3412 0.7088 0.2703 0.5192 0.5399 0.5243
0.1272 6.8966 3650 0.3431 -93.9371 -1.1979 0.3417 0.7080 0.2709 0.5202 0.5399 0.5243
0.125 6.9910 3700 0.3436 -93.9666 -1.1952 0.3425 0.7079 0.2717 0.5202 0.5399 0.5243
0.1227 7.0855 3750 0.3404 -93.8781 -1.2022 0.3382 0.7086 0.2674 0.5197 0.5399 0.5243
0.1142 7.1800 3800 0.3426 -93.8234 -1.1874 0.3420 0.7083 0.2712 0.5207 0.5399 0.5243
0.1142 7.2744 3850 0.3454 -93.6895 -1.1775 0.3442 0.7090 0.2733 0.5202 0.5399 0.5243
0.1128 7.3689 3900 0.3417 -94.0521 -1.1838 0.3406 0.7083 0.2698 0.5197 0.5399 0.5243
0.1158 7.4634 3950 0.3434 -93.9208 -1.1875 0.3423 0.7086 0.2714 0.5197 0.5399 0.5243
0.113 7.5579 4000 0.3428 -93.6866 -1.1850 0.3411 0.7087 0.2702 0.5197 0.5399 0.5243
0.1113 7.6523 4050 0.3434 -93.6171 -1.1837 0.3425 0.7087 0.2716 0.5202 0.5399 0.5243
0.1082 7.7468 4100 0.3411 -94.0013 -1.1852 0.3403 0.7081 0.2695 0.5192 0.5399 0.5243
0.1051 7.8413 4150 0.3425 -93.8552 -1.1848 0.3417 0.7083 0.2709 0.5197 0.5399 0.5243
0.1047 7.9358 4200 0.3422 -93.6696 -1.1872 0.3411 0.7085 0.2702 0.5197 0.5399 0.5243
0.0985 8.0302 4250 0.3416 -93.6924 -1.1844 0.3403 0.7083 0.2695 0.5197 0.5399 0.5243
0.0964 8.1247 4300 0.3422 -93.5025 -1.1871 0.3409 0.7082 0.2701 0.5202 0.5399 0.5243
0.0997 8.2192 4350 0.3423 -93.8074 -1.1866 0.3408 0.7081 0.2700 0.5186 0.5399 0.5243
0.0963 8.3137 4400 0.3434 -93.6885 -1.1861 0.3419 0.7084 0.2711 0.5202 0.5399 0.5243
0.0966 8.4081 4450 0.3434 -93.7312 -1.1875 0.3419 0.7084 0.2711 0.5186 0.5399 0.5243
0.0956 8.5026 4500 0.3431 -93.8431 -1.1866 0.3416 0.7081 0.2708 0.5186 0.5399 0.5243
0.0928 8.5971 4550 0.3428 -93.8243 -1.1859 0.3414 0.7084 0.2706 0.5186 0.5399 0.5243
0.0924 8.6915 4600 0.3418 -93.7706 -1.1871 0.3406 0.7082 0.2698 0.5186 0.5399 0.5243
0.0908 8.7860 4650 0.3415 -93.7405 -1.1872 0.3403 0.7079 0.2695 0.5202 0.5399 0.5243
0.0922 8.8805 4700 0.3419 -93.7126 -1.1888 0.3405 0.7078 0.2698 0.5202 0.5399 0.5243
0.0895 8.9750 4750 0.3417 -93.7926 -1.1886 0.3402 0.7080 0.2694 0.5202 0.5399 0.5243
0.0877 9.0694 4800 0.3425 -93.7523 -1.1891 0.3415 0.7083 0.2706 0.5197 0.5399 0.5243
0.0862 9.1639 4850 0.3423 -93.8492 -1.1894 0.3406 0.7082 0.2698 0.5207 0.5399 0.5243
0.0856 9.2584 4900 0.3417 -93.8453 -1.1883 0.3404 0.7081 0.2696 0.5197 0.5399 0.5243
0.0883 9.3529 4950 0.3414 -93.8773 -1.1886 0.3401 0.7080 0.2693 0.5202 0.5399 0.5243
0.0866 9.4473 5000 0.3414 -93.8593 -1.1880 0.3402 0.7081 0.2694 0.5197 0.5399 0.5243
0.0843 9.5418 5050 0.3417 -93.8241 -1.1880 0.3405 0.7081 0.2697 0.5207 0.5399 0.5243
0.0862 9.6363 5100 0.3419 -93.8268 -1.1884 0.3404 0.7081 0.2696 0.5197 0.5399 0.5243
0.0851 9.7308 5150 0.3418 -93.8247 -1.1881 0.3405 0.7082 0.2697 0.5192 0.5399 0.5243
0.0852 9.8252 5200 0.3415 -93.8257 -1.1886 0.3402 0.7081 0.2694 0.5197 0.5399 0.5243
0.0873 9.9197 5250 0.3418 -93.8220 -1.1885 0.3404 0.7083 0.2696 0.5197 0.5399 0.5243

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1