TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo

This model is a DPO fine-tuned version of alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2. The preference dataset used for training is not specified in the card. It achieves the following results on the evaluation set (the reward metrics are explained after the list):

  • Loss: 0.1772
  • Rewards/chosen: -0.9390
  • Rewards/rejected: -4.1141
  • Rewards/accuracies: 0.8385
  • Rewards/margins: 3.1750
  • Logps/rejected: -327.8484
  • Logps/chosen: -280.3031
  • Logits/rejected: -2.7526
  • Logits/chosen: -2.6271
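
For context, these reward columns follow the usual DPO logging convention (as in TRL's DPOTrainer): the logged reward is the policy's implicit reward relative to the frozen reference model, scaled by β. This is an interpretation based on the standard setup, not something stated in the card:

```latex
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
```

Consistent with this, the reported margin equals the chosen reward minus the rejected reward: -0.9390 - (-4.1141) = 3.1751, matching the logged 3.1750 up to rounding.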

Model description

More information needed

Intended uses & limitations

More information needed
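
Pending details from the author, here is a minimal inference sketch using 🤗 Transformers. The repo id is inferred from the base model's namespace, and the chat template is assumed to be the one shipped with the TinyLlama-Chat tokenizer; both are assumptions, not confirmed by the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: repo id inferred from the base model's namespace; adjust if it differs.
model_id = "alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build the prompt with the chat template stored in the tokenizer.
messages = [{"role": "user", "content": "Explain why the sky is blue in two sentences."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```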

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
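
These hyperparameters map naturally onto TRL's DPOTrainer. Below is a minimal sketch under that assumption; the actual training script, dataset, and β are not given in the card, so the toy preference pairs and beta=0.1 are illustrative only:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Toy preference pairs in the prompt/chosen/rejected format DPOTrainer expects;
# the real dataset is not specified in the card.
pairs = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

# Values copied from the hyperparameter list above; the default AdamW optimizer
# already uses betas=(0.9, 0.999) and epsilon=1e-08.
args = TrainingArguments(
    output_dir="tinyllama-reasoning-v2-dpo",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    bf16=True,                    # matches the BF16 weights of the release
    remove_unused_columns=False,  # required by DPOTrainer's data collator
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # TRL snapshots the policy as the frozen reference model
    args=args,
    beta=0.1,        # assumption: the card does not report beta
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```

The card's distributed_type: multi-GPU entry suggests the original run was launched with accelerate launch across several GPUs; the sketch above runs on a single device as written.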

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6892 | 0.06 | 100 | 0.6904 | -0.0007 | -0.0068 | 0.4692 | 0.0061 | -286.7757 | -270.9199 | -2.7940 | -2.6576 |
| 0.6767 | 0.13 | 200 | 0.6754 | -0.0060 | -0.0430 | 0.6385 | 0.0370 | -287.1373 | -270.9724 | -2.7931 | -2.6568 |
| 0.6493 | 0.19 | 300 | 0.6431 | -0.0105 | -0.1151 | 0.7885 | 0.1046 | -287.8588 | -271.0174 | -2.7922 | -2.6561 |
| 0.5809 | 0.25 | 400 | 0.5879 | -0.0345 | -0.2649 | 0.8308 | 0.2304 | -289.3571 | -271.2578 | -2.7893 | -2.6534 |
| 0.4994 | 0.32 | 500 | 0.5043 | -0.0774 | -0.5296 | 0.8385 | 0.4522 | -292.0042 | -271.6873 | -2.7851 | -2.6499 |
| 0.4093 | 0.38 | 600 | 0.4360 | -0.1267 | -0.8043 | 0.8385 | 0.6776 | -294.7504 | -272.1800 | -2.7820 | -2.6476 |
| 0.3951 | 0.44 | 700 | 0.3844 | -0.1731 | -1.0600 | 0.8423 | 0.8870 | -297.3079 | -272.6434 | -2.7796 | -2.6459 |
| 0.3307 | 0.51 | 800 | 0.3413 | -0.2208 | -1.3252 | 0.8346 | 1.1044 | -299.9597 | -273.1208 | -2.7764 | -2.6434 |
| 0.3035 | 0.57 | 900 | 0.3095 | -0.2914 | -1.5963 | 0.8308 | 1.3049 | -302.6710 | -273.8272 | -2.7734 | -2.6410 |
| 0.2565 | 0.63 | 1000 | 0.2856 | -0.3318 | -1.8163 | 0.8385 | 1.4845 | -304.8706 | -274.2305 | -2.7712 | -2.6397 |
| 0.2409 | 0.7 | 1100 | 0.2676 | -0.3754 | -2.0199 | 0.8385 | 1.6445 | -306.9071 | -274.6673 | -2.7691 | -2.6380 |
| 0.2341 | 0.76 | 1200 | 0.2515 | -0.4233 | -2.2275 | 0.8385 | 1.8042 | -308.9832 | -275.1463 | -2.7675 | -2.6371 |
| 0.2584 | 0.82 | 1300 | 0.2393 | -0.4799 | -2.4301 | 0.8385 | 1.9501 | -311.0082 | -275.7123 | -2.7653 | -2.6355 |
| 0.2171 | 0.89 | 1400 | 0.2294 | -0.5274 | -2.6087 | 0.8385 | 2.0812 | -312.7944 | -276.1873 | -2.7635 | -2.6342 |
| 0.1638 | 0.95 | 1500 | 0.2206 | -0.5748 | -2.7894 | 0.8385 | 2.2146 | -314.6021 | -276.6611 | -2.7623 | -2.6336 |
| 0.2334 | 1.02 | 1600 | 0.2147 | -0.6108 | -2.9348 | 0.8385 | 2.3240 | -316.0559 | -277.0210 | -2.7603 | -2.6319 |
| 0.2178 | 1.08 | 1700 | 0.2086 | -0.6523 | -3.0743 | 0.8385 | 2.4220 | -317.4505 | -277.4355 | -2.7597 | -2.6314 |
| 0.1704 | 1.14 | 1800 | 0.2037 | -0.6819 | -3.1955 | 0.8385 | 2.5136 | -318.6626 | -277.7317 | -2.7590 | -2.6309 |
| 0.1683 | 1.21 | 1900 | 0.1996 | -0.7152 | -3.3176 | 0.8385 | 2.6024 | -319.8835 | -278.0646 | -2.7587 | -2.6313 |
| 0.271 | 1.27 | 2000 | 0.1959 | -0.7447 | -3.4272 | 0.8385 | 2.6825 | -320.9794 | -278.3595 | -2.7576 | -2.6305 |
| 0.127 | 1.33 | 2100 | 0.1930 | -0.7665 | -3.5137 | 0.8385 | 2.7472 | -321.8449 | -278.5782 | -2.7571 | -2.6302 |
| 0.2107 | 1.4 | 2200 | 0.1905 | -0.7830 | -3.5883 | 0.8385 | 2.8053 | -322.5906 | -278.7429 | -2.7572 | -2.6305 |
| 0.1977 | 1.46 | 2300 | 0.1883 | -0.7986 | -3.6574 | 0.8385 | 2.8588 | -323.2822 | -278.8991 | -2.7566 | -2.6300 |
| 0.1655 | 1.52 | 2400 | 0.1872 | -0.8203 | -3.7149 | 0.8385 | 2.8946 | -323.8572 | -279.1161 | -2.7553 | -2.6289 |
| 0.1776 | 1.59 | 2500 | 0.1850 | -0.8439 | -3.7881 | 0.8385 | 2.9442 | -324.5885 | -279.3518 | -2.7548 | -2.6285 |
| 0.1372 | 1.65 | 2600 | 0.1850 | -0.8548 | -3.8280 | 0.8385 | 2.9732 | -324.9880 | -279.4609 | -2.7544 | -2.6282 |
| 0.15 | 1.71 | 2700 | 0.1836 | -0.8734 | -3.8792 | 0.8385 | 3.0059 | -325.5001 | -279.6465 | -2.7543 | -2.6283 |
| 0.1338 | 1.78 | 2800 | 0.1823 | -0.8736 | -3.9132 | 0.8385 | 3.0396 | -325.8393 | -279.6486 | -2.7541 | -2.6282 |
| 0.1507 | 1.84 | 2900 | 0.1811 | -0.8932 | -3.9558 | 0.8385 | 3.0626 | -326.2653 | -279.8444 | -2.7533 | -2.6273 |
| 0.1615 | 1.9 | 3000 | 0.1811 | -0.8986 | -3.9790 | 0.8385 | 3.0804 | -326.4981 | -279.8992 | -2.7533 | -2.6275 |
| 0.1656 | 1.97 | 3100 | 0.1800 | -0.9039 | -4.0052 | 0.8385 | 3.1012 | -326.7594 | -279.9523 | -2.7528 | -2.6270 |
| 0.1398 | 2.03 | 3200 | 0.1797 | -0.9123 | -4.0258 | 0.8385 | 3.1135 | -326.9660 | -280.0360 | -2.7534 | -2.6278 |
| 0.1929 | 2.09 | 3300 | 0.1792 | -0.9098 | -4.0380 | 0.8385 | 3.1282 | -327.0879 | -280.0112 | -2.7524 | -2.6269 |
| 0.1616 | 2.16 | 3400 | 0.1787 | -0.9249 | -4.0622 | 0.8385 | 3.1374 | -327.3301 | -280.1616 | -2.7519 | -2.6263 |
| 0.1664 | 2.22 | 3500 | 0.1790 | -0.9246 | -4.0716 | 0.8385 | 3.1470 | -327.4239 | -280.1592 | -2.7524 | -2.6269 |
| 0.2085 | 2.28 | 3600 | 0.1787 | -0.9301 | -4.0835 | 0.8385 | 3.1534 | -327.5426 | -280.2136 | -2.7532 | -2.6279 |
| 0.1565 | 2.35 | 3700 | 0.1782 | -0.9301 | -4.0909 | 0.8385 | 3.1608 | -327.6164 | -280.2137 | -2.7521 | -2.6265 |
| 0.153 | 2.41 | 3800 | 0.1778 | -0.9281 | -4.0947 | 0.8385 | 3.1666 | -327.6550 | -280.1937 | -2.7522 | -2.6268 |
| 0.1787 | 2.47 | 3900 | 0.1783 | -0.9319 | -4.0918 | 0.8385 | 3.1599 | -327.6259 | -280.2316 | -2.7520 | -2.6266 |
| 0.172 | 2.54 | 4000 | 0.1780 | -0.9338 | -4.1035 | 0.8385 | 3.1697 | -327.7429 | -280.2505 | -2.7526 | -2.6273 |
| 0.2643 | 2.6 | 4100 | 0.1771 | -0.9229 | -4.0969 | 0.8385 | 3.1739 | -327.6764 | -280.1422 | -2.7521 | -2.6267 |
| 0.1619 | 2.66 | 4200 | 0.1776 | -0.9326 | -4.1083 | 0.8385 | 3.1757 | -327.7909 | -280.2390 | -2.7523 | -2.6270 |
| 0.2413 | 2.73 | 4300 | 0.1778 | -0.9292 | -4.1024 | 0.8385 | 3.1732 | -327.7315 | -280.2050 | -2.7529 | -2.6277 |
| 0.1187 | 2.79 | 4400 | 0.1778 | -0.9343 | -4.1068 | 0.8385 | 3.1725 | -327.7758 | -280.2554 | -2.7521 | -2.6267 |
| 0.1439 | 2.86 | 4500 | 0.1776 | -0.9368 | -4.1118 | 0.8385 | 3.1750 | -327.8253 | -280.2808 | -2.7517 | -2.6263 |
| 0.1116 | 2.92 | 4600 | 0.1773 | -0.9302 | -4.1079 | 0.8385 | 3.1777 | -327.7867 | -280.2152 | -2.7526 | -2.6272 |
| 0.18 | 2.98 | 4700 | 0.1772 | -0.9290 | -4.1048 | 0.8385 | 3.1758 | -327.7554 | -280.2029 | -2.7526 | -2.6271 |

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.1.0+cu118
  • Datasets 2.14.6
  • Tokenizers 0.15.0