
tinyllama-1.1b-chat-dpo-full

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-chat-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5860
  • Rewards/chosen: -1.1602
  • Rewards/rejected: -1.6135
  • Rewards/accuracies: 0.6890
  • Rewards/margins: 0.4533
  • Logps/rejected: -458.4552
  • Logps/chosen: -452.2377
  • Logits/rejected: -2.3877
  • Logits/chosen: -2.4300
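
For reference, a minimal generation sketch using the standard transformers API. This is an illustrative example, not code from the training repository; it assumes the tokenizer ships a chat template, as the TinyLlama chat/SFT checkpoints normally do.

```python
# Minimal usage sketch (not part of the original card): load the model and generate a chat reply.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-chat-dpo-full"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes the tokenizer carries a chat template, as TinyLlama chat checkpoints normally do.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning changes in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)

print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```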

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
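
These hyperparameters map naturally onto a TRL DPOTrainer run. The sketch below is a hypothetical reconstruction under that assumption (the DPO beta and the exact training script are not reported in this card), not the authors' actual code.

```python
# Hypothetical reconstruction of the training setup, assuming TRL's DPOTrainer produced the
# Rewards/* metrics reported in this card; the actual training script is not included here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-chat-sft-full"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# "train_prefs" is the preference split of HuggingFaceH4/ultrafeedback_binarized.
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="tinyllama-1.1b-chat-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=4,   # 4 * 4 = 16 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# ref_model=None makes the trainer keep a frozen copy of the SFT model as the DPO reference.
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,  # recent TRL releases rename this argument to processing_class
)
trainer.train()
```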

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.693 | 0.0262 | 100 | 0.6929 | -0.0014 | -0.0019 | 0.5320 | 0.0006 | -297.2994 | -336.3557 | -3.1228 | -3.1361 |
| 0.6887 | 0.0523 | 200 | 0.6892 | -0.0302 | -0.0383 | 0.6160 | 0.0081 | -300.9348 | -339.2341 | -3.1215 | -3.1346 |
| 0.6789 | 0.0785 | 300 | 0.6794 | -0.0789 | -0.1087 | 0.6360 | 0.0299 | -307.9798 | -344.1051 | -3.1094 | -3.1216 |
| 0.6624 | 0.1047 | 400 | 0.6635 | -0.1807 | -0.2518 | 0.6390 | 0.0711 | -322.2854 | -354.2890 | -3.0664 | -3.0771 |
| 0.6373 | 0.1309 | 500 | 0.6503 | -0.2988 | -0.4120 | 0.6425 | 0.1133 | -338.3080 | -366.0959 | -2.9693 | -2.9839 |
| 0.6423 | 0.1570 | 600 | 0.6457 | -0.3891 | -0.5345 | 0.6375 | 0.1454 | -350.5518 | -375.1291 | -2.9372 | -2.9538 |
| 0.6266 | 0.1832 | 700 | 0.6420 | -0.7030 | -0.9081 | 0.6365 | 0.2051 | -387.9123 | -406.5211 | -2.9095 | -2.9229 |
| 0.5942 | 0.2094 | 800 | 0.6367 | -0.4969 | -0.6764 | 0.6475 | 0.1795 | -364.7484 | -385.9118 | -2.9255 | -2.9397 |
| 0.6171 | 0.2355 | 900 | 0.6330 | -0.5389 | -0.7443 | 0.6545 | 0.2054 | -371.5351 | -390.1065 | -2.8815 | -2.8992 |
| 0.6156 | 0.2617 | 1000 | 0.6271 | -0.9278 | -1.1788 | 0.6460 | 0.2510 | -414.9855 | -428.9975 | -2.8469 | -2.8665 |
| 0.6636 | 0.2879 | 1100 | 0.6234 | -0.7984 | -1.0304 | 0.6515 | 0.2320 | -400.1489 | -416.0618 | -2.8144 | -2.8347 |
| 0.6832 | 0.3141 | 1200 | 0.6152 | -1.0303 | -1.3170 | 0.6570 | 0.2866 | -428.8004 | -439.2536 | -2.7994 | -2.8212 |
| 0.5967 | 0.3402 | 1300 | 0.6131 | -1.2342 | -1.5321 | 0.6655 | 0.2979 | -450.3198 | -459.6400 | -2.7494 | -2.7756 |
| 0.596 | 0.3664 | 1400 | 0.6064 | -0.8587 | -1.1697 | 0.6820 | 0.3110 | -414.0766 | -422.0903 | -2.8084 | -2.8289 |
| 0.592 | 0.3926 | 1500 | 0.6027 | -0.9689 | -1.3189 | 0.6715 | 0.3499 | -428.9929 | -433.1132 | -2.7455 | -2.7703 |
| 0.6353 | 0.4187 | 1600 | 0.6051 | -0.9640 | -1.3223 | 0.6745 | 0.3582 | -429.3314 | -432.6226 | -2.6972 | -2.7245 |
| 0.6603 | 0.4449 | 1700 | 0.6016 | -0.9893 | -1.3221 | 0.6765 | 0.3328 | -429.3145 | -435.1521 | -2.7021 | -2.7305 |
| 0.5551 | 0.4711 | 1800 | 0.6023 | -1.0035 | -1.3765 | 0.6790 | 0.3731 | -434.7590 | -436.5641 | -2.6159 | -2.6492 |
| 0.5877 | 0.4973 | 1900 | 0.5975 | -0.8137 | -1.1853 | 0.6835 | 0.3716 | -415.6308 | -417.5872 | -2.6621 | -2.6941 |
| 0.5827 | 0.5234 | 2000 | 0.5935 | -0.8724 | -1.2562 | 0.6810 | 0.3838 | -422.7221 | -423.4575 | -2.6043 | -2.6396 |
| 0.6017 | 0.5496 | 2100 | 0.5911 | -1.0065 | -1.3971 | 0.6905 | 0.3907 | -436.8172 | -436.8658 | -2.6105 | -2.6436 |
| 0.5539 | 0.5758 | 2200 | 0.5920 | -0.9060 | -1.2945 | 0.6885 | 0.3884 | -426.5499 | -426.8195 | -2.5724 | -2.6076 |
| 0.5795 | 0.6019 | 2300 | 0.5914 | -1.1164 | -1.5398 | 0.6865 | 0.4234 | -451.0841 | -447.8605 | -2.5399 | -2.5757 |
| 0.5657 | 0.6281 | 2400 | 0.5904 | -1.0347 | -1.4494 | 0.6860 | 0.4147 | -442.0414 | -439.6861 | -2.5121 | -2.5487 |
| 0.5306 | 0.6543 | 2500 | 0.5918 | -1.0464 | -1.4840 | 0.6825 | 0.4376 | -445.5005 | -440.8591 | -2.4692 | -2.5102 |
| 0.5762 | 0.6805 | 2600 | 0.5927 | -1.0687 | -1.5141 | 0.6780 | 0.4455 | -448.5193 | -443.0862 | -2.4291 | -2.4735 |
| 0.6016 | 0.7066 | 2700 | 0.5936 | -1.0767 | -1.5080 | 0.6800 | 0.4313 | -447.9063 | -443.8889 | -2.4329 | -2.4747 |
| 0.6068 | 0.7328 | 2800 | 0.5897 | -1.1905 | -1.6433 | 0.6820 | 0.4527 | -461.4312 | -455.2722 | -2.4294 | -2.4708 |
| 0.5821 | 0.7590 | 2900 | 0.5870 | -1.1245 | -1.5598 | 0.6845 | 0.4353 | -453.0833 | -448.6697 | -2.4470 | -2.4862 |
| 0.5393 | 0.7851 | 3000 | 0.5873 | -1.2223 | -1.6710 | 0.6870 | 0.4486 | -464.2020 | -458.4521 | -2.4161 | -2.4565 |
| 0.577 | 0.8113 | 3100 | 0.5886 | -1.1359 | -1.5757 | 0.6845 | 0.4399 | -454.6796 | -449.8056 | -2.4137 | -2.4538 |
| 0.5731 | 0.8375 | 3200 | 0.5864 | -1.1928 | -1.6493 | 0.6900 | 0.4564 | -462.0313 | -455.5009 | -2.3988 | -2.4401 |
| 0.586 | 0.8636 | 3300 | 0.5865 | -1.1740 | -1.6231 | 0.6895 | 0.4492 | -459.4178 | -453.6159 | -2.3969 | -2.4384 |
| 0.5629 | 0.8898 | 3400 | 0.5860 | -1.1573 | -1.6086 | 0.6890 | 0.4513 | -457.9694 | -451.9486 | -2.3882 | -2.4306 |
| 0.6059 | 0.9160 | 3500 | 0.5858 | -1.1672 | -1.6213 | 0.6890 | 0.4541 | -459.2307 | -452.9388 | -2.3897 | -2.4320 |
| 0.5703 | 0.9422 | 3600 | 0.5860 | -1.1607 | -1.6138 | 0.6870 | 0.4532 | -458.4890 | -452.2865 | -2.3897 | -2.4320 |
| 0.5533 | 0.9683 | 3700 | 0.5858 | -1.1623 | -1.6161 | 0.6880 | 0.4538 | -458.7165 | -452.4510 | -2.3882 | -2.4304 |
| 0.5988 | 0.9945 | 3800 | 0.5862 | -1.1608 | -1.6138 | 0.6885 | 0.4530 | -458.4823 | -452.2973 | -2.3882 | -2.4306 |
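
The Rewards/* columns follow TRL's DPOTrainer logging convention (assumed here, since the metric names match): they are implicit DPO rewards, i.e. beta-scaled log-probability differences between the policy and the frozen reference model. A small helper showing how these evaluation columns are derived from per-sequence log-probs; the beta value used for this run is not reported in the card.

```python
# Assumed interpretation of the Rewards/* columns: TRL's DPOTrainer logs implicit DPO rewards,
# i.e. beta-scaled policy-vs-reference log-prob differences (beta=0.1 is TRL's default; the
# value used for this run is not stated in the card).
import torch

def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Recover rewards/chosen, rewards/rejected, rewards/margins and rewards/accuracies
    from summed per-sequence log-probabilities (one entry per preference pair)."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return chosen_rewards.mean(), rejected_rewards.mean(), margins.mean(), accuracy
```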

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.1.2
  • Datasets 2.19.1
  • Tokenizers 0.19.1