tinyllama-1.1b-sum-dpo-full_LR5e-8_BS64_4epochs_old

This model is a fine-tuned version of martimfasantos/tinyllama-1.1b-sum-sft-full_old on the openai/summarize_from_feedback dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6803
  • Rewards/chosen: -0.1265
  • Rewards/rejected: -0.1560
  • Rewards/accuracies: 0.6036
  • Rewards/margins: 0.0295
  • Logps/rejected: -78.7771
  • Logps/chosen: -71.3634
  • Logits/rejected: -2.9512
  • Logits/chosen: -2.9570
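
The reward figures above can be sanity-checked by hand. The sketch below assumes the usual DPO-style convention, which this card does not state explicitly: the margin is the chosen reward minus the rejected reward, and the per-pair loss is -log(sigmoid(margin)). Under that assumption, a model that has not yet moved away from its reference has a margin of 0 and a loss of ln 2 ≈ 0.6931.

```python
import math


def dpo_pair_loss(margin: float) -> float:
    """Sigmoid preference loss for one pair: -log(sigmoid(margin)).

    Assumed form of the loss; the card itself does not spell it out.
    """
    # log1p(exp(-m)) is algebraically equal to -log(sigmoid(m)).
    return math.log1p(math.exp(-margin))


# Final evaluation values reported in the list above.
rewards_chosen = -0.1265
rewards_rejected = -0.1560

# Margin under the assumed convention: chosen minus rejected.
margin = rewards_chosen - rewards_rejected

# Loss when chosen and rejected are scored identically (margin of 0).
loss_at_zero_margin = dpo_pair_loss(0.0)

# Loss evaluated at the mean margin. The reported evaluation loss is a mean
# over pairs, so it does not have to equal this value.
loss_at_mean_margin = dpo_pair_loss(margin)

print(f"margin: {margin:.4f}")
print(f"loss at zero margin: {loss_at_zero_margin:.4f}")
print(f"loss at mean margin: {loss_at_mean_margin:.4f}")
```

The computed margin matches the reported Rewards/margins value of 0.0295. Because the loss is convex in the margin, the mean loss over pairs is expected to sit at or above the loss at the mean margin, which is consistent with the reported 0.6803.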

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-08
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 4
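
To illustrate how `lr_scheduler_type: cosine` and `lr_scheduler_warmup_ratio: 0.1` shape the learning rate, here is a minimal sketch of a linear-warmup, cosine-decay schedule. The function `cosine_lr` and the constant `TOTAL_STEPS` are hypothetical names introduced for illustration. The total of 5800 steps is an assumption taken from the last logged step in the training results, and the formula is the common form of this schedule rather than a confirmed copy of the code used for this run.

```python
import math

BASE_LR = 5e-8       # learning_rate from the list above
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 5800   # assumption: last logged step, not a value stated in the card


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              base_lr: float = BASE_LR,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


warmup_end = math.ceil(TOTAL_STEPS * WARMUP_RATIO)
for s in (0, warmup_end, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(s, cosine_lr(s))
```

Under these assumptions the rate peaks at 5e-08 once warmup ends and decays toward zero by the final step, which would fit the way the validation metrics flatten out in the last epoch.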

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6931 | 0.0689 | 100 | 0.6932 | -0.0001 | 0.0001 | 0.4793 | -0.0001 | -63.1744 | -58.7172 | -3.1574 | -3.1630 |
| 0.6932 | 0.1378 | 200 | 0.6931 | 0.0001 | 0.0001 | 0.4956 | 0.0000 | -63.1716 | -58.7029 | -3.1576 | -3.1633 |
| 0.693 | 0.2068 | 300 | 0.6932 | 0.0001 | 0.0002 | 0.4724 | -0.0001 | -63.1577 | -58.7002 | -3.1575 | -3.1632 |
| 0.693 | 0.2757 | 400 | 0.6931 | 0.0003 | 0.0003 | 0.5007 | 0.0000 | -63.1547 | -58.6827 | -3.1569 | -3.1625 |
| 0.6927 | 0.3446 | 500 | 0.6931 | 0.0006 | 0.0004 | 0.5128 | 0.0002 | -63.1359 | -58.6518 | -3.1563 | -3.1619 |
| 0.6922 | 0.4135 | 600 | 0.6930 | 0.0009 | 0.0005 | 0.5358 | 0.0004 | -63.1295 | -58.6249 | -3.1544 | -3.1600 |
| 0.692 | 0.4824 | 700 | 0.6928 | 0.0015 | 0.0008 | 0.5516 | 0.0007 | -63.0973 | -58.5609 | -3.1522 | -3.1578 |
| 0.6911 | 0.5513 | 800 | 0.6926 | 0.0018 | 0.0006 | 0.5634 | 0.0012 | -63.1172 | -58.5317 | -3.1497 | -3.1553 |
| 0.6903 | 0.6203 | 900 | 0.6923 | 0.0019 | 0.0002 | 0.5641 | 0.0017 | -63.1634 | -58.5242 | -3.1456 | -3.1513 |
| 0.6899 | 0.6892 | 1000 | 0.6920 | 0.0016 | -0.0008 | 0.5676 | 0.0024 | -63.2556 | -58.5502 | -3.1411 | -3.1467 |
| 0.6898 | 0.7581 | 1100 | 0.6916 | 0.0011 | -0.0021 | 0.5802 | 0.0032 | -63.3925 | -58.6040 | -3.1359 | -3.1415 |
| 0.689 | 0.8270 | 1200 | 0.6913 | 0.0000 | -0.0038 | 0.5753 | 0.0038 | -63.5565 | -58.7099 | -3.1316 | -3.1371 |
| 0.6881 | 0.8959 | 1300 | 0.6910 | -0.0015 | -0.0061 | 0.5804 | 0.0046 | -63.7902 | -58.8624 | -3.1268 | -3.1325 |
| 0.6874 | 0.9649 | 1400 | 0.6907 | -0.0037 | -0.0088 | 0.5825 | 0.0051 | -64.0628 | -59.0799 | -3.1213 | -3.1269 |
| 0.6867 | 1.0338 | 1500 | 0.6903 | -0.0063 | -0.0124 | 0.5843 | 0.0061 | -64.4169 | -59.3381 | -3.1142 | -3.1198 |
| 0.6857 | 1.1027 | 1600 | 0.6899 | -0.0097 | -0.0166 | 0.5876 | 0.0069 | -64.8429 | -59.6860 | -3.1081 | -3.1137 |
| 0.6843 | 1.1716 | 1700 | 0.6895 | -0.0148 | -0.0227 | 0.5804 | 0.0078 | -65.4468 | -60.1953 | -3.1013 | -3.1070 |
| 0.6842 | 1.2405 | 1800 | 0.6890 | -0.0219 | -0.0309 | 0.5871 | 0.0089 | -66.2668 | -60.9047 | -3.0944 | -3.1001 |
| 0.6802 | 1.3094 | 1900 | 0.6886 | -0.0263 | -0.0362 | 0.5920 | 0.0098 | -66.7954 | -61.3438 | -3.0883 | -3.0940 |
| 0.6824 | 1.3784 | 2000 | 0.6881 | -0.0324 | -0.0436 | 0.5939 | 0.0112 | -67.5355 | -61.9519 | -3.0814 | -3.0871 |
| 0.6799 | 1.4473 | 2100 | 0.6875 | -0.0387 | -0.0510 | 0.5992 | 0.0123 | -68.2835 | -62.5824 | -3.0754 | -3.0811 |
| 0.6793 | 1.5162 | 2200 | 0.6872 | -0.0420 | -0.0551 | 0.5913 | 0.0131 | -68.6940 | -62.9161 | -3.0698 | -3.0755 |
| 0.6797 | 1.5851 | 2300 | 0.6868 | -0.0485 | -0.0626 | 0.5918 | 0.0141 | -69.4427 | -63.5627 | -3.0623 | -3.0680 |
| 0.6792 | 1.6540 | 2400 | 0.6863 | -0.0512 | -0.0663 | 0.5939 | 0.0151 | -69.8102 | -63.8365 | -3.0547 | -3.0604 |
| 0.6775 | 1.7229 | 2500 | 0.6860 | -0.0552 | -0.0710 | 0.5946 | 0.0158 | -70.2800 | -64.2325 | -3.0488 | -3.0546 |
| 0.6768 | 1.7919 | 2600 | 0.6856 | -0.0598 | -0.0766 | 0.5936 | 0.0169 | -70.8443 | -64.6883 | -3.0412 | -3.0469 |
| 0.675 | 1.8608 | 2700 | 0.6851 | -0.0654 | -0.0832 | 0.5948 | 0.0178 | -71.4996 | -65.2471 | -3.0345 | -3.0402 |
| 0.6736 | 1.9297 | 2800 | 0.6847 | -0.0707 | -0.0896 | 0.5983 | 0.0189 | -72.1448 | -65.7864 | -3.0286 | -3.0344 |
| 0.6773 | 1.9986 | 2900 | 0.6844 | -0.0746 | -0.0943 | 0.6020 | 0.0196 | -72.6052 | -66.1758 | -3.0225 | -3.0283 |
| 0.6724 | 2.0675 | 3000 | 0.6841 | -0.0793 | -0.0997 | 0.6029 | 0.0204 | -73.1465 | -66.6415 | -3.0158 | -3.0216 |
| 0.674 | 2.1365 | 3100 | 0.6837 | -0.0824 | -0.1036 | 0.6029 | 0.0212 | -73.5381 | -66.9540 | -3.0112 | -3.0169 |
| 0.6764 | 2.2054 | 3200 | 0.6834 | -0.0857 | -0.1076 | 0.6066 | 0.0219 | -73.9390 | -67.2856 | -3.0047 | -3.0105 |
| 0.6749 | 2.2743 | 3300 | 0.6831 | -0.0887 | -0.1113 | 0.6069 | 0.0226 | -74.3103 | -67.5846 | -2.9991 | -3.0049 |
| 0.6746 | 2.3432 | 3400 | 0.6828 | -0.0921 | -0.1154 | 0.6055 | 0.0233 | -74.7230 | -67.9247 | -2.9944 | -3.0002 |
| 0.6718 | 2.4121 | 3500 | 0.6824 | -0.0962 | -0.1204 | 0.6069 | 0.0242 | -75.2213 | -68.3350 | -2.9890 | -2.9948 |
| 0.672 | 2.4810 | 3600 | 0.6822 | -0.1013 | -0.1261 | 0.6048 | 0.0248 | -75.7936 | -68.8439 | -2.9844 | -2.9902 |
| 0.6733 | 2.5500 | 3700 | 0.6820 | -0.1048 | -0.1302 | 0.6032 | 0.0254 | -76.1958 | -69.1902 | -2.9800 | -2.9858 |
| 0.6715 | 2.6189 | 3800 | 0.6817 | -0.1077 | -0.1336 | 0.6046 | 0.0260 | -76.5409 | -69.4776 | -2.9765 | -2.9823 |
| 0.6709 | 2.6878 | 3900 | 0.6816 | -0.1102 | -0.1366 | 0.6020 | 0.0264 | -76.8374 | -69.7330 | -2.9729 | -2.9787 |
| 0.6696 | 2.7567 | 4000 | 0.6814 | -0.1132 | -0.1400 | 0.6032 | 0.0268 | -77.1831 | -70.0346 | -2.9698 | -2.9756 |
| 0.6687 | 2.8256 | 4100 | 0.6812 | -0.1154 | -0.1427 | 0.6048 | 0.0273 | -77.4501 | -70.2526 | -2.9670 | -2.9729 |
| 0.6692 | 2.8946 | 4200 | 0.6810 | -0.1166 | -0.1443 | 0.6073 | 0.0277 | -77.6081 | -70.3715 | -2.9649 | -2.9708 |
| 0.6742 | 2.9635 | 4300 | 0.6809 | -0.1184 | -0.1463 | 0.6027 | 0.0279 | -77.8100 | -70.5513 | -2.9629 | -2.9687 |
| 0.6652 | 3.0324 | 4400 | 0.6808 | -0.1191 | -0.1473 | 0.6090 | 0.0282 | -77.9141 | -70.6218 | -2.9606 | -2.9664 |
| 0.6659 | 3.1013 | 4500 | 0.6807 | -0.1206 | -0.1490 | 0.6046 | 0.0284 | -78.0785 | -70.7742 | -2.9587 | -2.9645 |
| 0.666 | 3.1702 | 4600 | 0.6805 | -0.1225 | -0.1512 | 0.6062 | 0.0288 | -78.3027 | -70.9582 | -2.9569 | -2.9628 |
| 0.6644 | 3.2391 | 4700 | 0.6805 | -0.1237 | -0.1527 | 0.6059 | 0.0290 | -78.4454 | -71.0785 | -2.9557 | -2.9615 |
| 0.6685 | 3.3081 | 4800 | 0.6804 | -0.1246 | -0.1536 | 0.6053 | 0.0291 | -78.5441 | -71.1674 | -2.9547 | -2.9605 |
| 0.6651 | 3.3770 | 4900 | 0.6803 | -0.1250 | -0.1542 | 0.6039 | 0.0293 | -78.6030 | -71.2072 | -2.9539 | -2.9598 |
| 0.6689 | 3.4459 | 5000 | 0.6803 | -0.1254 | -0.1547 | 0.6062 | 0.0293 | -78.6476 | -71.2503 | -2.9530 | -2.9588 |
| 0.6653 | 3.5148 | 5100 | 0.6802 | -0.1256 | -0.1552 | 0.6050 | 0.0296 | -78.6955 | -71.2721 | -2.9525 | -2.9583 |
| 0.6664 | 3.5837 | 5200 | 0.6803 | -0.1261 | -0.1556 | 0.6046 | 0.0295 | -78.7380 | -71.3226 | -2.9519 | -2.9577 |
| 0.6687 | 3.6527 | 5300 | 0.6803 | -0.1265 | -0.1559 | 0.6064 | 0.0294 | -78.7701 | -71.3572 | -2.9516 | -2.9574 |
| 0.6641 | 3.7216 | 5400 | 0.6803 | -0.1266 | -0.1560 | 0.6059 | 0.0294 | -78.7822 | -71.3690 | -2.9514 | -2.9573 |
| 0.6637 | 3.7905 | 5500 | 0.6803 | -0.1265 | -0.1559 | 0.6053 | 0.0295 | -78.7736 | -71.3579 | -2.9516 | -2.9575 |
| 0.6694 | 3.8594 | 5600 | 0.6802 | -0.1265 | -0.1561 | 0.6036 | 0.0296 | -78.7869 | -71.3611 | -2.9515 | -2.9574 |
| 0.6684 | 3.9283 | 5700 | 0.6803 | -0.1266 | -0.1560 | 0.6071 | 0.0294 | -78.7792 | -71.3707 | -2.9512 | -2.9571 |
| 0.6668 | 3.9972 | 5800 | 0.6803 | -0.1265 | -0.1560 | 0.6036 | 0.0295 | -78.7771 | -71.3634 | -2.9512 | -2.9570 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 1.1B params (Safetensors, tensor type F32)