---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6867
- Rewards/chosen: -0.0478
- Rewards/rejected: -0.0620
- Rewards/accuracies: 0.5936
- Rewards/margins: 0.0142
- Logps/rejected: -69.3779
- Logps/chosen: -63.4876
- Logits/rejected: -3.0580
- Logits/chosen: -3.0637
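
For readers unfamiliar with these columns: under DPO (and, as I understand trl's reporting convention), the implicit reward of a completion $y$ for a prompt $x$ is the $\beta$-scaled log-probability ratio between the policy and the frozen reference model, and the loss pushes the chosen summary's reward above the rejected one's:

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}, \qquad
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\!\left[\log \sigma\!\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\right]
$$

Rewards/chosen and Rewards/rejected are the mean implicit rewards of the chosen and rejected summaries, Rewards/margins is the mean of their difference, and Rewards/accuracies is the fraction of pairs in which the chosen summary's reward is higher.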

## Model description

This model is a 1.1B-parameter TinyLlama summarizer aligned with Direct Preference Optimization (DPO). Starting from the supervised fine-tuned checkpoint [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old), it was trained for three epochs on human preference pairs from openai/summarize_from_feedback, so that summaries preferred by annotators receive a higher implicit reward than the dispreferred ones.

## Intended uses & limitations

The model is intended for short abstractive summarization in the TL;DR style of its training data. It has not been evaluated outside that domain, and like any preference-tuned language model it can still produce summaries that are inaccurate or unfaithful to the source text, so outputs should be checked before use.
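
A minimal usage sketch with `transformers` is shown below; the TL;DR-style prompt is an assumption and should match whatever template the SFT stage actually used.

```python
# A minimal sketch of running the model for summarization with
# transformers. The TL;DR-style prompt below is an assumption -- match
# whatever template the SFT stage actually used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "POST: <the text to summarize>\n\nTL;DR:"  # hypothetical format
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```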

## Training and evaluation data

Training and evaluation use the [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback) comparisons data, in which each example pairs a post with two candidate summaries and records which one the human annotator preferred ("chosen" vs. "rejected").

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-08
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
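
A minimal sketch of wiring these hyperparameters into trl's `DPOTrainer` follows. The argument names track recent trl releases (the exact trl version used here is not recorded), the prompt template and dataset field mapping are assumptions, and β is left at trl's default since it is not reported above; this is not the exact training script.

```python
# Sketch of a DPO run matching the hyperparameters listed above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer  # DPOConfig exists in recent trl releases

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

def to_preference(example):
    # Hypothetical mapping from the "comparisons" schema
    # (info.post, summaries, choice) to prompt/chosen/rejected.
    # Note: CNN/DM items in the validation split carry info["article"]
    # instead of info["post"] and would need separate handling.
    post = example["info"]["post"]
    chosen = example["summaries"][example["choice"]]["text"]
    rejected = example["summaries"][1 - example["choice"]]["text"]
    return {
        "prompt": f"POST: {post}\n\nTL;DR:",  # hypothetical template
        "chosen": " " + chosen,
        "rejected": " " + rejected,
    }

raw = load_dataset("openai/summarize_from_feedback", "comparisons")
train_dataset = raw["train"].map(to_preference)
eval_dataset = raw["validation"].map(to_preference)

# Hyperparameters from the "Training hyperparameters" list above;
# beta is not reported in the card, so trl's default is kept.
config = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR3e-8_BS32_3epochs_old",
    learning_rate=3e-8,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 8 x 4 = effective batch size 32
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # trl clones the policy as the frozen reference
    args=config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,  # newer trl releases call this processing_class
)
trainer.train()
```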

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931        | 0.0345 | 100  | 0.6932          | 0.0001         | 0.0001           | 0.4930             | -0.0000         | -63.1672       | -58.7024     | -3.1577         | -3.1633       |
| 0.6931        | 0.0689 | 200  | 0.6932          | 0.0001         | 0.0001           | 0.4888             | -0.0001         | -63.1661       | -58.7066     | -3.1577         | -3.1634       |
| 0.6931        | 0.1034 | 300  | 0.6932          | 0.0000         | 0.0001           | 0.4933             | -0.0001         | -63.1693       | -58.7071     | -3.1578         | -3.1634       |
| 0.6931        | 0.1378 | 400  | 0.6932          | 0.0001         | 0.0001           | 0.4809             | -0.0000         | -63.1727       | -58.7061     | -3.1575         | -3.1632       |
| 0.6931        | 0.1723 | 500  | 0.6931          | 0.0002         | 0.0002           | 0.5098             | 0.0000          | -63.1633       | -58.6928     | -3.1577         | -3.1634       |
| 0.6931        | 0.2068 | 600  | 0.6932          | 0.0002         | 0.0002           | 0.4937             | -0.0000         | -63.1596       | -58.6920     | -3.1574         | -3.1630       |
| 0.6929        | 0.2412 | 700  | 0.6931          | 0.0003         | 0.0002           | 0.4905             | 0.0001          | -63.1582       | -58.6817     | -3.1572         | -3.1629       |
| 0.6929        | 0.2757 | 800  | 0.6931          | 0.0004         | 0.0003           | 0.5237             | 0.0001          | -63.1485       | -58.6703     | -3.1566         | -3.1622       |
| 0.6927        | 0.3101 | 900  | 0.6931          | 0.0006         | 0.0004           | 0.5186             | 0.0001          | -63.1378       | -58.6559     | -3.1564         | -3.1620       |
| 0.6925        | 0.3446 | 1000 | 0.6930          | 0.0008         | 0.0004           | 0.5279             | 0.0003          | -63.1375       | -58.6361     | -3.1554         | -3.1610       |
| 0.6924        | 0.3790 | 1100 | 0.6930          | 0.0009         | 0.0005           | 0.5560             | 0.0004          | -63.1285       | -58.6220     | -3.1548         | -3.1604       |
| 0.6920        | 0.4135 | 1200 | 0.6929          | 0.0011         | 0.0006           | 0.5407             | 0.0005          | -63.1206       | -58.5973     | -3.1539         | -3.1595       |
| 0.6914        | 0.4480 | 1300 | 0.6928          | 0.0013         | 0.0007           | 0.5383             | 0.0006          | -63.1120       | -58.5819     | -3.1528         | -3.1584       |
| 0.6917        | 0.4824 | 1400 | 0.6927          | 0.0016         | 0.0006           | 0.5648             | 0.0009          | -63.1160       | -58.5533     | -3.1518         | -3.1574       |
| 0.6914        | 0.5169 | 1500 | 0.6926          | 0.0016         | 0.0006           | 0.5574             | 0.0010          | -63.1243       | -58.5539     | -3.1505         | -3.1561       |
| 0.6916        | 0.5513 | 1600 | 0.6926          | 0.0018         | 0.0007           | 0.5576             | 0.0012          | -63.1145       | -58.5288     | -3.1493         | -3.1549       |
| 0.6906        | 0.5858 | 1700 | 0.6925          | 0.0019         | 0.0004           | 0.5625             | 0.0014          | -63.1358       | -58.5250     | -3.1471         | -3.1527       |
| 0.6908        | 0.6203 | 1800 | 0.6923          | 0.0019         | 0.0002           | 0.5551             | 0.0017          | -63.1602       | -58.5198     | -3.1456         | -3.1513       |
| 0.6903        | 0.6547 | 1900 | 0.6922          | 0.0019         | -0.0001          | 0.5720             | 0.0020          | -63.1895       | -58.5253     | -3.1437         | -3.1493       |
| 0.6895        | 0.6892 | 2000 | 0.6920          | 0.0016         | -0.0007          | 0.5795             | 0.0023          | -63.2502       | -58.5471     | -3.1418         | -3.1475       |
| 0.6891        | 0.7236 | 2100 | 0.6919          | 0.0017         | -0.0009          | 0.5818             | 0.0026          | -63.2700       | -58.5423     | -3.1394         | -3.1450       |
| 0.6906        | 0.7581 | 2200 | 0.6918          | 0.0013         | -0.0016          | 0.5737             | 0.0028          | -63.3380       | -58.5865     | -3.1376         | -3.1432       |
| 0.6893        | 0.7926 | 2300 | 0.6917          | 0.0011         | -0.0020          | 0.5730             | 0.0031          | -63.3761       | -58.6009     | -3.1358         | -3.1414       |
| 0.6899        | 0.8270 | 2400 | 0.6915          | 0.0006         | -0.0028          | 0.5764             | 0.0034          | -63.4591       | -58.6538     | -3.1338         | -3.1394       |
| 0.6894        | 0.8615 | 2500 | 0.6914          | 0.0002         | -0.0034          | 0.5743             | 0.0036          | -63.5245       | -58.6934     | -3.1315         | -3.1372       |
| 0.6883        | 0.8959 | 2600 | 0.6912          | -0.0003        | -0.0043          | 0.5764             | 0.0040          | -63.6123       | -58.7457     | -3.1297         | -3.1354       |
| 0.6875        | 0.9304 | 2700 | 0.6911          | -0.0010        | -0.0053          | 0.5781             | 0.0043          | -63.7097       | -58.8142     | -3.1282         | -3.1338       |
| 0.6871        | 0.9649 | 2800 | 0.6910          | -0.0016        | -0.0061          | 0.5760             | 0.0045          | -63.7868       | -58.8701     | -3.1261         | -3.1317       |
| 0.6871        | 0.9993 | 2900 | 0.6909          | -0.0024        | -0.0072          | 0.5762             | 0.0048          | -63.8972       | -58.9496     | -3.1231         | -3.1287       |
| 0.6874        | 1.0338 | 3000 | 0.6907          | -0.0032        | -0.0084          | 0.5834             | 0.0051          | -64.0164       | -59.0348     | -3.1212         | -3.1268       |
| 0.6859        | 1.0682 | 3100 | 0.6906          | -0.0042        | -0.0096          | 0.5806             | 0.0054          | -64.1398       | -59.1344     | -3.1190         | -3.1247       |
| 0.6842        | 1.1027 | 3200 | 0.6904          | -0.0051        | -0.0109          | 0.5839             | 0.0058          | -64.2725       | -59.2256     | -3.1161         | -3.1218       |
| 0.6884        | 1.1371 | 3300 | 0.6903          | -0.0066        | -0.0127          | 0.5874             | 0.0061          | -64.4506       | -59.3731     | -3.1139         | -3.1196       |
| 0.6858        | 1.1716 | 3400 | 0.6902          | -0.0080        | -0.0142          | 0.5785             | 0.0062          | -64.5965       | -59.5071     | -3.1116         | -3.1173       |
| 0.6859        | 1.2061 | 3500 | 0.6900          | -0.0099        | -0.0166          | 0.5832             | 0.0066          | -64.8362       | -59.7041     | -3.1101         | -3.1158       |
| 0.6850        | 1.2405 | 3600 | 0.6899          | -0.0115        | -0.0185          | 0.5783             | 0.0069          | -65.0265       | -59.8637     | -3.1069         | -3.1126       |
| 0.6839        | 1.2750 | 3700 | 0.6898          | -0.0129        | -0.0202          | 0.5820             | 0.0072          | -65.1978       | -60.0064     | -3.1049         | -3.1106       |
| 0.6824        | 1.3094 | 3800 | 0.6896          | -0.0145        | -0.0220          | 0.5832             | 0.0076          | -65.3850       | -60.1580     | -3.1023         | -3.1080       |
| 0.6847        | 1.3439 | 3900 | 0.6895          | -0.0161        | -0.0240          | 0.5834             | 0.0078          | -65.5760       | -60.3265     | -3.1007         | -3.1064       |
| 0.6865        | 1.3784 | 4000 | 0.6894          | -0.0179        | -0.0261          | 0.5876             | 0.0081          | -65.7873       | -60.5061     | -3.0990         | -3.1047       |
| 0.6826        | 1.4128 | 4100 | 0.6892          | -0.0197        | -0.0282          | 0.5899             | 0.0085          | -65.9972       | -60.6782     | -3.0968         | -3.1025       |
| 0.6801        | 1.4473 | 4200 | 0.6890          | -0.0209        | -0.0299          | 0.5922             | 0.0090          | -66.1658       | -60.8002     | -3.0952         | -3.1009       |
| 0.6814        | 1.4817 | 4300 | 0.6890          | -0.0227        | -0.0318          | 0.5878             | 0.0091          | -66.3577       | -60.9789     | -3.0926         | -3.0983       |
| 0.6830        | 1.5162 | 4400 | 0.6888          | -0.0239        | -0.0334          | 0.5913             | 0.0094          | -66.5158       | -61.1062     | -3.0910         | -3.0967       |
| 0.679         | 1.5507 | 4500 | 0.6887          | -0.0255        | -0.0352          | 0.5948             | 0.0097          | -66.7038       | -61.2636     | -3.0892         | -3.0949       |
| 0.6834        | 1.5851 | 4600 | 0.6886          | -0.0275        | -0.0375          | 0.5934             | 0.0100          | -66.9283       | -61.4618     | -3.0871         | -3.0928       |
| 0.685         | 1.6196 | 4700 | 0.6884          | -0.0284        | -0.0387          | 0.5929             | 0.0103          | -67.0469       | -61.5498     | -3.0853         | -3.0910       |
| 0.6830        | 1.6540 | 4800 | 0.6883          | -0.0294        | -0.0400          | 0.5960             | 0.0106          | -67.1815       | -61.6491     | -3.0831         | -3.0889       |
| 0.6781        | 1.6885 | 4900 | 0.6882          | -0.0307        | -0.0416          | 0.5950             | 0.0109          | -67.3424       | -61.7858     | -3.0820         | -3.0877       |
| 0.6813        | 1.7229 | 5000 | 0.6881          | -0.0317        | -0.0426          | 0.5943             | 0.0110          | -67.4448       | -61.8785     | -3.0805         | -3.0863       |
| 0.6823        | 1.7574 | 5100 | 0.6880          | -0.0328        | -0.0440          | 0.5950             | 0.0112          | -67.5799       | -61.9921     | -3.0789         | -3.0846       |
| 0.6798        | 1.7919 | 5200 | 0.6879          | -0.0341        | -0.0457          | 0.5987             | 0.0116          | -67.7483       | -62.1205     | -3.0772         | -3.0829       |
| 0.6798        | 1.8263 | 5300 | 0.6877          | -0.0353        | -0.0472          | 0.5953             | 0.0119          | -67.8958       | -62.2422     | -3.0757         | -3.0814       |
| 0.6784        | 1.8608 | 5400 | 0.6876          | -0.0368        | -0.0489          | 0.5969             | 0.0122          | -68.0724       | -62.3875     | -3.0742         | -3.0798       |
| 0.6853        | 1.8952 | 5500 | 0.6876          | -0.0377        | -0.0500          | 0.5946             | 0.0123          | -68.1765       | -62.4820     | -3.0735         | -3.0792       |
| 0.6769        | 1.9297 | 5600 | 0.6875          | -0.0392        | -0.0517          | 0.5941             | 0.0125          | -68.3471       | -62.6278     | -3.0713         | -3.0771       |
| 0.6788        | 1.9642 | 5700 | 0.6874          | -0.0399        | -0.0526          | 0.5941             | 0.0127          | -68.4439       | -62.7029     | -3.0701         | -3.0759       |
| 0.6798        | 1.9986 | 5800 | 0.6873          | -0.0410        | -0.0538          | 0.5925             | 0.0128          | -68.5632       | -62.8140     | -3.0694         | -3.0752       |
| 0.683         | 2.0331 | 5900 | 0.6872          | -0.0418        | -0.0549          | 0.5934             | 0.0131          | -68.6699       | -62.8917     | -3.0677         | -3.0735       |
| 0.6766        | 2.0675 | 6000 | 0.6872          | -0.0425        | -0.0555          | 0.5918             | 0.0130          | -68.7314       | -62.9600     | -3.0675         | -3.0732       |
| 0.6756        | 2.1020 | 6100 | 0.6871          | -0.0428        | -0.0561          | 0.5922             | 0.0133          | -68.7950       | -62.9959     | -3.0660         | -3.0717       |
| 0.6805        | 2.1365 | 6200 | 0.6871          | -0.0435        | -0.0568          | 0.5904             | 0.0133          | -68.8622       | -63.0611     | -3.0654         | -3.0711       |
| 0.6797        | 2.1709 | 6300 | 0.6871          | -0.0443        | -0.0577          | 0.5929             | 0.0134          | -68.9493       | -63.1378     | -3.0645         | -3.0703       |
| 0.6802        | 2.2054 | 6400 | 0.6870          | -0.0442        | -0.0577          | 0.5913             | 0.0135          | -68.9530       | -63.1312     | -3.0641         | -3.0698       |
| 0.6802        | 2.2398 | 6500 | 0.6870          | -0.0445        | -0.0581          | 0.5934             | 0.0136          | -68.9891       | -63.1579     | -3.0633         | -3.0690       |
| 0.6806        | 2.2743 | 6600 | 0.6870          | -0.0448        | -0.0585          | 0.5925             | 0.0136          | -69.0289       | -63.1964     | -3.0624         | -3.0682       |
| 0.6755        | 2.3088 | 6700 | 0.6869          | -0.0453        | -0.0590          | 0.5918             | 0.0137          | -69.0814       | -63.2383     | -3.0618         | -3.0675       |
| 0.6826        | 2.3432 | 6800 | 0.6869          | -0.0455        | -0.0593          | 0.5962             | 0.0138          | -69.1095       | -63.2637     | -3.0612         | -3.0669       |
| 0.6786        | 2.3777 | 6900 | 0.6869          | -0.0459        | -0.0598          | 0.5892             | 0.0139          | -69.1580       | -63.3046     | -3.0607         | -3.0664       |
| 0.6798        | 2.4121 | 7000 | 0.6868          | -0.0463        | -0.0602          | 0.5934             | 0.0139          | -69.2011       | -63.3391     | -3.0601         | -3.0658       |
| 0.6762        | 2.4466 | 7100 | 0.6868          | -0.0466        | -0.0606          | 0.5936             | 0.0140          | -69.2414       | -63.3699     | -3.0598         | -3.0656       |
| 0.6782        | 2.4810 | 7200 | 0.6868          | -0.0470        | -0.0611          | 0.5918             | 0.0141          | -69.2927       | -63.4167     | -3.0595         | -3.0652       |
| 0.6821        | 2.5155 | 7300 | 0.6868          | -0.0472        | -0.0612          | 0.5943             | 0.0140          | -69.3050       | -63.4345     | -3.0589         | -3.0647       |
| 0.6806        | 2.5500 | 7400 | 0.6868          | -0.0473        | -0.0614          | 0.5908             | 0.0141          | -69.3214       | -63.4432     | -3.0588         | -3.0646       |
| 0.6824        | 2.5844 | 7500 | 0.6867          | -0.0475        | -0.0616          | 0.5918             | 0.0142          | -69.3426       | -63.4585     | -3.0589         | -3.0647       |
| 0.6789        | 2.6189 | 7600 | 0.6868          | -0.0477        | -0.0618          | 0.5915             | 0.0141          | -69.3578       | -63.4788     | -3.0584         | -3.0642       |
| 0.6768        | 2.6533 | 7700 | 0.6867          | -0.0475        | -0.0618          | 0.5946             | 0.0144          | -69.3650       | -63.4617     | -3.0582         | -3.0640       |
| 0.6808        | 2.6878 | 7800 | 0.6867          | -0.0477        | -0.0619          | 0.5918             | 0.0142          | -69.3712       | -63.4863     | -3.0584         | -3.0642       |
| 0.6782        | 2.7223 | 7900 | 0.6867          | -0.0478        | -0.0621          | 0.5925             | 0.0143          | -69.3874       | -63.4902     | -3.0581         | -3.0639       |
| 0.6794        | 2.7567 | 8000 | 0.6867          | -0.0479        | -0.0621          | 0.5897             | 0.0142          | -69.3922       | -63.5035     | -3.0580         | -3.0638       |
| 0.6740        | 2.7912 | 8100 | 0.6867          | -0.0479        | -0.0621          | 0.5911             | 0.0142          | -69.3883       | -63.4992     | -3.0580         | -3.0638       |
| 0.6766        | 2.8256 | 8200 | 0.6866          | -0.0478        | -0.0622          | 0.5899             | 0.0144          | -69.4003       | -63.4938     | -3.0581         | -3.0639       |
| 0.6821        | 2.8601 | 8300 | 0.6867          | -0.0479        | -0.0622          | 0.5890             | 0.0143          | -69.3970       | -63.4998     | -3.0579         | -3.0637       |
| 0.6795        | 2.8946 | 8400 | 0.6867          | -0.0478        | -0.0621          | 0.5904             | 0.0142          | -69.3868       | -63.4954     | -3.0580         | -3.0637       |
| 0.679         | 2.9290 | 8500 | 0.6867          | -0.0479        | -0.0622          | 0.5925             | 0.0143          | -69.3981       | -63.4995     | -3.0579         | -3.0637       |
| 0.6816        | 2.9635 | 8600 | 0.6867          | -0.0478        | -0.0621          | 0.5922             | 0.0144          | -69.3946       | -63.4907     | -3.0579         | -3.0637       |
| 0.6751        | 2.9979 | 8700 | 0.6867          | -0.0478        | -0.0620          | 0.5936             | 0.0142          | -69.3779       | -63.4876     | -3.0580         | -3.0637       |


### Framework versions

- Transformers 4.41.2
- Pytorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1