---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs
  results: []
---

# tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6411
- Rewards/chosen: -1.5955
- Rewards/rejected: -1.9066
- Rewards/accuracies: 0.6273
- Rewards/margins: 0.3112
- Logps/rejected: -253.4108
- Logps/chosen: -218.5612
- Logits/rejected: -2.1502
- Logits/chosen: -2.1697
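
For reading the reward columns here and in the table below: DPO's rewards are implicit, defined as $\beta$ times the log-probability ratio between the policy and the frozen SFT reference model,

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_\mathrm{ref}(y \mid x)},
$$

so `Rewards/margins` is the mean chosen-minus-rejected reward and `Rewards/accuracies` is the fraction of evaluation pairs where the chosen summary scores higher. The DPO temperature $\beta$ is not listed among the hyperparameters below; trl's default of 0.1 is an assumption.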

## Model description

This is a 1.1B-parameter TinyLlama summarization model aligned with Direct Preference Optimization (DPO). Starting from the supervised fine-tuned checkpoint [martimfasantos/tinyllama-1.1b-sum-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full), it was trained to prefer the human-chosen summary over the rejected one in each preference pair of openai/summarize_from_feedback.

## Intended uses & limitations

The model is intended for short abstractive summarization in the style of the openai/summarize_from_feedback data (Reddit TL;DR posts). No evaluation outside this dataset is reported here, and the model inherits the limitations of its 1.1B-parameter base.
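
A minimal usage sketch with `transformers`, assuming a plain TL;DR-style prompt (the exact template used during SFT is not documented here, so treat the prompt as a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical TL;DR-style prompt; adjust to the template used during SFT.
prompt = "POST: I trained for my first 10k over twelve weeks and finished it today.\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```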

## Training and evaluation data

Training and evaluation use openai/summarize_from_feedback, a dataset of human preference judgments between pairs of candidate summaries; for DPO, the human-preferred summary of each pair serves as the "chosen" response and the other as "rejected".

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
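
A plain-PyTorch sketch of how these settings compose (the step count is a placeholder, not the exact value from this run; in practice trl's `DPOTrainer` wires all of this up internally):

```python
import torch
from transformers import AutoModelForCausalLM, get_cosine_schedule_with_warmup

model = AutoModelForCausalLM.from_pretrained(
    "martimfasantos/tinyllama-1.1b-sum-sft-full"  # DPO starts from the SFT checkpoint
)

# Placeholder: in the Trainer this is len(train_dataloader) * num_epochs
# divided by gradient_accumulation_steps.
num_training_steps = 17_400
num_warmup_steps = int(0.1 * num_training_steps)  # lr_scheduler_warmup_ratio: 0.1

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-7,             # learning_rate
    betas=(0.9, 0.999),  # optimizer betas
    eps=1e-8,            # optimizer epsilon
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps, num_training_steps
)

# Effective batch size: train_batch_size (8) * gradient_accumulation_steps (2) = 16.
```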

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6924        | 0.0689 | 400   | 0.6930          | 0.0011         | 0.0007           | 0.5390             | 0.0003          | -62.6755       | -58.9094     | -2.9687         | -2.9723       |
| 0.6891        | 0.1378 | 800   | 0.6909          | -0.0061        | -0.0108          | 0.5748             | 0.0047          | -63.8305       | -59.6239     | -2.9588         | -2.9622       |
| 0.6874        | 0.2068 | 1200  | 0.6876          | -0.0302        | -0.0427          | 0.5871             | 0.0124          | -67.0173       | -62.0385     | -2.9361         | -2.9395       |
| 0.6760        | 0.2757 | 1600  | 0.6820          | -0.1057        | -0.1316          | 0.5850             | 0.0259          | -75.9065       | -69.5813     | -2.8942         | -2.8976       |
| 0.6751        | 0.3446 | 2000  | 0.6770          | -0.1715        | -0.2098          | 0.5890             | 0.0384          | -83.7308       | -76.1611     | -2.8434         | -2.8468       |
| 0.6518        | 0.4135 | 2400  | 0.6676          | -0.3727        | -0.4381          | 0.6069             | 0.0654          | -106.5637      | -96.2904     | -2.7893         | -2.7926       |
| 0.6695        | 0.4824 | 2800  | 0.6631          | -0.4734        | -0.5560          | 0.6141             | 0.0826          | -118.3500      | -106.3523    | -2.7415         | -2.7450       |
| 0.6467        | 0.5513 | 3200  | 0.6583          | -0.6700        | -0.7814          | 0.6250             | 0.1113          | -140.8851      | -126.0199    | -2.6864         | -2.6902       |
| 0.6264        | 0.6203 | 3600  | 0.6586          | -0.6359        | -0.7384          | 0.6106             | 0.1024          | -136.5857      | -122.6100    | -2.6176         | -2.6225       |
| 0.6203        | 0.6892 | 4000  | 0.6523          | -0.7851        | -0.9183          | 0.6166             | 0.1332          | -154.5775      | -137.5248    | -2.5583         | -2.5642       |
| 0.6341        | 0.7581 | 4400  | 0.6487          | -0.8786        | -1.0259          | 0.6129             | 0.1473          | -165.3377      | -146.8752    | -2.4643         | -2.4723       |
| 0.6184        | 0.8270 | 4800  | 0.6454          | -1.0766        | -1.2481          | 0.6129             | 0.1716          | -187.5630      | -166.6730    | -2.4141         | -2.4242       |
| 0.6090        | 0.8959 | 5200  | 0.6414          | -0.9919        | -1.1678          | 0.6164             | 0.1759          | -179.5278      | -158.2066    | -2.3970         | -2.4080       |
| 0.5977        | 0.9649 | 5600  | 0.6432          | -0.9166        | -1.0804          | 0.6273             | 0.1638          | -170.7888      | -150.6710    | -2.3933         | -2.4042       |
| 0.5845        | 1.0338 | 6000  | 0.6438          | -1.3686        | -1.6032          | 0.6245             | 0.2346          | -223.0724      | -195.8758    | -2.2640         | -2.2816       |
| 0.5789        | 1.1027 | 6400  | 0.6455          | -1.3882        | -1.6212          | 0.6164             | 0.2331          | -224.8725      | -197.8306    | -2.2428         | -2.2595       |
| 0.5681        | 1.1716 | 6800  | 0.6434          | -1.3348        | -1.5500          | 0.6129             | 0.2153          | -217.7540      | -192.4917    | -2.2435         | -2.2593       |
| 0.5602        | 1.2405 | 7200  | 0.6448          | -1.3673        | -1.5959          | 0.6234             | 0.2286          | -222.3391      | -195.7428    | -2.2210         | -2.2378       |
| 0.6357        | 1.3094 | 7600  | 0.6413          | -1.3975        | -1.6344          | 0.6125             | 0.2368          | -226.1876      | -198.7702    | -2.2034         | -2.2208       |
| 0.5491        | 1.3784 | 8000  | 0.6438          | -1.4655        | -1.7121          | 0.6055             | 0.2466          | -233.9599      | -205.5657    | -2.1906         | -2.2085       |
| 0.5537        | 1.4473 | 8400  | 0.6445          | -1.4375        | -1.6793          | 0.6259             | 0.2418          | -230.6812      | -202.7634    | -2.1797         | -2.1984       |
| 0.6100        | 1.5162 | 8800  | 0.6405          | -1.0941        | -1.2946          | 0.6164             | 0.2005          | -192.2120      | -168.4266    | -2.2428         | -2.2579       |
| 0.5230        | 1.5851 | 9200  | 0.6431          | -1.4596        | -1.7029          | 0.6289             | 0.2433          | -233.0398      | -204.9723    | -2.1570         | -2.1756       |
| 0.5412        | 1.6540 | 9600  | 0.6393          | -1.4228        | -1.6896          | 0.6315             | 0.2668          | -231.7097      | -201.2986    | -2.1513         | -2.1708       |
| 0.5368        | 1.7229 | 10000 | 0.6408          | -1.3358        | -1.5858          | 0.6236             | 0.2500          | -221.3330      | -192.5947    | -2.1730         | -2.1915       |
| 0.5064        | 1.7919 | 10400 | 0.6423          | -1.0625        | -1.2620          | 0.6215             | 0.1995          | -188.9488      | -165.2631    | -2.2150         | -2.2307       |
| 0.5268        | 1.8608 | 10800 | 0.6406          | -1.4254        | -1.6829          | 0.6341             | 0.2575          | -231.0404      | -201.5558    | -2.1644         | -2.1831       |
| 0.5384        | 1.9297 | 11200 | 0.6418          | -1.6486        | -1.9439          | 0.6364             | 0.2954          | -257.1440      | -223.8720    | -2.1299         | -2.1503       |
| 0.5734        | 1.9986 | 11600 | 0.6378          | -1.4356        | -1.7101          | 0.6362             | 0.2744          | -233.7563      | -202.5782    | -2.1624         | -2.1813       |
| 0.5302        | 2.0675 | 12000 | 0.6413          | -1.7064        | -2.0285          | 0.6292             | 0.3221          | -265.5970      | -229.6515    | -2.1257         | -2.1466       |
| 0.4961        | 2.1365 | 12400 | 0.6474          | -2.0075        | -2.3712          | 0.6387             | 0.3637          | -299.8690      | -259.7696    | -2.0958         | -2.1178       |
| 0.5500        | 2.2054 | 12800 | 0.6415          | -1.5035        | -1.7868          | 0.6315             | 0.2833          | -241.4328      | -209.3660    | -2.1574         | -2.1761       |
| 0.5546        | 2.2743 | 13200 | 0.6425          | -1.6715        | -1.9874          | 0.6303             | 0.3159          | -261.4859      | -226.1615    | -2.1413         | -2.1612       |
| 0.5639        | 2.3432 | 13600 | 0.6409          | -1.5908        | -1.8980          | 0.6289             | 0.3072          | -252.5519      | -218.1001    | -2.1481         | -2.1675       |
| 0.5055        | 2.4121 | 14000 | 0.6384          | -1.4618        | -1.7629          | 0.6257             | 0.3010          | -239.0347      | -205.1979    | -2.1665         | -2.1857       |
| 0.5404        | 2.4810 | 14400 | 0.6405          | -1.6514        | -1.9790          | 0.6285             | 0.3276          | -260.6489      | -224.1589    | -2.1411         | -2.1613       |
| 0.5348        | 2.5500 | 14800 | 0.6418          | -1.6812        | -2.0090          | 0.6276             | 0.3278          | -263.6481      | -227.1385    | -2.1375         | -2.1578       |
| 0.5114        | 2.6189 | 15200 | 0.6408          | -1.5587        | -1.8632          | 0.6310             | 0.3046          | -249.0734      | -214.8810    | -2.1538         | -2.1732       |
| 0.5356        | 2.6878 | 15600 | 0.6405          | -1.5493        | -1.8534          | 0.6266             | 0.3041          | -248.0918      | -213.9473    | -2.1550         | -2.1743       |
| 0.4885        | 2.7567 | 16000 | 0.6406          | -1.5822        | -1.8916          | 0.6269             | 0.3094          | -251.9056      | -217.2328    | -2.1512         | -2.1707       |
| 0.5057        | 2.8256 | 16400 | 0.6410          | -1.5799        | -1.8883          | 0.6306             | 0.3084          | -251.5751      | -217.0051    | -2.1527         | -2.1720       |
| 0.5731        | 2.8946 | 16800 | 0.6412          | -1.5917        | -1.9021          | 0.6271             | 0.3104          | -252.9564      | -218.1854    | -2.1507         | -2.1702       |
| 0.4958        | 2.9635 | 17200 | 0.6412          | -1.5933        | -1.9040          | 0.6296             | 0.3107          | -253.1478      | -218.3473    | -2.1506         | -2.1702       |


### Framework versions

- Transformers 4.41.2
- PyTorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1