---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-sum-sft-full_old
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- openai/summarize_from_feedback
model-index:
- name: tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old
  results: []
---


# tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old

This model is a fine-tuned version of [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old) on the openai/summarize_from_feedback dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6307
- Rewards/chosen: -1.4504
- Rewards/rejected: -1.8097
- Rewards/accuracies: 0.6434
- Rewards/margins: 0.3593
- Logps/rejected: -244.1550
- Logps/chosen: -203.7530
- Logits/rejected: -1.7026
- Logits/chosen: -1.7263
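
For interpreting the reward metrics: in DPO the rewards are implicit, computed from the policy and the frozen reference (SFT) model rather than from a separate reward model. A standard formulation (the \\(\beta\\) used for this run is not reported in this card) is

$$ r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} $$

`Rewards/chosen` and `Rewards/rejected` average this quantity over the preferred and dispreferred summaries, `Rewards/margins` is their difference, and `Rewards/accuracies` is the fraction of pairs in which the chosen summary receives the higher implicit reward.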

## Model description

This is a 1.1B-parameter TinyLlama causal language model aligned for summarization with Direct Preference Optimization (DPO). Starting from the supervised fine-tuned checkpoint [martimfasantos/tinyllama-1.1b-sum-sft-full_old](https://huggingface.co/martimfasantos/tinyllama-1.1b-sum-sft-full_old), it was trained for 3 epochs on human preference pairs from the openai/summarize_from_feedback dataset at a learning rate of 2e-7.

## Intended uses & limitations

The model is intended for abstractive summarization in the domain of the openai/summarize_from_feedback dataset (Reddit TL;DR-style posts). It has only been evaluated on that dataset's held-out split; behavior on other domains, languages, or tasks is undocumented.
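
As a minimal usage sketch (assumptions: the tokenizer ships with the checkpoint, and the prompt format below is hypothetical, since the SFT prompt template is not documented in this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical prompt format; match the SFT template used during training if known.
prompt = "Summarize the following post:\n\nSOME POST TEXT\n\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens.
summary = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```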

## Training and evaluation data

The model was trained and evaluated on the preference comparisons in [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback), in which human labelers chose the better of two candidate summaries for a given post.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
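
The list above maps onto a trl `DPOTrainer` setup roughly as sketched below. This is a hedged reconstruction, not the actual training script: the DPO `beta` is an assumption (it is not reported in this card), and `preference_ds` is a hypothetical stand-in for preference pairs (`prompt`/`chosen`/`rejected` columns) built from openai/summarize_from_feedback.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "martimfasantos/tinyllama-1.1b-sum-sft-full_old"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)      # policy being optimized
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference

args = DPOConfig(
    output_dir="tinyllama-1.1b-sum-dpo-full_LR2e-7_3epochs_old",
    learning_rate=2e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # effective train batch size 16, per the card
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # assumption: the beta used for this run is not reported
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=preference_ds["train"],      # hypothetical preference dataset
    eval_dataset=preference_ds["validation"],  # hypothetical preference dataset
    tokenizer=tokenizer,
)
trainer.train()
```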

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6931        | 0.0689 | 400   | 0.6932          | 0.0002         | 0.0003           | 0.4654             | -0.0001         | -63.1542       | -58.6924     | -3.1574         | -3.1630       |
| 0.692         | 0.1378 | 800   | 0.6928          | 0.0015         | 0.0008           | 0.5525             | 0.0007          | -63.0955       | -58.5586     | -3.1518         | -3.1574       |
| 0.6902        | 0.2068 | 1200  | 0.6914          | 0.0009         | -0.0027          | 0.5876             | 0.0037          | -63.4527       | -58.6187     | -3.1281         | -3.1338       |
| 0.6835        | 0.2757 | 1600  | 0.6888          | -0.0225        | -0.0320          | 0.5864             | 0.0096          | -66.3833       | -60.9598     | -3.0838         | -3.0895       |
| 0.6778        | 0.3446 | 2000  | 0.6845          | -0.0724        | -0.0918          | 0.5976             | 0.0194          | -72.3574       | -65.9486     | -3.0213         | -3.0270       |
| 0.6688        | 0.4135 | 2400  | 0.6792          | -0.1403        | -0.1725          | 0.6032             | 0.0323          | -80.4345       | -72.7375     | -2.9370         | -2.9428       |
| 0.6675        | 0.4824 | 2800  | 0.6732          | -0.2283        | -0.2756          | 0.6057             | 0.0472          | -90.7353       | -81.5436     | -2.8576         | -2.8635       |
| 0.6437        | 0.5513 | 3200  | 0.6646          | -0.3557        | -0.4265          | 0.6120             | 0.0708          | -105.8322      | -94.2796     | -2.7546         | -2.7607       |
| 0.6516        | 0.6203 | 3600  | 0.6602          | -0.4125        | -0.4982          | 0.6178             | 0.0856          | -112.9954      | -99.9643     | -2.6547         | -2.6612       |
| 0.6264        | 0.6892 | 4000  | 0.6514          | -0.5858        | -0.7050          | 0.6315             | 0.1192          | -133.6785      | -117.2944    | -2.5252         | -2.5324       |
| 0.6109        | 0.7581 | 4400  | 0.6474          | -0.6217        | -0.7587          | 0.6313             | 0.1370          | -139.0484      | -120.8850    | -2.4041         | -2.4124       |
| 0.6153        | 0.8270 | 4800  | 0.6432          | -0.7112        | -0.8720          | 0.6266             | 0.1608          | -150.3814      | -129.8305    | -2.3206         | -2.3302       |
| 0.6107        | 0.8959 | 5200  | 0.6407          | -0.7470        | -0.9249          | 0.6350             | 0.1779          | -155.6741      | -133.4166    | -2.2363         | -2.2476       |
| 0.6061        | 0.9649 | 5600  | 0.6392          | -0.7851        | -0.9723          | 0.6315             | 0.1871          | -160.4070      | -137.2255    | -2.1733         | -2.1859       |
| 0.5701        | 1.0338 | 6000  | 0.6356          | -1.0035        | -1.2450          | 0.6292             | 0.2415          | -187.6758      | -159.0581    | -2.0122         | -2.0292       |
| 0.5557        | 1.1027 | 6400  | 0.6358          | -1.0296        | -1.2785          | 0.6322             | 0.2489          | -191.0262      | -161.6682    | -1.9777         | -1.9953       |
| 0.5292        | 1.1716 | 6800  | 0.6333          | -1.0878        | -1.3492          | 0.6313             | 0.2614          | -198.1001      | -167.4900    | -1.8969         | -1.9159       |
| 0.5473        | 1.2405 | 7200  | 0.6354          | -1.0479        | -1.2958          | 0.6262             | 0.2479          | -192.7597      | -163.5001    | -1.9044         | -1.9226       |
| 0.6231        | 1.3094 | 7600  | 0.6346          | -1.2184        | -1.4979          | 0.6289             | 0.2795          | -212.9705      | -180.5535    | -1.8355         | -1.8558       |
| 0.5403        | 1.3784 | 8000  | 0.6339          | -1.1437        | -1.4111          | 0.6264             | 0.2673          | -204.2867      | -173.0842    | -1.8647         | -1.8848       |
| 0.5444        | 1.4473 | 8400  | 0.6339          | -1.0726        | -1.3310          | 0.6287             | 0.2584          | -196.2827      | -165.9765    | -1.8568         | -1.8768       |
| 0.5766        | 1.5162 | 8800  | 0.6329          | -1.0364        | -1.2879          | 0.6336             | 0.2516          | -191.9749      | -162.3483    | -1.8819         | -1.9009       |
| 0.525         | 1.5851 | 9200  | 0.6320          | -1.1870        | -1.4611          | 0.6366             | 0.2740          | -209.2869      | -177.4161    | -1.8122         | -1.8325       |
| 0.5174        | 1.6540 | 9600  | 0.6310          | -1.2662        | -1.5606          | 0.6375             | 0.2944          | -219.2438      | -185.3348    | -1.7597         | -1.7810       |
| 0.5312        | 1.7229 | 10000 | 0.6313          | -1.2979        | -1.6013          | 0.6359             | 0.3033          | -223.3081      | -188.5056    | -1.7629         | -1.7848       |
| 0.4923        | 1.7919 | 10400 | 0.6312          | -1.1596        | -1.4412          | 0.6334             | 0.2815          | -207.2955      | -174.6746    | -1.7754         | -1.7966       |
| 0.5386        | 1.8608 | 10800 | 0.6304          | -1.2706        | -1.5735          | 0.6373             | 0.3029          | -220.5279      | -185.7685    | -1.7500         | -1.7722       |
| 0.5178        | 1.9297 | 11200 | 0.6295          | -1.2859        | -1.6008          | 0.6443             | 0.3149          | -223.2599      | -187.3036    | -1.7272         | -1.7501       |
| 0.5556        | 1.9986 | 11600 | 0.6295          | -1.2652        | -1.5714          | 0.6362             | 0.3062          | -220.3214      | -185.2294    | -1.7356         | -1.7580       |
| 0.4901        | 2.0675 | 12000 | 0.6303          | -1.4749        | -1.8246          | 0.6447             | 0.3497          | -245.6420      | -206.2009    | -1.6688         | -1.6928       |
| 0.4713        | 2.1365 | 12400 | 0.6303          | -1.6230        | -2.0017          | 0.6471             | 0.3786          | -263.3478      | -221.0147    | -1.6397         | -1.6644       |
| 0.5188        | 2.2054 | 12800 | 0.6305          | -1.4593        | -1.8052          | 0.6408             | 0.3458          | -243.6979      | -204.6454    | -1.6776         | -1.7011       |
| 0.5395        | 2.2743 | 13200 | 0.6315          | -1.5373        | -1.9051          | 0.6429             | 0.3678          | -253.6892      | -212.4377    | -1.6591         | -1.6834       |
| 0.5059        | 2.3432 | 13600 | 0.6318          | -1.4799        | -1.8381          | 0.6431             | 0.3582          | -246.9884      | -206.6992    | -1.6812         | -1.7051       |
| 0.4543        | 2.4121 | 14000 | 0.6318          | -1.3717        | -1.7109          | 0.6459             | 0.3392          | -234.2693      | -195.8793    | -1.7134         | -1.7366       |
| 0.5121        | 2.4810 | 14400 | 0.6308          | -1.4206        | -1.7736          | 0.6447             | 0.3530          | -240.5389      | -200.7700    | -1.7016         | -1.7252       |
| 0.4847        | 2.5500 | 14800 | 0.6304          | -1.4817        | -1.8498          | 0.6443             | 0.3681          | -248.1589      | -206.8796    | -1.6912         | -1.7153       |
| 0.4701        | 2.6189 | 15200 | 0.6306          | -1.4145        | -1.7659          | 0.6445             | 0.3514          | -239.7732      | -200.1665    | -1.7090         | -1.7324       |
| 0.5011        | 2.6878 | 15600 | 0.6304          | -1.4080        | -1.7575          | 0.6434             | 0.3495          | -238.9349      | -199.5119    | -1.7135         | -1.7369       |
| 0.4936        | 2.7567 | 16000 | 0.6304          | -1.4490        | -1.8088          | 0.6436             | 0.3598          | -244.0595      | -203.6143    | -1.7010         | -1.7248       |
| 0.4952        | 2.8256 | 16400 | 0.6312          | -1.4483        | -1.8060          | 0.6438             | 0.3577          | -243.7794      | -203.5389    | -1.7043         | -1.7279       |
| 0.5024        | 2.8946 | 16800 | 0.6304          | -1.4492        | -1.8094          | 0.6429             | 0.3602          | -244.1201      | -203.6308    | -1.7037         | -1.7274       |
| 0.5054        | 2.9635 | 17200 | 0.6303          | -1.4484        | -1.8080          | 0.6436             | 0.3596          | -243.9776      | -203.5508    | -1.7024         | -1.7262       |


### Framework versions

- Transformers 4.41.2
- PyTorch 2.1.2
- Datasets 2.19.2
- Tokenizers 0.19.1