---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-chat-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: tinyllama-1.1b-chat-dpo-full
  results: []
---


# tinyllama-1.1b-chat-dpo-full

This model is a DPO fine-tuned version of [martimfasantos/tinyllama-1.1b-chat-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-chat-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.
It achieves the following results on the evaluation set (the `Rewards/*` metrics are explained after the list):
- Loss: 0.5860
- Rewards/chosen: -1.1602
- Rewards/rejected: -1.6135
- Rewards/accuracies: 0.6890
- Rewards/margins: 0.4533
- Logps/rejected: -458.4552
- Logps/chosen: -452.2377
- Logits/rejected: -2.3877
- Logits/chosen: -2.4300
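
The `Rewards/*` metrics above are the implicit rewards tracked by trl's `DPOTrainer`: for a completion, the reward is \\(\beta\\) times the log-ratio of the policy's likelihood to the frozen reference model's likelihood. `Rewards/margins` is the chosen reward minus the rejected reward, and `Rewards/accuracies` is the fraction of pairs where the chosen reward exceeds the rejected one. Training minimizes the standard DPO objective (\\(\beta\\) itself is not recorded on this card):

$$
\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\text{ref}}(y_w\mid x)} \;-\; \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\text{ref}}(y_l\mid x)}\right)\right]
$$

where \\(\pi_\theta\\) is the model being trained, \\(\pi_{\text{ref}}\\) is the frozen SFT checkpoint, and \\((y_w, y_l)\\) are the chosen and rejected completions for prompt \\(x\\).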

## Model description

`tinyllama-1.1b-chat-dpo-full` is a 1.1B-parameter causal language model for chat. Starting from the supervised fine-tuned checkpoint [martimfasantos/tinyllama-1.1b-chat-sft-full](https://huggingface.co/martimfasantos/tinyllama-1.1b-chat-sft-full), it is further aligned with Direct Preference Optimization (DPO) using trl, following the alignment-handbook setup, so that completions preferred by UltraFeedback annotators are ranked above rejected ones.

## Intended uses & limitations

The model is intended for research on preference alignment and as a lightweight chat assistant. Being a 1.1B-parameter model, it has limited knowledge and reasoning ability, may produce incorrect or unsafe outputs, and inherits any biases present in the UltraFeedback preference data; no additional safety tuning was applied.
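
A minimal inference sketch with 🤗 Transformers, assuming the repo id `martimfasantos/tinyllama-1.1b-chat-dpo-full` (inferred from the card name) and that the tokenizer ships a chat template inherited from the SFT checkpoint:

```python
from transformers import pipeline

# Repo id inferred from the card name; adjust if hosted under another namespace.
model_id = "martimfasantos/tinyllama-1.1b-chat-dpo-full"

generator = pipeline("text-generation", model=model_id, device_map="auto")

# Build the prompt with the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

output = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(output[0]["generated_text"])
```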

## Training and evaluation data

The model was trained on the preference splits of [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), which pairs each prompt with a chosen and a rejected completion derived from UltraFeedback ratings; evaluation uses the corresponding test preference split.
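
A quick way to inspect the data, assuming the split names published by HuggingFaceH4 (`train_prefs`/`test_prefs`) and message-list columns:

```python
from datasets import load_dataset

# Preference split used for DPO training (split name as published).
train = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

example = train[0]
print(example["prompt"])    # the user prompt
print(example["chosen"])    # preferred conversation (list of role/content messages)
print(example["rejected"])  # dispreferred conversation
```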

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-setup sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
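
Given the alignment-handbook and trl tags, these hyperparameters map onto `transformers.TrainingArguments` plus trl's `DPOTrainer`. Below is a minimal sketch, assuming a trl version contemporary with the framework versions listed at the end of this card (where `beta` is still passed directly to the trainer); the data preprocessing is simplified relative to the handbook recipe, and the `beta` value is hypothetical since the card does not record it:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "martimfasantos/tinyllama-1.1b-chat-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)      # policy to be aligned
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

# DPOTrainer expects string columns prompt/chosen/rejected; the dataset stores
# chosen/rejected as message lists, so take the final assistant turn here
# (the alignment-handbook applies the chat template instead).
def to_text(ex):
    return {
        "prompt": ex["prompt"],
        "chosen": ex["chosen"][-1]["content"],
        "rejected": ex["rejected"][-1]["content"],
    }

train_dataset = ds["train_prefs"].map(to_text)
eval_dataset = ds["test_prefs"].map(to_text)

# Direct mapping of the hyperparameters listed above; the Adam betas and
# epsilon on the card match the TrainingArguments defaults.
training_args = TrainingArguments(
    output_dir="tinyllama-1.1b-chat-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=training_args,
    beta=0.1,  # hypothetical: the DPO beta is not recorded on this card
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```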

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.693         | 0.0262 | 100  | 0.6929          | -0.0014        | -0.0019          | 0.5320             | 0.0006          | -297.2994      | -336.3557    | -3.1228         | -3.1361       |
| 0.6887        | 0.0523 | 200  | 0.6892          | -0.0302        | -0.0383          | 0.6160             | 0.0081          | -300.9348      | -339.2341    | -3.1215         | -3.1346       |
| 0.6789        | 0.0785 | 300  | 0.6794          | -0.0789        | -0.1087          | 0.6360             | 0.0299          | -307.9798      | -344.1051    | -3.1094         | -3.1216       |
| 0.6624        | 0.1047 | 400  | 0.6635          | -0.1807        | -0.2518          | 0.6390             | 0.0711          | -322.2854      | -354.2890    | -3.0664         | -3.0771       |
| 0.6373        | 0.1309 | 500  | 0.6503          | -0.2988        | -0.4120          | 0.6425             | 0.1133          | -338.3080      | -366.0959    | -2.9693         | -2.9839       |
| 0.6423        | 0.1570 | 600  | 0.6457          | -0.3891        | -0.5345          | 0.6375             | 0.1454          | -350.5518      | -375.1291    | -2.9372         | -2.9538       |
| 0.6266        | 0.1832 | 700  | 0.6420          | -0.7030        | -0.9081          | 0.6365             | 0.2051          | -387.9123      | -406.5211    | -2.9095         | -2.9229       |
| 0.5942        | 0.2094 | 800  | 0.6367          | -0.4969        | -0.6764          | 0.6475             | 0.1795          | -364.7484      | -385.9118    | -2.9255         | -2.9397       |
| 0.6171        | 0.2355 | 900  | 0.6330          | -0.5389        | -0.7443          | 0.6545             | 0.2054          | -371.5351      | -390.1065    | -2.8815         | -2.8992       |
| 0.6156        | 0.2617 | 1000 | 0.6271          | -0.9278        | -1.1788          | 0.6460             | 0.2510          | -414.9855      | -428.9975    | -2.8469         | -2.8665       |
| 0.6636        | 0.2879 | 1100 | 0.6234          | -0.7984        | -1.0304          | 0.6515             | 0.2320          | -400.1489      | -416.0618    | -2.8144         | -2.8347       |
| 0.6832        | 0.3141 | 1200 | 0.6152          | -1.0303        | -1.3170          | 0.6570             | 0.2866          | -428.8004      | -439.2536    | -2.7994         | -2.8212       |
| 0.5967        | 0.3402 | 1300 | 0.6131          | -1.2342        | -1.5321          | 0.6655             | 0.2979          | -450.3198      | -459.6400    | -2.7494         | -2.7756       |
| 0.596         | 0.3664 | 1400 | 0.6064          | -0.8587        | -1.1697          | 0.6820             | 0.3110          | -414.0766      | -422.0903    | -2.8084         | -2.8289       |
| 0.592         | 0.3926 | 1500 | 0.6027          | -0.9689        | -1.3189          | 0.6715             | 0.3499          | -428.9929      | -433.1132    | -2.7455         | -2.7703       |
| 0.6353        | 0.4187 | 1600 | 0.6051          | -0.9640        | -1.3223          | 0.6745             | 0.3582          | -429.3314      | -432.6226    | -2.6972         | -2.7245       |
| 0.6603        | 0.4449 | 1700 | 0.6016          | -0.9893        | -1.3221          | 0.6765             | 0.3328          | -429.3145      | -435.1521    | -2.7021         | -2.7305       |
| 0.5551        | 0.4711 | 1800 | 0.6023          | -1.0035        | -1.3765          | 0.6790             | 0.3731          | -434.7590      | -436.5641    | -2.6159         | -2.6492       |
| 0.5877        | 0.4973 | 1900 | 0.5975          | -0.8137        | -1.1853          | 0.6835             | 0.3716          | -415.6308      | -417.5872    | -2.6621         | -2.6941       |
| 0.5827        | 0.5234 | 2000 | 0.5935          | -0.8724        | -1.2562          | 0.6810             | 0.3838          | -422.7221      | -423.4575    | -2.6043         | -2.6396       |
| 0.6017        | 0.5496 | 2100 | 0.5911          | -1.0065        | -1.3971          | 0.6905             | 0.3907          | -436.8172      | -436.8658    | -2.6105         | -2.6436       |
| 0.5539        | 0.5758 | 2200 | 0.5920          | -0.9060        | -1.2945          | 0.6885             | 0.3884          | -426.5499      | -426.8195    | -2.5724         | -2.6076       |
| 0.5795        | 0.6019 | 2300 | 0.5914          | -1.1164        | -1.5398          | 0.6865             | 0.4234          | -451.0841      | -447.8605    | -2.5399         | -2.5757       |
| 0.5657        | 0.6281 | 2400 | 0.5904          | -1.0347        | -1.4494          | 0.6860             | 0.4147          | -442.0414      | -439.6861    | -2.5121         | -2.5487       |
| 0.5306        | 0.6543 | 2500 | 0.5918          | -1.0464        | -1.4840          | 0.6825             | 0.4376          | -445.5005      | -440.8591    | -2.4692         | -2.5102       |
| 0.5762        | 0.6805 | 2600 | 0.5927          | -1.0687        | -1.5141          | 0.6780             | 0.4455          | -448.5193      | -443.0862    | -2.4291         | -2.4735       |
| 0.6016        | 0.7066 | 2700 | 0.5936          | -1.0767        | -1.5080          | 0.6800             | 0.4313          | -447.9063      | -443.8889    | -2.4329         | -2.4747       |
| 0.6068        | 0.7328 | 2800 | 0.5897          | -1.1905        | -1.6433          | 0.6820             | 0.4527          | -461.4312      | -455.2722    | -2.4294         | -2.4708       |
| 0.5821        | 0.7590 | 2900 | 0.5870          | -1.1245        | -1.5598          | 0.6845             | 0.4353          | -453.0833      | -448.6697    | -2.4470         | -2.4862       |
| 0.5393        | 0.7851 | 3000 | 0.5873          | -1.2223        | -1.6710          | 0.6870             | 0.4486          | -464.2020      | -458.4521    | -2.4161         | -2.4565       |
| 0.577         | 0.8113 | 3100 | 0.5886          | -1.1359        | -1.5757          | 0.6845             | 0.4399          | -454.6796      | -449.8056    | -2.4137         | -2.4538       |
| 0.5731        | 0.8375 | 3200 | 0.5864          | -1.1928        | -1.6493          | 0.6900             | 0.4564          | -462.0313      | -455.5009    | -2.3988         | -2.4401       |
| 0.586         | 0.8636 | 3300 | 0.5865          | -1.1740        | -1.6231          | 0.6895             | 0.4492          | -459.4178      | -453.6159    | -2.3969         | -2.4384       |
| 0.5629        | 0.8898 | 3400 | 0.5860          | -1.1573        | -1.6086          | 0.6890             | 0.4513          | -457.9694      | -451.9486    | -2.3882         | -2.4306       |
| 0.6059        | 0.9160 | 3500 | 0.5858          | -1.1672        | -1.6213          | 0.6890             | 0.4541          | -459.2307      | -452.9388    | -2.3897         | -2.4320       |
| 0.5703        | 0.9422 | 3600 | 0.5860          | -1.1607        | -1.6138          | 0.6870             | 0.4532          | -458.4890      | -452.2865    | -2.3897         | -2.4320       |
| 0.5533        | 0.9683 | 3700 | 0.5858          | -1.1623        | -1.6161          | 0.6880             | 0.4538          | -458.7165      | -452.4510    | -2.3882         | -2.4304       |
| 0.5988        | 0.9945 | 3800 | 0.5862          | -1.1608        | -1.6138          | 0.6885             | 0.4530          | -458.4823      | -452.2973    | -2.3882         | -2.4306       |


### Framework versions

- Transformers 4.41.1
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1