---
license: cc-by-nc-4.0
base_model: davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter0
- davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter1
model-index:
- name: outputs
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# outputs

This model is a fine-tuned version of [davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0](https://huggingface.co/davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter0) on the davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter0 and the davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter1 datasets.
It achieves the following results on the evaluation set:
- Loss: 0.0380
- Rewards/real: -5.1867
- Rewards/generated: -23.6116
- Rewards/accuracies: 0.9778
- Rewards/margins: 18.4250
- Logps/generated: -690.4515
- Logps/real: -469.2089
- Logits/generated: -1.6815
- Logits/real: -2.1280
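
The reported margin is just the gap between the reward on the real (preferred) responses and the reward on the generated ones. A quick sanity check, using the eval values listed above:

```python
# Eval metrics reported above.
rewards_real = -5.1867
rewards_generated = -23.6116
rewards_margins = 18.4250

# margin = reward(real) - reward(generated), up to rounding in the card.
computed = rewards_real - rewards_generated
assert abs(computed - rewards_margins) < 1e-3
print(f"margin = {computed:.4f}")  # 18.4249, matching the reported value up to rounding
```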

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

The model was trained and evaluated on the davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter0 and davidberenstein1957/ultra-feedback-dutch-cleaned-hq_iter1 datasets listed in the card metadata; see the evaluation results above.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
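
The effective batch sizes follow from the per-device settings above (a quick arithmetic check; all values are copied from the list):

```python
train_batch_size = 8             # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 2

# Effective train batch: per-device size * devices * accumulation steps.
total_train = train_batch_size * num_devices * gradient_accumulation_steps
# Eval uses no gradient accumulation: per-device size * devices.
total_eval = eval_batch_size * num_devices

assert total_train == 64  # matches total_train_batch_size above
assert total_eval == 32   # matches total_eval_batch_size above
```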

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:-----:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.591         | 0.04  | 25   | 0.4210          | -0.2501      | -1.0788           | 0.8500             | 0.8287          | -465.1227       | -419.8426  | -2.6984          | -2.7096     |
| 0.2223        | 0.08  | 50   | 0.2173          | -0.5659      | -3.0876           | 0.9176             | 2.5217          | -485.2113       | -423.0011  | -2.6306          | -2.6446     |
| 0.168         | 0.12  | 75   | 0.1532          | -0.7060      | -4.4771           | 0.9435             | 3.7711          | -499.1060       | -424.4022  | -2.5832          | -2.6005     |
| 0.1126        | 0.16  | 100  | 0.1218          | -1.2746      | -6.3162           | 0.9509             | 5.0415          | -517.4969       | -430.0886  | -2.5961          | -2.6118     |
| 0.0854        | 0.21  | 125  | 0.0921          | -1.7944      | -9.0378           | 0.9611             | 7.2433          | -544.7130       | -435.2866  | -2.5534          | -2.5859     |
| 0.0609        | 0.25  | 150  | 0.0738          | -1.6860      | -9.1926           | 0.9639             | 7.5065          | -546.2610       | -434.2025  | -2.5875          | -2.6239     |
| 0.0654        | 0.29  | 175  | 0.0733          | -2.0360      | -9.8189           | 0.9648             | 7.7828          | -552.5237       | -437.7025  | -2.5252          | -2.5698     |
| 0.0814        | 0.33  | 200  | 0.0714          | -2.3341      | -10.2294          | 0.9630             | 7.8952          | -556.6287       | -440.6832  | -2.4634          | -2.5260     |
| 0.0356        | 0.37  | 225  | 0.0698          | -2.6697      | -11.4164          | 0.9667             | 8.7467          | -568.4990       | -444.0394  | -2.4311          | -2.5142     |
| 0.0641        | 0.41  | 250  | 0.0586          | -2.3926      | -12.3053          | 0.9694             | 9.9126          | -577.3877       | -441.2684  | -2.3106          | -2.4202     |
| 0.0442        | 0.45  | 275  | 0.0672          | -2.5170      | -11.9462          | 0.9676             | 9.4293          | -573.7975       | -442.5117  | -2.3880          | -2.4773     |
| 0.0707        | 0.49  | 300  | 0.0540          | -3.8488      | -15.1469          | 0.9667             | 11.2982         | -605.8044       | -455.8299  | -2.2564          | -2.3913     |
| 0.0683        | 0.53  | 325  | 0.0574          | -5.2977      | -18.2377          | 0.9667             | 12.9400         | -636.7123       | -470.3190  | -2.1402          | -2.3222     |
| 0.0339        | 0.58  | 350  | 0.0495          | -3.7486      | -17.2926          | 0.9731             | 13.5439         | -627.2608       | -454.8286  | -2.1701          | -2.3731     |
| 0.0648        | 0.62  | 375  | 0.0537          | -2.4302      | -13.2604          | 0.9722             | 10.8301         | -586.9390       | -441.6444  | -2.3167          | -2.4783     |
| 0.0358        | 0.66  | 400  | 0.0460          | -3.8509      | -17.3389          | 0.9741             | 13.4880         | -627.7241       | -455.8509  | -2.1735          | -2.3874     |
| 0.0532        | 0.7   | 425  | 0.0483          | -4.3261      | -18.2030          | 0.9741             | 13.8769         | -636.3655       | -460.6029  | -2.1550          | -2.3751     |
| 0.0408        | 0.74  | 450  | 0.0567          | -4.8885      | -19.7272          | 0.9741             | 14.8387         | -651.6073       | -466.2276  | -2.2982          | -2.4811     |
| 0.0434        | 0.78  | 475  | 0.0467          | -2.8677      | -16.1120          | 0.9731             | 13.2443         | -615.4548       | -446.0187  | -2.1937          | -2.4242     |
| 0.0194        | 0.82  | 500  | 0.0455          | -3.2473      | -18.4707          | 0.9769             | 15.2234         | -639.0422       | -449.8151  | -2.0107          | -2.3291     |
| 0.0227        | 0.86  | 525  | 0.0543          | -4.5805      | -20.1131          | 0.9750             | 15.5326         | -655.4664       | -463.1471  | -2.2146          | -2.4100     |
| 0.0299        | 0.91  | 550  | 0.0481          | -4.3021      | -20.3869          | 0.9731             | 16.0848         | -658.2037       | -460.3627  | -2.0552          | -2.3301     |
| 0.0218        | 0.95  | 575  | 0.0464          | -4.4619      | -20.3587          | 0.9713             | 15.8967         | -657.9220       | -461.9616  | -1.9225          | -2.2635     |
| 0.0218        | 0.99  | 600  | 0.0451          | -5.3210      | -20.9811          | 0.9722             | 15.6602         | -664.1465       | -470.5517  | -1.9518          | -2.2964     |
| 0.0093        | 1.03  | 625  | 0.0429          | -4.3395      | -19.2716          | 0.9750             | 14.9321         | -647.0515       | -460.7374  | -1.7575          | -2.1708     |
| 0.0173        | 1.07  | 650  | 0.0492          | -4.1317      | -19.0745          | 0.9704             | 14.9428         | -645.0802       | -458.6593  | -1.8155          | -2.1757     |
| 0.0059        | 1.11  | 675  | 0.0449          | -5.7336      | -23.1577          | 0.9713             | 17.4241         | -685.9126       | -474.6784  | -1.6844          | -2.1123     |
| 0.0149        | 1.15  | 700  | 0.0608          | -7.1484      | -26.1989          | 0.9713             | 19.0504         | -716.3237       | -488.8266  | -2.0142          | -2.2748     |
| 0.0105        | 1.19  | 725  | 0.0479          | -4.4948      | -20.2513          | 0.9722             | 15.7564         | -656.8477       | -462.2903  | -2.1674          | -2.3962     |
| 0.032         | 1.23  | 750  | 0.0512          | -5.0950      | -21.3230          | 0.9685             | 16.2280         | -667.5649       | -468.2917  | -2.2426          | -2.4414     |
| 0.0042        | 1.28  | 775  | 0.0462          | -4.0296      | -19.2620          | 0.9704             | 15.2324         | -646.9548       | -457.6381  | -2.2156          | -2.4379     |
| 0.0041        | 1.32  | 800  | 0.0475          | -4.0348      | -19.8410          | 0.9731             | 15.8062         | -652.7453       | -457.6903  | -2.1330          | -2.3843     |
| 0.0075        | 1.36  | 825  | 0.0428          | -4.4696      | -20.8584          | 0.9722             | 16.3888         | -662.9192       | -462.0378  | -2.1122          | -2.3718     |
| 0.004         | 1.4   | 850  | 0.0468          | -6.2822      | -25.6273          | 0.9750             | 19.3451         | -710.6078       | -480.1642  | -1.7240          | -2.1709     |
| 0.0222        | 1.44  | 875  | 0.0584          | -6.0399      | -23.0778          | 0.9759             | 17.0379         | -685.1132       | -477.7408  | -1.6544          | -2.1242     |
| 0.0063        | 1.48  | 900  | 0.0490          | -3.8721      | -19.8020          | 0.9722             | 15.9298         | -652.3550       | -456.0635  | -1.7696          | -2.2026     |
| 0.006         | 1.52  | 925  | 0.0478          | -5.2822      | -23.7504          | 0.9750             | 18.4682         | -691.8392       | -470.1639  | -1.6461          | -2.1239     |
| 0.0169        | 1.56  | 950  | 0.0455          | -4.9375      | -22.9431          | 0.9731             | 18.0057         | -683.7665       | -466.7169  | -1.6890          | -2.1447     |
| 0.0063        | 1.6   | 975  | 0.0449          | -5.9782      | -25.0564          | 0.9741             | 19.0782         | -704.8994       | -477.1242  | -1.5890          | -2.0779     |
| 0.0144        | 1.65  | 1000 | 0.0428          | -5.2622      | -22.9304          | 0.9731             | 17.6682         | -683.6391       | -469.9639  | -1.6262          | -2.0859     |
| 0.0046        | 1.69  | 1025 | 0.0411          | -5.5146      | -24.0845          | 0.9759             | 18.5698         | -695.1800       | -472.4886  | -1.6070          | -2.0934     |
| 0.002         | 1.73  | 1050 | 0.0408          | -5.4174      | -23.7610          | 0.9750             | 18.3436         | -691.9457       | -471.5163  | -1.6779          | -2.1277     |
| 0.0047        | 1.77  | 1075 | 0.0411          | -5.6837      | -24.5512          | 0.9750             | 18.8674         | -699.8467       | -474.1796  | -1.7048          | -2.1412     |
| 0.0077        | 1.81  | 1100 | 0.0404          | -5.8712      | -25.3478          | 0.9759             | 19.4766         | -707.8129       | -476.0543  | -1.6257          | -2.0917     |
| 0.0145        | 1.85  | 1125 | 0.0385          | -5.0758      | -23.2450          | 0.9741             | 18.1692         | -686.7853       | -468.0999  | -1.6509          | -2.1029     |
| 0.0038        | 1.89  | 1150 | 0.0376          | -5.2077      | -23.5236          | 0.9759             | 18.3159         | -689.5715       | -469.4194  | -1.6736          | -2.1249     |
| 0.01          | 1.93  | 1175 | 0.0379          | -5.1247      | -23.3484          | 0.9750             | 18.2238         | -687.8193       | -468.5888  | -1.6969          | -2.1383     |
| 0.0055        | 1.98  | 1200 | 0.0380          | -5.1867      | -23.6116          | 0.9778             | 18.4250         | -690.4515       | -469.2089  | -1.6815          | -2.1280     |
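
The validation loss is consistent with a DPO/SPIN-style objective, loss = −log σ(margin), where the margin is already β-scaled in the reported rewards. A rough sketch using the step-25 row from the table; note the table reports the *mean* margin, while the loss is averaged per example, so the two only agree approximately:

```python
import math

# Step-25 row from the table above.
margin = 0.8287         # mean rewards/margins
reported_loss = 0.4210  # mean validation loss

# -log(sigmoid(margin)) = log(1 + exp(-margin)), computed stably.
loss_at_mean_margin = math.log1p(math.exp(-margin))
print(f"{loss_at_mean_margin:.3f}")  # 0.362

# Close to, but below, the reported 0.4210: the mean of per-example losses
# differs from the loss at the mean margin (Jensen's inequality, convex loss).
assert abs(loss_at_mean_margin - reported_loss) < 0.1
```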


### Framework versions

- Transformers 4.37.0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.15.2