---
license: apache-2.0
base_model: ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5
tags:
- alignment-handbook
- generated_from_trainer
- trl
- dpo
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: tinyllama_moe_dpo_ultrachat_v2_epochs5
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# tinyllama_moe_dpo_ultrachat_v2_epochs5

This model is a fine-tuned version of [ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5](https://huggingface.co/ondevicellm/tinyllama_moe_sft_ultrachat200k_v2_epochs5) on the HuggingFaceH4/ultrafeedback_binarized dataset.
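It achieves the following results on the evaluation set (the values match the step-3300 evaluation, which has the lowest validation loss in the training log below):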
- Loss: 0.5739
- Rewards/chosen: -1.1929
- Rewards/rejected: -1.7842
- Rewards/accuracies: 0.7163
- Rewards/margins: 0.5913
- Logps/rejected: -486.3180
- Logps/chosen: -468.6473
- Logits/rejected: -1.7313
- Logits/chosen: -1.8442
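
The reward and log-probability metrics follow the usual DPO bookkeeping. The sketch below shows how such metrics are typically computed for a batch of preference pairs. It assumes the sigmoid DPO loss and that each log-probability is summed over a response's tokens. The function name `dpo_metrics` and the default `beta=0.1` are illustrative assumptions, since the card does not report the β used. The Logits/* metrics (mean raw output logits) are not reproduced here.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Sigmoid DPO loss and reward statistics for one batch of preference pairs.

    Each argument is a list with one summed response log-probability per pair.
    beta=0.1 is a placeholder: the value used for this model is not reported.
    """
    n = len(policy_chosen_logps)
    # Implicit reward of a response: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen_logps, ref_chosen_logps)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected_logps, ref_rejected_logps)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    # Per-pair loss: -log(sigmoid(margin)) == softplus(-margin), then averaged.
    losses = [softplus(-m) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Fraction of pairs where the chosen response earns the higher reward.
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
        "rewards/margins": sum(margins) / n,
        # Mean policy log-probabilities (the Logps/* metrics).
        "logps/chosen": sum(policy_chosen_logps) / n,
        "logps/rejected": sum(policy_rejected_logps) / n,
    }
```

At the start of training the policy is identical to its reference (presumably the SFT checkpoint). In that case every margin is 0 and the loss is ln 2 ≈ 0.693, which is close to the 0.6913 logged at step 100. Because the loss is averaged per pair, it is not -log σ of the mean margin. For the summary values, -log σ(0.5913) ≈ 0.44, whereas the reported loss is 0.5739.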

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 96
- num_epochs: 5
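
The two `total_*` values follow from the per-device settings. With warmup, `lr_scheduler_type: cosine` means a linear ramp over the first 96 steps, followed by a half-cosine decay to zero. The sketch below re-implements that shape in plain Python. To our understanding, it mirrors the formula of Transformers' `get_cosine_schedule_with_warmup` with its default half cycle. `cosine_lr` is a hypothetical helper name. The card does not state the total number of optimizer steps, so `total_steps` must be supplied by the caller.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 5e-7
TRAIN_BATCH_SIZE = 8      # per device
EVAL_BATCH_SIZE = 8       # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 96

# Effective batch sizes: per-device size x devices (x accumulation for training).
TOTAL_TRAIN_BATCH = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS  # 64
TOTAL_EVAL_BATCH = EVAL_BATCH_SIZE * NUM_DEVICES                        # 32


def cosine_lr(step: int, total_steps: int, base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For example, take a guessed `total_steps=4776`. This is extrapolated from step 4700 being logged at epoch 4.92 and is not a reported value. With that guess, `cosine_lr(96, 4776)` returns the peak 5e-07, and the rate falls to half of that at step 2436.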

### Training results

| Training Loss | Epoch | Step | Logits/chosen | Logits/rejected | Logps/chosen | Logps/rejected | Validation Loss | Rewards/accuracies | Rewards/chosen | Rewards/margins | Rewards/rejected |
|:-------------:|:-----:|:----:|:-------------:|:---------------:|:------------:|:--------------:|:---------------:|:------------------:|:--------------:|:---------------:|:----------------:|
| 0.6913        | 0.1   | 100  | -2.7889       | -2.7179         | -348.8463    | -307.7887      | 0.6915          | 0.6012             | 0.0051         | 0.0041          | 0.0011           |
| 0.6848        | 0.21  | 200  | -2.7786       | -2.7064         | -347.1148    | -307.7814      | 0.6844          | 0.6548             | 0.0224         | 0.0213          | 0.0011           |
| 0.6719        | 0.31  | 300  | -2.7564       | -2.6828         | -347.1926    | -310.3274      | 0.6745          | 0.6567             | 0.0217         | 0.0460          | -0.0243          |
| 0.6593        | 0.42  | 400  | -2.7168       | -2.6417         | -351.2079    | -317.7508      | 0.6626          | 0.6627             | -0.0185        | 0.0801          | -0.0985          |
| 0.6489        | 0.52  | 500  | -2.6766       | -2.5996         | -359.7169    | -330.5644      | 0.6503          | 0.6667             | -0.1036        | 0.1231          | -0.2267          |
| 0.6442        | 0.63  | 600  | -2.6209       | -2.5415         | -364.4345    | -339.3099      | 0.6407          | 0.6806             | -0.1507        | 0.1634          | -0.3141          |
| 0.6271        | 0.73  | 700  | -2.5658       | -2.4836         | -373.3324    | -352.5069      | 0.6321          | 0.6766             | -0.2397        | 0.2064          | -0.4461          |
| 0.607         | 0.84  | 800  | -2.5051       | -2.4199         | -379.1497    | -361.6935      | 0.6261          | 0.6845             | -0.2979        | 0.2401          | -0.5380          |
| 0.6322        | 0.94  | 900  | -2.4508       | -2.3644         | -397.4641    | -382.2142      | 0.6199          | 0.6905             | -0.4810        | 0.2621          | -0.7432          |
| 0.605         | 1.05  | 1000 | -2.3964       | -2.3068         | -404.5890    | -394.0288      | 0.6115          | 0.6885             | -0.5523        | 0.3090          | -0.8613          |
| 0.601         | 1.15  | 1100 | -2.3602       | -2.2683         | -418.7677    | -411.0065      | 0.6068          | 0.6964             | -0.6941        | 0.3370          | -1.0311          |
| 0.5676        | 1.26  | 1200 | -2.3216       | -2.2290         | -417.0859    | -411.9764      | 0.6020          | 0.7123             | -0.6773        | 0.3635          | -1.0408          |
| 0.5909        | 1.36  | 1300 | -2.2912       | -2.1982         | -412.9470    | -408.3128      | 0.5999          | 0.7123             | -0.6359        | 0.3683          | -1.0042          |
| 0.5711        | 1.47  | 1400 | -2.2460       | -2.1507         | -420.5697    | -419.0722      | 0.5967          | 0.7183             | -0.7121        | 0.3997          | -1.1118          |
| 0.5655        | 1.57  | 1500 | -2.2212       | -2.1253         | -412.4961    | -410.0143      | 0.5957          | 0.7222             | -0.6314        | 0.3898          | -1.0212          |
| 0.5655        | 1.67  | 1600 | -2.1858       | -2.0877         | -414.4090    | -414.7852      | 0.5925          | 0.7242             | -0.6505        | 0.4184          | -1.0689          |
| 0.5364        | 1.78  | 1700 | -2.1499       | -2.0500         | -425.4825    | -428.4342      | 0.5873          | 0.7262             | -0.7612        | 0.4442          | -1.2054          |
| 0.5702        | 1.88  | 1800 | -2.1546       | -2.0539         | -424.3879    | -429.0814      | 0.5843          | 0.7361             | -0.7503        | 0.4616          | -1.2119          |
| 0.5505        | 1.99  | 1900 | -2.1340       | -2.0328         | -413.9261    | -417.8120      | 0.5852          | 0.7321             | -0.6457        | 0.4535          | -1.0992          |
| 0.5389        | 2.09  | 2000 | -2.0806       | -1.9769         | -422.3402    | -427.3939      | 0.5828          | 0.7262             | -0.7298        | 0.4652          | -1.1950          |
| 0.531         | 2.2   | 2100 | -2.0565       | -1.9511         | -437.7683    | -446.1322      | 0.5805          | 0.7341             | -0.8841        | 0.4983          | -1.3824          |
| 0.5162        | 2.3   | 2200 | -2.0180       | -1.9112         | -435.0022    | -443.4644      | 0.5830          | 0.7341             | -0.8564        | 0.4993          | -1.3557          |
| 0.5297        | 2.41  | 2300 | -1.9911       | -1.8838         | -448.7519    | -459.4124      | 0.5795          | 0.7183             | -0.9939        | 0.5212          | -1.5152          |
| 0.5143        | 2.51  | 2400 | -1.9853       | -1.8784         | -436.2057    | -445.7617      | 0.5806          | 0.7321             | -0.8685        | 0.5102          | -1.3787          |
| 0.5377        | 2.62  | 2500 | -1.9648       | -1.8572         | -443.1574    | -454.7680      | 0.5786          | 0.7282             | -0.9380        | 0.5307          | -1.4687          |
| 0.4868        | 2.72  | 2600 | -1.9504       | -1.8416         | -439.4379    | -450.5156      | 0.5797          | 0.7302             | -0.9008        | 0.5254          | -1.4262          |
| 0.5275        | 2.83  | 2700 | -1.9219       | -1.8117         | -447.6714    | -460.6927      | 0.5754          | 0.7282             | -0.9831        | 0.5448          | -1.5280          |
| 0.5042        | 2.93  | 2800 | -1.9484       | -1.8401         | -447.7928    | -460.8577      | 0.5743          | 0.7321             | -0.9843        | 0.5453          | -1.5296          |
| 0.4862        | 3.04  | 2900 | -1.9315       | -1.8216         | -452.8863    | -467.0351      | 0.5756          | 0.7202             | -1.0353        | 0.5561          | -1.5914          |
| 0.4817        | 3.14  | 3000 | -1.8836       | -1.7716         | -453.8664    | -469.6034      | 0.5786          | 0.7282             | -1.0451        | 0.5720          | -1.6171          |
| 0.4767        | 3.24  | 3100 | -1.8663       | -1.7538         | -457.4258    | -472.9984      | 0.5770          | 0.7262             | -1.0807        | 0.5704          | -1.6510          |
| 0.4794        | 3.35  | 3200 | -1.8515       | -1.7384         | -460.2550    | -476.8743      | 0.5789          | 0.7262             | -1.1090        | 0.5808          | -1.6898          |
| 0.4784        | 3.46  | 3300 | -1.8442       | -1.7313         | -468.6473    | -486.3180      | 0.5739          | 0.7163             | -1.1929        | 0.5913          | -1.7842          |
| 0.4797        | 3.56  | 3400 | -1.8464       | -1.7340         | -464.2336    | -480.9566      | 0.5754          | 0.7202             | -1.1487        | 0.5819          | -1.7306          |
| 0.4967        | 3.66  | 3500 | -1.8458       | -1.7331         | -462.4030    | -478.6690      | 0.5763          | 0.7282             | -1.1304        | 0.5773          | -1.7077          |
| 0.4747        | 3.77  | 3600 | -1.8402       | -1.7268         | -462.3710    | -479.5741      | 0.5767          | 0.7262             | -1.1301        | 0.5867          | -1.7168          |
| 0.4895        | 3.87  | 3700 | -1.8430       | -1.7302         | -463.2915    | -479.6691      | 0.5747          | 0.7202             | -1.1393        | 0.5784          | -1.7177          |
| 0.5118        | 3.98  | 3800 | -1.8417       | -1.7282         | -464.1390    | -481.3118      | 0.5743          | 0.7262             | -1.1478        | 0.5864          | -1.7342          |
| 0.5007        | 4.08  | 3900 | -1.8403       | -1.7269         | -462.8507    | -480.0436      | 0.5753          | 0.7282             | -1.1349        | 0.5866          | -1.7215          |
| 0.461         | 4.19  | 4000 | -1.8327       | -1.7189         | -466.1142    | -483.5273      | 0.5745          | 0.7222             | -1.1675        | 0.5888          | -1.7563          |
| 0.4881        | 4.29  | 4100 | -1.8260       | -1.7124         | -464.1829    | -481.8481      | 0.5762          | 0.7282             | -1.1482        | 0.5913          | -1.7395          |
| 0.4449        | 4.4   | 4200 | -1.8251       | -1.7116         | -466.1421    | -484.0506      | 0.5765          | 0.7202             | -1.1678        | 0.5937          | -1.7615          |
| 0.4692        | 4.5   | 4300 | -1.8279       | -1.7143         | -466.4624    | -484.0968      | 0.5759          | 0.7242             | -1.1710        | 0.5910          | -1.7620          |
| 0.4654        | 4.61  | 4400 | -1.8290       | -1.7154         | -466.3009    | -484.2224      | 0.5760          | 0.7262             | -1.1694        | 0.5939          | -1.7633          |
| 0.4608        | 4.71  | 4500 | -1.8304       | -1.7171         | -467.0131    | -484.8123      | 0.5754          | 0.7202             | -1.1765        | 0.5926          | -1.7692          |
| 0.4661        | 4.82  | 4600 | -1.8255       | -1.7120         | -467.5481    | -485.3937      | 0.5754          | 0.7282             | -1.1819        | 0.5931          | -1.7750          |
| 0.4859        | 4.92  | 4700 | -1.8237       | -1.7101         | -467.6952    | -485.5031      | 0.5756          | 0.7202             | -1.1834        | 0.5927          | -1.7761          |
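
Since Rewards/margins is Rewards/chosen minus Rewards/rejected, each row of the table can be sanity-checked. This is a cheap way to catch columns that were shifted when a training log was exported. A minimal sketch follows. `ROWS` holds a few values copied from the table in header order, and `check_margins` and `best_step` are hypothetical helper names.

```python
# A few rows copied from the training-results table, in header order:
# (step, validation_loss, rewards_chosen, rewards_margins, rewards_rejected)
ROWS = [
    (2800, 0.5743, -0.9843, 0.5453, -1.5296),
    (3200, 0.5789, -1.1090, 0.5808, -1.6898),
    (3300, 0.5739, -1.1929, 0.5913, -1.7842),
    (3800, 0.5743, -1.1478, 0.5864, -1.7342),
    (4700, 0.5756, -1.1834, 0.5927, -1.7761),
]


def check_margins(rows, tol=2e-4):
    """Return steps where margin differs from chosen - rejected beyond rounding."""
    return [step for step, _, chosen, margin, rejected in rows
            if abs((chosen - rejected) - margin) > tol]


def best_step(rows):
    """Return the step with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])[0]


if __name__ == "__main__":
    print("inconsistent rows:", check_margins(ROWS))
    print("lowest validation loss at step:", best_step(ROWS))
```

The tolerance of 2e-4 allows for the four-decimal rounding of each column. For example, step 4500 reports a margin of 0.5926, while the difference of its reward columns gives 0.5927.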


### Framework versions

- Transformers 4.36.2
- Pytorch 2.1.2+cu118
- Datasets 2.14.6
- Tokenizers 0.15.0