---
library_name: transformers
tags:
- trl
- dpo
- alignment-handbook
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-max-8-reward
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# OpenELM-1_1B-DPO-full-max-8-reward

This model is a DPO-aligned version of OpenELM-1.1B (per the model name and the `trl`/`dpo`/`alignment-handbook` tags); the preference dataset used for training is not recorded in this card.
It achieves the following results on the evaluation set (a sketch of how these DPO metrics relate follows the list):
- Loss: 1.7740
- Rewards/chosen: -15.6875
- Rewards/rejected: -17.875
- Rewards/accuracies: 0.6172
- Rewards/margins: 2.2031
- Logps/rejected: -2080.0
- Logps/chosen: -1888.0
- Logits/rejected: 0.8320
- Logits/chosen: -0.9922
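
As a rough guide to reading these numbers, here is a minimal sketch of how TRL-style DPO evaluation metrics are typically derived from policy and reference log-probabilities. The function name `dpo_eval_metrics` and the value `beta=0.1` are assumptions for illustration; the card does not record the beta used in training.

```python
import torch
import torch.nn.functional as F


def dpo_eval_metrics(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative reconstruction of DPO eval metrics (not the exact training code).

    Inputs are 1-D tensors of per-example summed log-probs. beta=0.1 is an
    assumed value; the card does not report the beta used.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # Rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # Rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # Rewards/margins
    accuracy = (chosen_rewards > rejected_rewards).float().mean()           # Rewards/accuracies
    loss = -F.logsigmoid(margins).mean()                                    # sigmoid DPO loss
    return {
        "rewards/chosen": chosen_rewards.mean(),
        "rewards/rejected": rejected_rewards.mean(),
        "rewards/margins": margins.mean(),
        "rewards/accuracies": accuracy,
        "loss": loss,
    }
```

Note that Rewards/margins is simply Rewards/chosen minus Rewards/rejected (2.2031 ≈ -15.6875 - (-17.875), up to bf16 rounding), and Rewards/accuracies is the fraction of preference pairs where the chosen completion receives the higher implicit reward.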

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent TRL configuration is sketched after the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
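
The same settings expressed as a TRL `DPOConfig` are sketched below. This is an illustration, not the exact training script, and it assumes a TRL release where `DPOConfig` subclasses `TrainingArguments`; the `output_dir` value is chosen to match the model name. With 4 GPUs × per-device batch size 8 × 2 accumulation steps, the effective train batch size is 64, matching the total above.

```python
from trl import DPOConfig  # assumes a TRL version that provides DPOConfig

training_args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-max-8-reward",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs x 2 accumulation steps = 64 effective
    per_device_eval_batch_size=16,   # x 4 GPUs = 64 effective
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # The optimizer reported above (Adam, betas=(0.9, 0.999), eps=1e-8) matches the
    # Trainer's default AdamW settings, so no explicit optimizer override is shown.
    # The DPO beta is not recorded in this card; TRL's default is 0.1.
)
```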

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5077        | 0.1047 | 100  | 0.6684          | -1.2422        | -1.4922          | 0.6191             | 0.2490          | -438.0         | -442.0       | -10.4375        | -10.8125      |
| 0.436         | 0.2094 | 200  | 0.7756          | -2.9219        | -3.3281          | 0.6191             | 0.4141          | -620.0         | -608.0       | -9.6875         | -10.1875      |
| 0.4375        | 0.3141 | 300  | 0.7544          | -4.0           | -4.625           | 0.6426             | 0.6328          | -752.0         | -720.0       | -8.9375         | -9.9375       |
| 0.4641        | 0.4188 | 400  | 0.7598          | -3.5938        | -4.2188          | 0.6270             | 0.6094          | -708.0         | -680.0       | -9.8125         | -10.6875      |
| 0.3819        | 0.5236 | 500  | 0.8648          | -5.0938        | -5.8438          | 0.6074             | 0.7383          | -872.0         | -828.0       | -7.8438         | -9.125        |
| 0.4052        | 0.6283 | 600  | 0.8811          | -5.1875        | -5.9375          | 0.6016             | 0.7461          | -880.0         | -836.0       | -9.3125         | -10.625       |
| 0.397         | 0.7330 | 700  | 0.7826          | -4.5938        | -5.3438          | 0.6445             | 0.7578          | -824.0         | -780.0       | -7.5            | -9.125        |
| 0.3853        | 0.8377 | 800  | 0.8263          | -5.8438        | -6.5938          | 0.6328             | 0.7461          | -948.0         | -904.0       | -5.9688         | -7.3125       |
| 0.3438        | 0.9424 | 900  | 1.0278          | -7.5938        | -8.8125          | 0.6230             | 1.2344          | -1168.0        | -1080.0      | -2.5            | -4.2188       |
| 0.0879        | 1.0471 | 1000 | 1.2819          | -9.375         | -10.8125         | 0.6055             | 1.4375          | -1368.0        | -1256.0      | -6.625          | -8.5          |
| 0.0875        | 1.1518 | 1100 | 1.2599          | -10.3125       | -11.75           | 0.6152             | 1.4609          | -1464.0        | -1352.0      | -3.6406         | -5.25         |
| 0.1119        | 1.2565 | 1200 | 1.0713          | -7.9688        | -9.125           | 0.6230             | 1.1562          | -1200.0        | -1112.0      | -4.375          | -6.2188       |
| 0.1083        | 1.3613 | 1300 | 1.1731          | -10.1875       | -11.5            | 0.5918             | 1.2969          | -1440.0        | -1336.0      | -3.7188         | -5.4375       |
| 0.0827        | 1.4660 | 1400 | 1.0477          | -9.25          | -10.5            | 0.6152             | 1.25            | -1336.0        | -1240.0      | -2.6094         | -4.5          |
| 0.0913        | 1.5707 | 1500 | 1.0557          | -9.25          | -10.625          | 0.6270             | 1.3828          | -1352.0        | -1248.0      | -2.9688         | -4.7812       |
| 0.0813        | 1.6754 | 1600 | 1.2081          | -11.4375       | -13.0            | 0.6230             | 1.5625          | -1584.0        | -1456.0      | -1.0156         | -2.7812       |
| 0.0882        | 1.7801 | 1700 | 1.1652          | -11.5625       | -13.0            | 0.6348             | 1.4531          | -1592.0        | -1472.0      | -3.0469         | -4.7812       |
| 0.0991        | 1.8848 | 1800 | 1.0546          | -9.6875        | -11.0            | 0.6211             | 1.3203          | -1392.0        | -1288.0      | -0.2773         | -2.0469       |
| 0.0663        | 1.9895 | 1900 | 1.1602          | -11.0625       | -12.625          | 0.6348             | 1.5312          | -1552.0        | -1424.0      | -1.9766         | -3.7344       |
| 0.0132        | 2.0942 | 2000 | 1.6895          | -15.4375       | -17.5            | 0.6191             | 2.0625          | -2040.0        | -1856.0      | 0.3359          | -1.5391       |
| 0.0613        | 2.1990 | 2100 | 1.7890          | -15.8125       | -18.0            | 0.6191             | 2.2031          | -2096.0        | -1896.0      | 0.7539          | -1.0625       |
| 0.0101        | 2.3037 | 2200 | 1.7495          | -16.125        | -18.375          | 0.6211             | 2.2031          | -2128.0        | -1928.0      | 1.25            | -0.4414       |
| 0.0138        | 2.4084 | 2300 | 1.7596          | -15.625        | -17.75           | 0.6133             | 2.2031          | -2064.0        | -1880.0      | 1.0234          | -0.7891       |
| 0.0121        | 2.5131 | 2400 | 1.7912          | -15.625        | -17.875          | 0.6152             | 2.2188          | -2080.0        | -1880.0      | 0.6641          | -1.1797       |
| 0.0107        | 2.6178 | 2500 | 1.7927          | -15.75         | -18.0            | 0.6133             | 2.1875          | -2080.0        | -1896.0      | 0.8281          | -0.9883       |
| 0.0145        | 2.7225 | 2600 | 1.7578          | -15.5          | -17.625          | 0.6191             | 2.2031          | -2048.0        | -1864.0      | 0.7031          | -1.1328       |
| 0.0133        | 2.8272 | 2700 | 1.7674          | -15.625        | -17.875          | 0.6152             | 2.2031          | -2080.0        | -1880.0      | 0.8281          | -0.9961       |
| 0.0114        | 2.9319 | 2800 | 1.7740          | -15.6875       | -17.875          | 0.6172             | 2.2031          | -2080.0        | -1888.0      | 0.8320          | -0.9922       |


### Framework versions

- Transformers 4.45.1
- PyTorch 2.3.0
- Datasets 3.0.1
- Tokenizers 0.20.0
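
To reproduce results against these versions, a minimal environment sanity check such as the following can help (the import names are the standard ones for these libraries; this snippet is illustrative, not part of the original training setup):

```python
# Compare installed package versions against those listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.45.1",
    "torch": "2.3.0",
    "datasets": "3.0.1",
    "tokenizers": "0.20.0",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,      # may carry a local suffix such as "+cu121"
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    got = installed[name]
    status = "OK" if got.startswith(want) else f"mismatch (expected {want})"
    print(f"{name}: {got} {status}")
```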