CharlesLi committed
Commit aafe941
1 Parent(s): 2de30ee

Model save
README.md CHANGED
@@ -3,6 +3,7 @@ library_name: transformers
 tags:
 - trl
 - dpo
+- alignment-handbook
 - generated_from_trainer
 model-index:
 - name: OpenELM-1_1B-DPO-full-max-reward-least-similar
@@ -16,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model was trained from scratch on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 185.1900
-- Rewards/chosen: -628.0
-- Rewards/rejected: -544.0
-- Rewards/accuracies: 0.4316
-- Rewards/margins: -85.5
-- Logps/rejected: -54528.0
-- Logps/chosen: -63232.0
-- Logits/rejected: 6.0625
-- Logits/chosen: 5.6562
+- Loss: 1.2775
+- Rewards/chosen: -5.2812
+- Rewards/rejected: -5.5938
+- Rewards/accuracies: 0.5039
+- Rewards/margins: 0.3301
+- Logps/rejected: -848.0
+- Logps/chosen: -844.0
+- Logits/rejected: -13.25
+- Logits/chosen: -14.0
 
 ## Model description
 
@@ -61,39 +62,39 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6914        | 0.1047 | 100  | 0.6926          | -0.3828        | -0.3887          | 0.4492             | 0.0057          | -328.0         | -356.0       | -9.8125         | -10.1875      |
-| 0.6914        | 0.2094 | 200  | 70.4809         | -244.0         | -211.0           | 0.4492             | -32.75          | -21376.0       | -24704.0     | -0.9258         | -0.8281       |
-| 0.6914        | 0.3141 | 300  | 98.0238         | -332.0         | -288.0           | 0.4355             | -45.25          | -29056.0       | -33536.0     | -0.7578         | -0.7852       |
-| 0.6914        | 0.4188 | 400  | 104.3149        | -354.0         | -306.0           | 0.4355             | -48.25          | -30848.0       | -35840.0     | -0.2432         | -0.3418       |
-| 0.6914        | 0.5236 | 500  | 111.0501        | -376.0         | -326.0           | 0.4355             | -51.25          | -32768.0       | -37888.0     | 1.2422          | 1.0469        |
-| 0.6914        | 0.6283 | 600  | 117.8885        | -400.0         | -346.0           | 0.4297             | -54.5           | -34816.0       | -40192.0     | 1.9375          | 1.6875        |
-| 0.6914        | 0.7330 | 700  | 124.8744        | -424.0         | -366.0           | 0.4336             | -57.75          | -36864.0       | -42752.0     | 3.2031          | 2.8594        |
-| 0.6914        | 0.8377 | 800  | 132.0824        | -448.0         | -388.0           | 0.4297             | -61.0           | -38912.0       | -45056.0     | 4.2188          | 3.8281        |
-| 0.6914        | 0.9424 | 900  | 139.2829        | -472.0         | -408.0           | 0.4297             | -64.5           | -40960.0       | -47616.0     | 4.6875          | 4.3125        |
-| 0.6914        | 1.0471 | 1000 | 134.4353        | -464.0         | -400.0           | 0.4414             | -63.5           | -40448.0       | -46848.0     | 2.2188          | 1.8672        |
-| 0.6914        | 1.1518 | 1100 | 151.2930        | -512.0         | -444.0           | 0.4336             | -70.0           | -44544.0       | -51712.0     | 5.0625          | 4.6875        |
-| 0.6914        | 1.2565 | 1200 | 154.9059        | -528.0         | -454.0           | 0.4316             | -71.5           | -45824.0       | -52992.0     | 5.0938          | 4.7188        |
-| 0.6914        | 1.3613 | 1300 | 159.0080        | -540.0         | -466.0           | 0.4336             | -73.5           | -46848.0       | -54272.0     | 5.9062          | 5.5           |
-| 0.6914        | 1.4660 | 1400 | 162.5714        | -552.0         | -476.0           | 0.4297             | -75.0           | -47872.0       | -55552.0     | 5.75            | 5.3438        |
-| 0.6914        | 1.5707 | 1500 | 166.0585        | -564.0         | -486.0           | 0.4355             | -76.5           | -48896.0       | -56576.0     | 5.9062          | 5.4688        |
-| 0.6914        | 1.6754 | 1600 | 169.1784        | -576.0         | -496.0           | 0.4277             | -78.0           | -49920.0       | -57856.0     | 5.5             | 5.125         |
-| 0.6914        | 1.7801 | 1700 | 172.1316        | -584.0         | -504.0           | 0.4355             | -79.5           | -50688.0       | -58880.0     | 5.4688          | 5.0938        |
-| 0.6914        | 1.8848 | 1800 | 174.9129        | -592.0         | -512.0           | 0.4316             | -80.5           | -51456.0       | -59648.0     | 5.8125          | 5.4375        |
-| 0.6914        | 1.9895 | 1900 | 177.3406        | -600.0         | -520.0           | 0.4277             | -81.5           | -52224.0       | -60416.0     | 5.8438          | 5.4375        |
-| 0.6914        | 2.0942 | 2000 | 179.1992        | -608.0         | -524.0           | 0.4258             | -82.5           | -52992.0       | -61184.0     | 5.875           | 5.4688        |
-| 0.6914        | 2.1990 | 2100 | 180.8939        | -616.0         | -532.0           | 0.4297             | -83.5           | -53504.0       | -61696.0     | 5.875           | 5.5           |
-| 0.6914        | 2.3037 | 2200 | 182.2845        | -620.0         | -536.0           | 0.4316             | -84.0           | -53760.0       | -62208.0     | 6.0938          | 5.6875        |
-| 0.6914        | 2.4084 | 2300 | 183.2915        | -624.0         | -536.0           | 0.4375             | -84.5           | -54016.0       | -62464.0     | 5.9688          | 5.5625        |
-| 0.6914        | 2.5131 | 2400 | 184.1078        | -624.0         | -540.0           | 0.4336             | -85.0           | -54272.0       | -62976.0     | 6.0             | 5.5938        |
-| 0.6914        | 2.6178 | 2500 | 184.6541        | -628.0         | -540.0           | 0.4238             | -85.0           | -54528.0       | -62976.0     | 6.0625          | 5.6562        |
-| 0.6914        | 2.7225 | 2600 | 185.0226        | -628.0         | -544.0           | 0.4277             | -85.5           | -54528.0       | -63232.0     | 6.0312          | 5.625         |
-| 0.6914        | 2.8272 | 2700 | 185.1557        | -628.0         | -544.0           | 0.4297             | -85.5           | -54528.0       | -63232.0     | 6.0625          | 5.6562        |
-| 0.6914        | 2.9319 | 2800 | 185.1900        | -628.0         | -544.0           | 0.4316             | -85.5           | -54528.0       | -63232.0     | 6.0625          | 5.6562        |
+| 0.0747        | 0.1047 | 100  | 0.7319          | -1.2344        | -1.3906          | 0.5195             | 0.1494          | -428.0         | -442.0       | -9.875          | -10.1875      |
+| 0.0623        | 0.2094 | 200  | 0.7326          | -1.1641        | -1.3125          | 0.5137             | 0.1494          | -420.0         | -434.0       | -13.3125        | -13.625       |
+| 0.0804        | 0.3141 | 300  | 1.1385          | -4.375         | -4.5938          | 0.4863             | 0.2275          | -748.0         | -756.0       | -10.9375        | -11.3125      |
+| 0.1502        | 0.4188 | 400  | 0.9801          | -3.2031        | -3.3125          | 0.4844             | 0.0991          | -620.0         | -640.0       | -11.375         | -11.9375      |
+| 0.0464        | 0.5236 | 500  | 0.9622          | -2.6875        | -2.7969          | 0.4805             | 0.1074          | -568.0         | -588.0       | -13.125         | -13.5625      |
+| 0.0636        | 0.6283 | 600  | 1.0378          | -2.4062        | -2.4375          | 0.4727             | 0.0264          | -532.0         | -560.0       | -13.6875        | -14.0         |
+| 0.0638        | 0.7330 | 700  | 0.8978          | -2.1562        | -2.1562          | 0.5039             | 0.0037          | -504.0         | -532.0       | -13.6875        | -13.875       |
+| 0.0552        | 0.8377 | 800  | 0.9712          | -3.4375        | -3.4688          | 0.4980             | 0.0332          | -636.0         | -664.0       | -13.0625        | -13.5625      |
+| 0.0459        | 0.9424 | 900  | 1.0447          | -4.7188        | -4.9688          | 0.5117             | 0.2490          | -784.0         | -788.0       | -12.625         | -13.0         |
+| 0.0041        | 1.0471 | 1000 | 1.3027          | -4.9688        | -5.1875          | 0.4785             | 0.2383          | -808.0         | -816.0       | -14.8125        | -14.9375      |
+| 0.0032        | 1.1518 | 1100 | 1.1521          | -4.5           | -4.625           | 0.5098             | 0.1455          | -752.0         | -768.0       | -15.0           | -15.3125      |
+| 0.0068        | 1.2565 | 1200 | 0.9612          | -4.5           | -4.75            | 0.5312             | 0.2617          | -764.0         | -768.0       | -8.5            | -9.625        |
+| 0.0038        | 1.3613 | 1300 | 1.0891          | -3.6094        | -3.8438          | 0.5332             | 0.2471          | -672.0         | -680.0       | -16.75          | -16.75        |
+| 0.0036        | 1.4660 | 1400 | 1.0725          | -3.6875        | -3.875           | 0.5254             | 0.1885          | -676.0         | -688.0       | -15.6875        | -15.75        |
+| 0.0067        | 1.5707 | 1500 | 1.0607          | -3.9531        | -4.1562          | 0.5117             | 0.2158          | -704.0         | -712.0       | -14.625         | -14.875       |
+| 0.0068        | 1.6754 | 1600 | 1.1896          | -4.5938        | -4.9062          | 0.5137             | 0.3164          | -780.0         | -776.0       | -15.1875        | -15.5         |
+| 0.0042        | 1.7801 | 1700 | 1.1288          | -4.4062        | -4.6562          | 0.5273             | 0.2676          | -756.0         | -760.0       | -15.375         | -15.875       |
+| 0.0003        | 1.8848 | 1800 | 1.3009          | -5.3125        | -5.625           | 0.5059             | 0.3203          | -852.0         | -848.0       | -14.6875        | -15.1875      |
+| 0.002         | 1.9895 | 1900 | 1.2142          | -4.8438        | -5.125           | 0.5156             | 0.2871          | -800.0         | -804.0       | -13.625         | -14.3125      |
+| 0.0004        | 2.0942 | 2000 | 1.2300          | -4.8438        | -5.1562          | 0.5137             | 0.2969          | -804.0         | -804.0       | -13.4375        | -14.125       |
+| 0.0148        | 2.1990 | 2100 | 1.2569          | -5.0625        | -5.375           | 0.5137             | 0.3223          | -828.0         | -824.0       | -13.25          | -13.9375      |
+| 0.0009        | 2.3037 | 2200 | 1.2545          | -5.3125        | -5.625           | 0.5059             | 0.3184          | -852.0         | -848.0       | -12.8125        | -13.5625      |
+| 0.0006        | 2.4084 | 2300 | 1.2550          | -5.25          | -5.5312          | 0.5098             | 0.3008          | -840.0         | -840.0       | -12.9375        | -13.6875      |
+| 0.0002        | 2.5131 | 2400 | 1.2758          | -5.2812        | -5.625           | 0.5098             | 0.3223          | -848.0         | -848.0       | -13.1875        | -13.875       |
+| 0.0004        | 2.6178 | 2500 | 1.2774          | -5.2812        | -5.5938          | 0.5039             | 0.3242          | -848.0         | -844.0       | -13.1875        | -13.875       |
+| 0.0002        | 2.7225 | 2600 | 1.2790          | -5.2812        | -5.5938          | 0.5039             | 0.3281          | -848.0         | -844.0       | -13.25          | -13.9375      |
+| 0.0003        | 2.8272 | 2700 | 1.2763          | -5.2812        | -5.5938          | 0.5020             | 0.3320          | -848.0         | -844.0       | -13.25          | -13.9375      |
+| 0.0002        | 2.9319 | 2800 | 1.2775          | -5.2812        | -5.5938          | 0.5039             | 0.3301          | -848.0         | -844.0       | -13.25          | -14.0         |
 
 
 ### Framework versions
 
-- Transformers 4.44.2
+- Transformers 4.45.1
 - Pytorch 2.3.0
-- Datasets 2.21.0
-- Tokenizers 0.19.1
+- Datasets 3.0.1
+- Tokenizers 0.20.0
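The Rewards/* columns in the card above follow the TRL DPO convention: implicit rewards are β-scaled log-probability ratios of the policy against the reference model, and the margin is chosen minus rejected. A minimal sketch of the sigmoid DPO loss under those definitions (the β value and the sequence log-probs below are illustrative placeholders, not values from this run):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit rewards: beta-scaled log-prob ratios vs. the reference model.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected       # "Rewards/margins"
    loss = -math.log(sigmoid(margin))              # sigmoid DPO objective
    return loss, reward_chosen, reward_rejected

# Illustrative sequence log-probs (not from this run):
loss, rc, rr = dpo_loss(-100.0, -120.0, -98.0, -110.0)
```

A positive margin pushes the loss below log 2 (the chance-level value at margin 0), which is why a collapsing run like the old card's (-85.5 margin) shows an exploding validation loss.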
all_results.json CHANGED
@@ -1,9 +1,22 @@
 {
     "epoch": 3.0,
+    "eval_logits/chosen": 5.65625,
+    "eval_logits/rejected": 6.0625,
+    "eval_logps/chosen": -63232.0,
+    "eval_logps/rejected": -54528.0,
+    "eval_loss": 185.16600036621094,
+    "eval_rewards/accuracies": 0.435546875,
+    "eval_rewards/chosen": -628.0,
+    "eval_rewards/margins": -85.5,
+    "eval_rewards/rejected": -544.0,
+    "eval_runtime": 46.5467,
+    "eval_samples": 2000,
+    "eval_samples_per_second": 42.968,
+    "eval_steps_per_second": 0.687,
     "total_flos": 0.0,
-    "train_loss": 0.69140625,
-    "train_runtime": 12163.3984,
+    "train_loss": 0.024607474328222707,
+    "train_runtime": 12418.8835,
     "train_samples": 61119,
-    "train_samples_per_second": 15.074,
-    "train_steps_per_second": 0.236
+    "train_samples_per_second": 14.764,
+    "train_steps_per_second": 0.231
 }
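The new throughput fields are internally consistent: `train_samples_per_second` is epochs × `train_samples` / `train_runtime`, and the ratio of samples/s to steps/s recovers the effective batch size. A quick arithmetic check against the values above:

```python
# Figures from the updated all_results.json above.
epochs, train_samples, train_runtime = 3.0, 61119, 12418.8835

samples_per_second = epochs * train_samples / train_runtime
effective_batch = samples_per_second / 0.231  # train_steps_per_second

print(round(samples_per_second, 3))  # 14.764, as reported
print(round(effective_batch))        # ~64 samples per optimizer step
```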
config.json CHANGED
@@ -6,7 +6,7 @@
     ],
     "auto_map": {
         "AutoConfig": "configuration_openelm.OpenELMConfig",
-        "AutoModelForCausalLM": "modeling_openelm.OpenELMForCausalLM"
+        "AutoModelForCausalLM": "apple/OpenELM-1_1B--modeling_openelm.OpenELMForCausalLM"
     },
     "bos_token_id": 1,
     "eos_token_id": 2,
@@ -119,7 +119,7 @@
     "rope_max_length": 4096,
     "share_input_output_layers": true,
     "torch_dtype": "bfloat16",
-    "transformers_version": "4.44.2",
+    "transformers_version": "4.45.1",
     "use_cache": false,
     "vocab_size": 32000
 }
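The updated `auto_map` entry uses the Hub's `repo--module.Class` convention for dynamic modules: everything before the `--` names the repository that hosts the custom modeling code, and the remainder is the module and class to import (loading such a checkpoint therefore requires `trust_remote_code=True`). A small sketch of how that string decomposes, using only string handling:

```python
entry = "apple/OpenELM-1_1B--modeling_openelm.OpenELMForCausalLM"

repo_id, module_class = entry.split("--", 1)      # repo hosting the code
module_name, class_name = module_class.rsplit(".", 1)

print(repo_id)      # apple/OpenELM-1_1B
print(module_name)  # modeling_openelm
print(class_name)   # OpenELMForCausalLM
```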
eval_results.json ADDED
@@ -0,0 +1,16 @@
+{
+    "epoch": 3.0,
+    "eval_logits/chosen": 5.65625,
+    "eval_logits/rejected": 6.0625,
+    "eval_logps/chosen": -63232.0,
+    "eval_logps/rejected": -54528.0,
+    "eval_loss": 185.16600036621094,
+    "eval_rewards/accuracies": 0.435546875,
+    "eval_rewards/chosen": -628.0,
+    "eval_rewards/margins": -85.5,
+    "eval_rewards/rejected": -544.0,
+    "eval_runtime": 46.5467,
+    "eval_samples": 2000,
+    "eval_samples_per_second": 42.968,
+    "eval_steps_per_second": 0.687
+}
generation_config.json CHANGED
@@ -2,5 +2,5 @@
     "_from_model_config": true,
     "bos_token_id": 1,
     "eos_token_id": 2,
-    "transformers_version": "4.44.2"
+    "transformers_version": "4.45.1"
 }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b532dbfa86336d046c65a36c74b037d87dd11b7ae74722564ca73b7e4df25eb8
+oid sha256:07b272f4a5a157d8ba90e4b618bf45bc276d9ab44bbc7933d43cb88ea0905824
 size 2159808696
runs/Oct03_08-23-59_xe8545-a100-03/events.out.tfevents.1727938140.xe8545-a100-03.210553.0 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c7c108623e525623259e8762aff38fc4b0403ce1448b77d7c5ebb55e81305a51
+size 225743
runs/Sep09_22-46-55_xe8545-a100-15/events.out.tfevents.1725927776.xe8545-a100-15.125551.1 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:622a60e0d712481edb8a0dad100480d68fc442f4acded4f6cce31304240aae85
+size 828
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
     "epoch": 3.0,
     "total_flos": 0.0,
-    "train_loss": 0.69140625,
-    "train_runtime": 12163.3984,
+    "train_loss": 0.024607474328222707,
+    "train_runtime": 12418.8835,
     "train_samples": 61119,
-    "train_samples_per_second": 15.074,
-    "train_steps_per_second": 0.236
+    "train_samples_per_second": 14.764,
+    "train_steps_per_second": 0.231
 }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fbc03d2c36f39b625c3f889b27fc8a4a0eeafc1b29364e1e1f2f055d859a56b4
-size 7608
+oid sha256:894c28e73662831b88dc391c0c2a429d08b6822c3fda90086fcbd5fb65910cf5
+size 7672
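Each of the binary files above is stored as a Git LFS pointer: a short text stub recording the spec version, a sha256 object id, and the size in bytes, while the actual payload lives in LFS storage. Parsing one is plain key/value splitting, sketched here on the new training_args.bin pointer:

```python
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:894c28e73662831b88dc391c0c2a429d08b6822c3fda90086fcbd5fb65910cf5
size 7672"""

# Each pointer line is "key value"; oid is "algorithm:hex-digest".
fields = dict(line.split(" ", 1) for line in pointer.splitlines())
algo, digest = fields["oid"].split(":", 1)

print(algo, len(digest))    # sha256 64
print(int(fields["size"]))  # payload size in bytes: 7672
```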