Andrey Kutuzov committed
Commit 79fb6ab
1 Parent(s): 7dbc8ab

Camera ready
README.md CHANGED
@@ -12,35 +12,46 @@ language:
  widget:
  - text: "Ha egen brygge og båthus. Hva betyr båthus?"
  example_title: "Definition generation"
  ---


- # mt0-definition-no-xl


- This model is a version of [mt0-xl](https://huggingface.co/bigscience/mt0-xl) fine-tuned on Bokmålsordboka.
-
- It achieves the following results on the evaluation set:
- - Loss: 1.9882
- - Rouge1: 31.4539
- - Rouge2: 16.1017
- - Rougel: 30.6959
- - Rougelsum: 30.6888
- - Gen Len: 8.9348

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -57,25 +68,11 @@ The following hyperparameters were used during training:
  - lr_scheduler_type: linear
  - num_epochs: 20.0

- ### Training results
-
- | Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
- |:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
- | 2.6123 | 1.0 | 497 | 1.8900 | 28.8529 | 13.6138 | 28.1471 | 28.1199 | 8.9495 |
- | 2.0642 | 2.0 | 994 | 1.8383 | 30.5095 | 15.2505 | 29.8018 | 29.7658 | 8.6406 |
- | 1.7611 | 3.0 | 1491 | 1.8447 | 30.9812 | 15.6932 | 30.2339 | 30.2413 | 8.7151 |
- | 1.5284 | 4.0 | 1989 | 1.8619 | 30.9706 | 15.4516 | 30.1888 | 30.1911 | 9.3787 |
- | 1.3422 | 5.0 | 2486 | 1.8895 | 30.9451 | 15.5242 | 30.1826 | 30.1837 | 9.2026 |
- | 1.1862 | 6.0 | 2983 | 1.9224 | 31.3072 | 15.959 | 30.5538 | 30.5404 | 8.8816 |
- | 1.0526 | 7.0 | 3480 | 1.9882 | 31.4465 | 16.095 | 30.6871 | 30.6739 | 8.7294 |
- | 0.9384 | 8.0 | 3978 | 2.0583 | 31.1434 | 15.7298 | 30.287 | 30.2831 | 9.6134 |
- | 0.8408 | 9.0 | 4475 | 2.1237 | 30.7808 | 15.4943 | 29.9606 | 29.9589 | 9.6527 |
- | 0.7592 | 10.0 | 4972 | 2.1987 | 31.0097 | 15.5823 | 30.1202 | 30.1151 | 9.9255 |
-
  ### Framework versions

- - Transformers 4.30.2
  - Pytorch 1.13.1+rocm5.2
- - Datasets 2.12.0
- - Tokenizers 0.12.1

  widget:
  - text: "Ha egen brygge og båthus. Hva betyr båthus?"
  example_title: "Definition generation"
+ license: cc-by-sa-4.0
  ---

+ # mT0-Definition-No XL

+ This model is a version of [mT0 XL](https://huggingface.co/bigscience/mt0-xl) finetuned on [Bokmålsordboka](https://ordbokene.no/),
+ a dataset of Norwegian definitions and usage examples.

+ It generates definitions of Norwegian words in context.
+ Its input is the usage example and the instruction question "Hva betyr TARGET_WORD?"

  ## Model description

+ See details in the paper `Enriching Word Usage Graphs with Cluster Definitions` (LREC-COLING'2024) by
+ Mariia Fedorova, Andrey Kutuzov, Nikolay Arefyev and Dominik Schlechtweg.

  ## Intended uses & limitations

+ The model is intended for research purposes, as a source of contextualized dictionary-like lexical definitions.
+ Generated definitions can contain all sorts of biases and stereotypes, stemming from the underlying language model.

  ## Training and evaluation data

+ [Bokmålsordboka](https://ordbokene.no/) by The Norwegian Language Council and the University of Bergen.
+
+ ## Training results
+
+ mT0-Definition-No XL achieves the following results on the evaluation set:
+
+ - Loss: 2.0358
+ - Rouge1: 28.3491
+ - Rouge2: 14.2699
+ - Rougel: 27.7602
+ - Rougelsum: 27.752
+ - Gen Len: 10.0765

  ## Training procedure

+ mT0-Definition-No XL was fine-tuned in a sequence-to-sequence mode on examples of contextualized dictionary definitions.
+
  ### Training hyperparameters

  The following hyperparameters were used during training:

  - lr_scheduler_type: linear
  - num_epochs: 20.0

  ### Framework versions

+ - Transformers 4.37.1
  - Pytorch 1.13.1+rocm5.2
+ - Datasets 2.16.1
+ - Tokenizers 0.15.1
+
+ ## Citation
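The updated card specifies the model's input format: a usage example followed by the instruction question "Hva betyr TARGET_WORD?". A minimal sketch of how a user of this card could build such a prompt; the helper function is ours, the commented generation part assumes the standard transformers seq2seq API, and the repo id is a placeholder, not taken from this commit:

```python
def build_prompt(example: str, target_word: str) -> str:
    """Build the model's expected input: a usage example followed by
    the instruction question "Hva betyr TARGET_WORD?" (per the card)."""
    return f"{example} Hva betyr {target_word}?"

prompt = build_prompt("Ha egen brygge og båthus.", "båthus")
print(prompt)  # Ha egen brygge og båthus. Hva betyr båthus?

# Generating a definition then follows the usual transformers seq2seq
# pattern (replace <repo-id> with this checkpoint's Hub id; the weights
# are several GB, so this is left commented out here):
#
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# tok = AutoTokenizer.from_pretrained("<repo-id>")
# model = AutoModelForSeq2SeqLM.from_pretrained("<repo-id>")
# ids = tok(prompt, return_tensors="pt").input_ids
# out = model.generate(ids, max_new_tokens=32)
# print(tok.decode(out[0], skip_special_tokens=True))
```

Note that the prompt built above is exactly the card's widget example text.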
all_results.json CHANGED
@@ -1,18 +1,18 @@
  {
- "epoch": 10.0,
- "eval_gen_len": 8.93481334841629,
- "eval_loss": 1.9881515502929688,
- "eval_rouge1": 31.4539,
- "eval_rouge2": 16.1017,
- "eval_rougeL": 30.6959,
- "eval_rougeLsum": 30.6888,
- "eval_runtime": 176.3688,
  "eval_samples": 7072,
- "eval_samples_per_second": 40.098,
- "eval_steps_per_second": 1.253,
- "train_loss": 1.4084750825324945,
- "train_runtime": 9144.3174,
  "train_samples": 63639,
- "train_samples_per_second": 139.188,
- "train_steps_per_second": 1.087
  }

  {
+ "epoch": 11.0,
+ "eval_gen_len": 10.07649886877828,
+ "eval_loss": 2.0358376502990723,
+ "eval_rouge1": 28.3491,
+ "eval_rouge2": 14.2699,
+ "eval_rougeL": 27.7602,
+ "eval_rougeLsum": 27.752,
+ "eval_runtime": 189.2199,
  "eval_samples": 7072,
+ "eval_samples_per_second": 37.375,
+ "eval_steps_per_second": 1.168,
+ "train_loss": 1.3756333750635685,
+ "train_runtime": 9869.4584,
  "train_samples": 63639,
+ "train_samples_per_second": 128.961,
+ "train_steps_per_second": 1.007
  }
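As a quick consistency check, the throughput fields in the updated all_results.json follow directly from the logged sample count and wall-clock runtime (values copied from the diff above):

```python
# Values from the updated all_results.json in the diff above.
eval_samples = 7072
eval_runtime = 189.2199  # seconds

# eval_samples_per_second is samples divided by evaluation runtime,
# rounded to three decimals by the trainer's logging.
eval_sps = round(eval_samples / eval_runtime, 3)
print(eval_sps)  # 37.375, matching the logged eval_samples_per_second
```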
config.json CHANGED
@@ -1,8 +1,9 @@
  {
- "_name_or_path": "mt0-xl",
  "architectures": [
  "MT5ForConditionalGeneration"
  ],
  "d_ff": 5120,
  "d_kv": 64,
  "d_model": 2048,
@@ -26,7 +27,7 @@
  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "torch_dtype": "float32",
- "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 250112
  }

  {
+ "_name_or_path": "mt0-xl/",
  "architectures": [
  "MT5ForConditionalGeneration"
  ],
+ "classifier_dropout": 0.0,
  "d_ff": 5120,
  "d_kv": 64,
  "d_model": 2048,

  "tie_word_embeddings": false,
  "tokenizer_class": "T5Tokenizer",
  "torch_dtype": "float32",
+ "transformers_version": "4.37.1",
  "use_cache": true,
  "vocab_size": 250112
  }
eval_results.json CHANGED
@@ -1,13 +1,13 @@
  {
- "epoch": 10.0,
- "eval_gen_len": 8.93481334841629,
- "eval_loss": 1.9881515502929688,
- "eval_rouge1": 31.4539,
- "eval_rouge2": 16.1017,
- "eval_rougeL": 30.6959,
- "eval_rougeLsum": 30.6888,
- "eval_runtime": 176.3688,
  "eval_samples": 7072,
- "eval_samples_per_second": 40.098,
- "eval_steps_per_second": 1.253
  }

  {
+ "epoch": 11.0,
+ "eval_gen_len": 10.07649886877828,
+ "eval_loss": 2.0358376502990723,
+ "eval_rouge1": 28.3491,
+ "eval_rouge2": 14.2699,
+ "eval_rougeL": 27.7602,
+ "eval_rougeLsum": 27.752,
+ "eval_runtime": 189.2199,
  "eval_samples": 7072,
+ "eval_samples_per_second": 37.375,
+ "eval_steps_per_second": 1.168
  }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
- "transformers_version": "4.30.2"
  }

  "decoder_start_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
+ "transformers_version": "4.37.1"
  }
pytorch_model-00001-of-00002.bin → pytorch_model-00001-of-00003.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:309ec67c5e94c3ffa70f23e49688256cd4022059e270daed36fa78c0b15ac31f
- size 9977020596

  version https://git-lfs.github.com/spec/v1
+ oid sha256:8a1fe9a02ef2f2f7f2ad22d01aee5265f757e61e622917e74599ffdb0f3a1c67
+ size 4993619647
pytorch_model-00002-of-00003.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4bf8a6b2f0adc87722dee18bc9e03d784909841108eb101cf5c15172aefd979e
+ size 4983398004
pytorch_model-00002-of-00002.bin → pytorch_model-00003-of-00003.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9bc631cc9a7f2880d2096313e2c5bcc77f3a6c853c38998df3ad0ce4e80b2053
  size 4993663292

  version https://git-lfs.github.com/spec/v1
+ oid sha256:6c4e159dc91bb5a2b3b675915347c543eb153168a7f2b27fc176d1a86f5aad19
  size 4993663292
pytorch_model.bin.index.json CHANGED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json CHANGED
@@ -1,5 +1,23 @@
  {
- "eos_token": "</s>",
- "pad_token": "<pad>",
- "unk_token": "<unk>"
  }

  {
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
  }
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6502d07619068a98aa2d3bb531332a694ffe108ca6c6fe62a467ccfe98d666b9
- size 16315219

  version https://git-lfs.github.com/spec/v1
+ oid sha256:c00dd03b7b29fa0ca79bd6b2ac2a9575b3175486939f4c3429a27812e2830bbb
+ size 16315311
tokenizer_config.json CHANGED
@@ -1,5 +1,31 @@
  {
- "additional_special_tokens": null,
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 0,

  {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "additional_special_tokens": [],
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 0,
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 10.0,
- "train_loss": 1.4084750825324945,
- "train_runtime": 9144.3174,
  "train_samples": 63639,
- "train_samples_per_second": 139.188,
- "train_steps_per_second": 1.087
  }

  {
+ "epoch": 11.0,
+ "train_loss": 1.3756333750635685,
+ "train_runtime": 9869.4584,
  "train_samples": 63639,
+ "train_samples_per_second": 128.961,
+ "train_steps_per_second": 1.007
  }
trainer_state.json CHANGED
@@ -1,8 +1,9 @@
  {
- "best_metric": 31.4465,
- "best_model_checkpoint": "models/mt0-xl_norwegian_natprompt_adafactor/checkpoint-3480",
- "epoch": 9.998994469582705,
- "global_step": 4972,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -10,206 +11,229 @@
  {
  "epoch": 1.0,
  "learning_rate": 4.75e-05,
- "loss": 2.6123,
  "step": 497
  },
  {
  "epoch": 1.0,
- "eval_gen_len": 8.94951923076923,
- "eval_loss": 1.8899891376495361,
- "eval_rouge1": 28.8529,
- "eval_rouge2": 13.6138,
- "eval_rougeL": 28.1471,
- "eval_rougeLsum": 28.1199,
- "eval_runtime": 143.958,
- "eval_samples_per_second": 49.125,
- "eval_steps_per_second": 1.535,
  "step": 497
  },
  {
  "epoch": 2.0,
  "learning_rate": 4.5e-05,
- "loss": 2.0642,
  "step": 994
  },
  {
  "epoch": 2.0,
- "eval_gen_len": 8.640554298642535,
- "eval_loss": 1.8383045196533203,
- "eval_rouge1": 30.5095,
- "eval_rouge2": 15.2505,
- "eval_rougeL": 29.8018,
- "eval_rougeLsum": 29.7658,
- "eval_runtime": 141.4332,
- "eval_samples_per_second": 50.002,
- "eval_steps_per_second": 1.563,
  "step": 994
  },
  {
  "epoch": 3.0,
  "learning_rate": 4.25e-05,
- "loss": 1.7611,
  "step": 1491
  },
  {
  "epoch": 3.0,
- "eval_gen_len": 8.715073529411764,
- "eval_loss": 1.8447171449661255,
- "eval_rouge1": 30.9812,
- "eval_rouge2": 15.6932,
- "eval_rougeL": 30.2339,
- "eval_rougeLsum": 30.2413,
- "eval_runtime": 144.3959,
- "eval_samples_per_second": 48.976,
- "eval_steps_per_second": 1.531,
  "step": 1491
  },
  {
  "epoch": 4.0,
  "learning_rate": 3.999496981891348e-05,
- "loss": 1.5284,
  "step": 1989
  },
  {
  "epoch": 4.0,
- "eval_gen_len": 9.378676470588236,
- "eval_loss": 1.8618718385696411,
- "eval_rouge1": 30.9706,
- "eval_rouge2": 15.4516,
- "eval_rougeL": 30.1888,
- "eval_rougeLsum": 30.1911,
- "eval_runtime": 144.7653,
- "eval_samples_per_second": 48.851,
- "eval_steps_per_second": 1.527,
  "step": 1989
  },
  {
  "epoch": 5.0,
  "learning_rate": 3.749496981891348e-05,
- "loss": 1.3422,
  "step": 2486
  },
  {
  "epoch": 5.0,
- "eval_gen_len": 9.202630090497738,
- "eval_loss": 1.889543890953064,
- "eval_rouge1": 30.9451,
- "eval_rouge2": 15.5242,
- "eval_rougeL": 30.1826,
- "eval_rougeLsum": 30.1837,
- "eval_runtime": 149.0359,
- "eval_samples_per_second": 47.452,
- "eval_steps_per_second": 1.483,
  "step": 2486
  },
  {
  "epoch": 6.0,
  "learning_rate": 3.499496981891348e-05,
- "loss": 1.1862,
  "step": 2983
  },
  {
  "epoch": 6.0,
- "eval_gen_len": 8.88164592760181,
- "eval_loss": 1.922377347946167,
- "eval_rouge1": 31.3072,
- "eval_rouge2": 15.959,
- "eval_rougeL": 30.5538,
- "eval_rougeLsum": 30.5404,
- "eval_runtime": 145.1437,
- "eval_samples_per_second": 48.724,
- "eval_steps_per_second": 1.523,
  "step": 2983
  },
  {
  "epoch": 7.0,
  "learning_rate": 3.249496981891348e-05,
- "loss": 1.0526,
  "step": 3480
  },
  {
  "epoch": 7.0,
- "eval_gen_len": 8.72935520361991,
- "eval_loss": 1.9881515502929688,
- "eval_rouge1": 31.4465,
- "eval_rouge2": 16.095,
- "eval_rougeL": 30.6871,
- "eval_rougeLsum": 30.6739,
- "eval_runtime": 188.8728,
- "eval_samples_per_second": 37.443,
- "eval_steps_per_second": 1.17,
  "step": 3480
  },
  {
  "epoch": 8.0,
  "learning_rate": 2.9989939637826965e-05,
- "loss": 0.9384,
  "step": 3978
  },
  {
  "epoch": 8.0,
- "eval_gen_len": 9.613404977375566,
- "eval_loss": 2.0582804679870605,
- "eval_rouge1": 31.1434,
- "eval_rouge2": 15.7298,
- "eval_rougeL": 30.287,
- "eval_rougeLsum": 30.2831,
- "eval_runtime": 148.3771,
- "eval_samples_per_second": 47.662,
- "eval_steps_per_second": 1.489,
  "step": 3978
  },
  {
  "epoch": 9.0,
  "learning_rate": 2.7489939637826962e-05,
- "loss": 0.8408,
  "step": 4475
  },
  {
  "epoch": 9.0,
- "eval_gen_len": 9.652714932126697,
- "eval_loss": 2.1236588954925537,
- "eval_rouge1": 30.7808,
- "eval_rouge2": 15.4943,
- "eval_rougeL": 29.9606,
- "eval_rougeLsum": 29.9589,
- "eval_runtime": 147.2785,
- "eval_samples_per_second": 48.018,
- "eval_steps_per_second": 1.501,
  "step": 4475
  },
  {
  "epoch": 10.0,
  "learning_rate": 2.4989939637826962e-05,
- "loss": 0.7592,
  "step": 4972
  },
  {
  "epoch": 10.0,
- "eval_gen_len": 9.92548076923077,
- "eval_loss": 2.198713779449463,
- "eval_rouge1": 31.0097,
- "eval_rouge2": 15.5823,
- "eval_rougeL": 30.1202,
- "eval_rougeLsum": 30.1151,
- "eval_runtime": 147.1485,
- "eval_samples_per_second": 48.06,
- "eval_steps_per_second": 1.502,
  "step": 4972
  },
  {
- "epoch": 10.0,
- "step": 4972,
- "total_flos": 2.421678655143936e+17,
- "train_loss": 1.4084750825324945,
- "train_runtime": 9144.3174,
- "train_samples_per_second": 139.188,
- "train_steps_per_second": 1.087
  }
  ],
  "max_steps": 9940,
  "num_train_epochs": 20,
- "total_flos": 2.421678655143936e+17,
  "trial_name": null,
  "trial_params": null
  }

  {
+ "best_metric": 28.3473,
+ "best_model_checkpoint": "models/mt0-xl_norwegian_natprompt_updated/checkpoint-3978",
+ "epoch": 10.998491704374057,
+ "eval_steps": 500,
+ "global_step": 5469,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  {
  "epoch": 1.0,
  "learning_rate": 4.75e-05,
+ "loss": 2.6165,
  "step": 497
  },
  {
  "epoch": 1.0,
+ "eval_gen_len": 8.414592760180996,
+ "eval_loss": 1.9032089710235596,
+ "eval_rouge1": 26.3511,
+ "eval_rouge2": 12.3506,
+ "eval_rougeL": 25.9106,
+ "eval_rougeLsum": 25.9089,
+ "eval_runtime": 138.9748,
+ "eval_samples_per_second": 50.887,
+ "eval_steps_per_second": 1.59,
  "step": 497
  },
  {
  "epoch": 2.0,
  "learning_rate": 4.5e-05,
+ "loss": 2.0889,
  "step": 994
  },
  {
  "epoch": 2.0,
+ "eval_gen_len": 9.177601809954751,
+ "eval_loss": 1.8498029708862305,
+ "eval_rouge1": 27.424,
+ "eval_rouge2": 13.5368,
+ "eval_rougeL": 26.9384,
+ "eval_rougeLsum": 26.9301,
+ "eval_runtime": 137.6855,
+ "eval_samples_per_second": 51.363,
+ "eval_steps_per_second": 1.605,
  "step": 994
  },
  {
  "epoch": 3.0,
  "learning_rate": 4.25e-05,
+ "loss": 1.7957,
  "step": 1491
  },
  {
  "epoch": 3.0,
+ "eval_gen_len": 8.836962669683258,
+ "eval_loss": 1.8434585332870483,
+ "eval_rouge1": 27.6735,
+ "eval_rouge2": 13.985,
+ "eval_rougeL": 27.1922,
+ "eval_rougeLsum": 27.2052,
+ "eval_runtime": 138.4103,
+ "eval_samples_per_second": 51.094,
+ "eval_steps_per_second": 1.597,
  "step": 1491
  },
  {
  "epoch": 4.0,
  "learning_rate": 3.999496981891348e-05,
+ "loss": 1.5665,
  "step": 1989
  },
  {
  "epoch": 4.0,
+ "eval_gen_len": 8.857466063348417,
+ "eval_loss": 1.850651502609253,
+ "eval_rouge1": 28.2493,
+ "eval_rouge2": 14.358,
+ "eval_rougeL": 27.7524,
+ "eval_rougeLsum": 27.7456,
+ "eval_runtime": 138.4439,
+ "eval_samples_per_second": 51.082,
+ "eval_steps_per_second": 1.596,
  "step": 1989
  },
  {
  "epoch": 5.0,
  "learning_rate": 3.749496981891348e-05,
+ "loss": 1.3801,
  "step": 2486
  },
  {
  "epoch": 5.0,
+ "eval_gen_len": 9.235718325791856,
+ "eval_loss": 1.889418125152588,
+ "eval_rouge1": 28.2511,
+ "eval_rouge2": 14.2431,
+ "eval_rougeL": 27.6841,
+ "eval_rougeLsum": 27.6785,
+ "eval_runtime": 141.1671,
+ "eval_samples_per_second": 50.097,
+ "eval_steps_per_second": 1.566,
  "step": 2486
  },
  {
  "epoch": 6.0,
  "learning_rate": 3.499496981891348e-05,
+ "loss": 1.2233,
  "step": 2983
  },
  {
  "epoch": 6.0,
+ "eval_gen_len": 9.103930995475114,
+ "eval_loss": 1.916949987411499,
+ "eval_rouge1": 28.3057,
+ "eval_rouge2": 14.349,
+ "eval_rougeL": 27.7482,
+ "eval_rougeLsum": 27.7371,
+ "eval_runtime": 138.9786,
+ "eval_samples_per_second": 50.886,
+ "eval_steps_per_second": 1.59,
  "step": 2983
  },
  {
  "epoch": 7.0,
  "learning_rate": 3.249496981891348e-05,
+ "loss": 1.0877,
  "step": 3480
  },
  {
  "epoch": 7.0,
+ "eval_gen_len": 9.20475113122172,
+ "eval_loss": 1.9742506742477417,
+ "eval_rouge1": 28.2671,
+ "eval_rouge2": 14.4585,
+ "eval_rougeL": 27.725,
+ "eval_rougeLsum": 27.7475,
+ "eval_runtime": 139.1018,
+ "eval_samples_per_second": 50.84,
+ "eval_steps_per_second": 1.589,
  "step": 3480
  },
  {
  "epoch": 8.0,
  "learning_rate": 2.9989939637826965e-05,
+ "loss": 0.9717,
  "step": 3978
  },
  {
  "epoch": 8.0,
+ "eval_gen_len": 9.785633484162895,
+ "eval_loss": 2.0358376502990723,
+ "eval_rouge1": 28.3473,
+ "eval_rouge2": 14.2734,
+ "eval_rougeL": 27.7737,
+ "eval_rougeLsum": 27.7661,
+ "eval_runtime": 142.8331,
+ "eval_samples_per_second": 49.512,
+ "eval_steps_per_second": 1.547,
  "step": 3978
  },
  {
  "epoch": 9.0,
  "learning_rate": 2.7489939637826962e-05,
+ "loss": 0.8777,
  "step": 4475
  },
  {
  "epoch": 9.0,
+ "eval_gen_len": 9.886453619909503,
+ "eval_loss": 2.0969080924987793,
+ "eval_rouge1": 27.7863,
+ "eval_rouge2": 13.8157,
+ "eval_rougeL": 27.1987,
+ "eval_rougeLsum": 27.1859,
+ "eval_runtime": 141.209,
+ "eval_samples_per_second": 50.082,
+ "eval_steps_per_second": 1.565,
  "step": 4475
  },
  {
  "epoch": 10.0,
  "learning_rate": 2.4989939637826962e-05,
+ "loss": 0.7983,
  "step": 4972
  },
  {
  "epoch": 10.0,
+ "eval_gen_len": 9.238829185520363,
+ "eval_loss": 2.1536314487457275,
+ "eval_rouge1": 28.2427,
+ "eval_rouge2": 14.3725,
+ "eval_rougeL": 27.6965,
+ "eval_rougeLsum": 27.7019,
+ "eval_runtime": 138.9465,
+ "eval_samples_per_second": 50.897,
+ "eval_steps_per_second": 1.591,
  "step": 4972
  },
  {
+ "epoch": 11.0,
+ "learning_rate": 2.2489939637826963e-05,
+ "loss": 0.7261,
+ "step": 5469
+ },
+ {
+ "epoch": 11.0,
+ "eval_gen_len": 9.441742081447964,
+ "eval_loss": 2.1868813037872314,
+ "eval_rouge1": 28.0261,
+ "eval_rouge2": 14.1267,
+ "eval_rougeL": 27.4232,
+ "eval_rougeLsum": 27.4261,
+ "eval_runtime": 140.5761,
+ "eval_samples_per_second": 50.307,
+ "eval_steps_per_second": 1.572,
+ "step": 5469
+ },
+ {
+ "epoch": 11.0,
+ "step": 5469,
+ "total_flos": 2.6645663357023027e+17,
+ "train_loss": 1.3756333750635685,
+ "train_runtime": 9869.4584,
+ "train_samples_per_second": 128.961,
+ "train_steps_per_second": 1.007
  }
  ],
+ "logging_steps": 500,
  "max_steps": 9940,
+ "num_input_tokens_seen": 0,
  "num_train_epochs": 20,
+ "save_steps": 500,
+ "total_flos": 2.6645663357023027e+17,
+ "train_batch_size": 4,
  "trial_name": null,
  "trial_params": null
  }
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:8ea83b67b65f113aa36d935cdb4d3a537ad76372cda288acf58eb6f9c599fac1
- size 4091

  version https://git-lfs.github.com/spec/v1
+ oid sha256:b7a798cecc88bb22c41b9c8ab4c97ef2a2bdbd9d8ec25ade15a04a3f45d8db18
+ size 4411
upload.py DELETED
@@ -1,11 +0,0 @@
- #!/bin/env python3
-
- import sys
- from huggingface_hub import HfApi
- from huggingface_hub import create_repo
-
- create_repo(sys.argv[1])
- api = HfApi()
-
- api.upload_folder(folder_path=".", repo_id=sys.argv[1], repo_type="model")
-