Yeb Havinga committed
Commit
ccb02b1
1 Parent(s): 657c998

Add model and card

README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ tags:
+ - summarization
+ language:
+ - dutch
+ datasets:
+ - xsum_nl
+ widget:
+ - text: "Onderzoekers ontdekten dat vier van de vijf kinderen in Engeland die op school lunches hadden gegeten, op school voedsel hadden geprobeerd dat ze thuis niet hadden geprobeerd.De helft van de ondervraagde ouders zei dat hun kinderen hadden gevraagd om voedsel dat ze op school hadden gegeten om thuis te worden gekookt.De enquête, van ongeveer 1.000 ouders, vond dat de meest populaire groenten wortelen, suikermaïs en erwten waren.Aubergine, kikkererwten en spinazie waren een van de minst populaire.Van de ondervraagde ouders, 628 hadden kinderen die lunches op school aten. (% duidt op een deel van de ouders die zeiden dat hun kind elke groente zou eten) England's School Food Trust gaf opdracht tot het onderzoek na een onderzoek door de Mumsnet-website suggereerde dat sommige ouders hun kinderen lunchpakket gaven omdat ze dachten dat ze te kieskeurig waren om iets anders te eten. \"Schoolmaaltijden kunnen een geweldige manier zijn om ouders te helpen hun kinderen aan te moedigen om nieuw voedsel te proberen en om de verscheidenheid van voedsel in hun dieet te verhogen. \"Mumsnet medeoprichter, Carrie Longton, zei: \"Het krijgen van kinderen om gezond te eten is de droom van elke ouder, maar maaltijdtijden thuis kan vaak een slagveld en emotioneel geladen zijn. \"Vanuit Mumsnetters' ervaring lijkt het erop dat eenmaal op school is er een verlangen om in te passen bij iedereen anders en zelfs een aantal positieve peer pressure om op te scheppen over de verscheidenheid van wat voedsel je kunt eten. \"Schoolmaaltijden zijn ook verplaatst op nogal een beetje van toen Mumsnetters op school waren, met gezondere opties en meer afwisseling. \"Schoolmaaltijden in Engeland moeten nu voldoen aan strenge voedingsrichtlijnen.Ongeveer vier op de tien basisschoolkinderen in Engeland eten nu schoollunches, iets meer dan op middelbare scholen.Meer kinderen in Schotland eten schoollunches - ongeveer 46%.Het onderzoek werd online uitgevoerd tussen 26 februari en 5 maart onder een panel van ouders die ten minste één kind op school hadden van 4-17 jaar oud."
+ - text: "Het Londense trio staat klaar voor de beste Britse act en beste album, evenals voor twee nominaties in de beste song categorie. \"We kregen te horen zoals vanmorgen 'Oh I think you're genomineerd',\" zei Dappy. \"En ik was als 'Oh yeah, what one?' En nu zijn we genomineerd voor vier awards. Ik bedoel, wow! \"Bandmate Fazer voegde eraan toe: \"We dachten dat het het beste van ons was om met iedereen naar beneden te komen en hallo te zeggen tegen de camera's.En nu vinden we dat we vier nominaties hebben. \"De band heeft twee shots bij de beste song prijs, het krijgen van het knikje voor hun Tyncy Stryder samenwerking nummer één, en single Strong Again.Their album Uncle B zal ook gaan tegen platen van Beyonce en Kany \"Aan het eind van de dag zijn we dankbaar om te zijn waar we zijn in onze carrières. \"Als het niet gebeurt dan gebeurt het niet - live om te vechten een andere dag en blijven maken albums en hits voor de fans. \"Dappy onthulde ook dat ze kunnen worden optreden live op de avond.De groep zal doen Nummer Een en ook een mogelijke uitlevering van de War Child single, I Got Soul.Het liefdadigheidslied is een re-working van The Killers' All These Things That I've Done en is ingesteld op artiesten als Chipmunk, Ironik en Pixie Lott.Dit jaar zal Mobos worden gehouden buiten Londen voor de eerste keer, in Glasgow op 30 september.N-Dubz zei dat ze op zoek waren naar optredens voor hun Schotse fans en bogen over hun recente shows ten noorden van de Londense We hebben Aberdeen ongeveer drie of vier maanden geleden gedaan - we hebben die show daar verbrijzeld! Overal waar we heen gaan slaan we hem in elkaar!\""
+ ---
+
+ # mt5-base-mixednews-nl
+
+ mt5-base finetuned on a mix of three news sources:
+
+ 1. CNN DM translated to nl with MarianMT.
+ 2. XSUM translated to nl with MarianMT.
+ 3. News article summaries distilled from the nu.nl website.
+
+ * Learning rate 1e-3
+ * Trained for one epoch
+ * Max source length 1024
+ * Max target length 300
+ * Min target length 75
+
+ * rouge1 28.8482
+ * rouge2 9.4584
+ * rougeL 20.1697
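
A minimal usage sketch for the checkpoint added in this commit (not part of the committed card). It assumes the model is published on the Hugging Face Hub as `yhavinga/mt5-base-mixednews-nl`; that repo id is an assumption, and a path to a local clone of this repository works just as well. The `generate()` arguments simply spell out the defaults stored in `config.json` below.

```python
# Usage sketch; the hub id is assumed, substitute a local clone path if needed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "yhavinga/mt5-base-mixednews-nl"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "Onderzoekers ontdekten dat vier van de vijf kinderen ..."  # any Dutch news article

# config.json ships the "summarize: " prefix plus beam-search defaults
# (num_beams=4, min_length=75, max_length=300, no_repeat_ngram_size=3).
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   truncation=True, max_length=1024)
summary_ids = model.generate(**inputs,
                             num_beams=4,
                             min_length=75,
                             max_length=300,
                             no_repeat_ngram_size=3,
                             length_penalty=2.0,
                             early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```
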
all_results.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "epoch": 1.0,
+ "test_gen_len": 86.2,
+ "test_loss": 1.6419,
+ "test_n_objs": 500,
+ "test_rouge1": 28.8482,
+ "test_rouge2": 9.4584,
+ "test_rougeL": 20.1697,
+ "test_rougeLsum": 24.0384,
+ "test_runtime": 529.4182,
+ "test_samples_per_second": 0.944,
+ "train_n_objs": 1360000,
+ "train_runtime": 228303.6105,
+ "train_samples_per_second": 0.745,
+ "val_gen_len": 86.9,
+ "val_loss": 1.5501,
+ "val_n_objs": 100,
+ "val_rouge1": 28.9281,
+ "val_rouge2": 9.8583,
+ "val_rougeL": 20.7577,
+ "val_rougeLsum": 24.4792,
+ "val_runtime": 108.2772,
+ "val_samples_per_second": 0.924
+ }
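
The evaluation script that produced these numbers is not included in the commit. As a rough illustration only, the sketch below shows how ROUGE F-measures of this kind can be computed with the `rouge_score` package over reference/prediction pairs; the two strings are made-up placeholders, not data from the run.

```python
# Illustration only: not the script behind all_results.json.
# Averages ROUGE F-measures over (reference, prediction) pairs.
from rouge_score import rouge_scorer

rouge_types = ["rouge1", "rouge2", "rougeL", "rougeLsum"]
scorer = rouge_scorer.RougeScorer(rouge_types)

references = ["Vier van de vijf kinderen proberen op school nieuw voedsel."]    # placeholder
predictions = ["Kinderen eten op school groenten die ze thuis niet proberen."]  # placeholder

totals = {t: 0.0 for t in rouge_types}
for ref, pred in zip(references, predictions):
    scores = scorer.score(ref, pred)  # dict of Score(precision, recall, fmeasure)
    for t in rouge_types:
        totals[t] += scores[t].fmeasure

for t in rouge_types:
    print(t, round(100 * totals[t] / len(references), 4))
```
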
config.json ADDED
@@ -0,0 +1,47 @@
+ {
+ "_name_or_path": "google/mt5-base",
+ "architectures": [
+ "MT5ForConditionalGeneration"
+ ],
+ "d_ff": 2048,
+ "d_kv": 64,
+ "d_model": 768,
+ "decoder_start_token_id": 0,
+ "dropout_rate": 0.1,
+ "early_stopping": true,
+ "eos_token_id": 1,
+ "feed_forward_proj": "gated-gelu",
+ "initializer_factor": 1.0,
+ "is_encoder_decoder": true,
+ "layer_norm_epsilon": 1e-06,
+ "length_penalty": 2.0,
+ "max_length": 300,
+ "min_length": 75,
+ "model_type": "mt5",
+ "n_positions": 512,
+ "no_repeat_ngram_size": 3,
+ "num_beams": 4,
+ "num_decoder_layers": 12,
+ "num_heads": 12,
+ "num_layers": 12,
+ "output_past": true,
+ "pad_token_id": 0,
+ "prefix": "summarize: ",
+ "relative_attention_num_buckets": 32,
+ "task_specific_params": {
+ "summarization": {
+ "early_stopping": true,
+ "length_penalty": 2.0,
+ "max_length": 300,
+ "min_length": 75,
+ "no_repeat_ngram_size": 3,
+ "num_beams": 4,
+ "prefix": "summarize: "
+ }
+ },
+ "tie_word_embeddings": false,
+ "tokenizer_class": "T5Tokenizer",
+ "transformers_version": "4.4.0.dev0",
+ "use_cache": true,
+ "vocab_size": 250112
+ }
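
A short sketch (not from this commit) of where the decoding defaults above come into play: `generate()` falls back to the top-level values in this config when no explicit arguments are passed, and the same settings are mirrored under `task_specific_params["summarization"]` together with the `"summarize: "` input prefix, which a summarization pipeline can apply.

```python
# Sketch: inspect the decoding defaults stored in config.json.
import json

with open("config.json") as f:
    cfg = json.load(f)

# Top-level defaults that model.generate() uses when no overrides are given.
for key in ("num_beams", "min_length", "max_length",
            "no_repeat_ngram_size", "length_penalty", "early_stopping"):
    print(f"{key} = {cfg[key]}")

# Mirrored per-task settings, including the input prefix used at inference time.
print(cfg["task_specific_params"]["summarization"])
```
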
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e28f9ce6336f9fe92a183ecab219c84fb8990dbd599f2ef257f12137e05dbca
+ size 2329707353
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef78f86560d809067d12bac6c09f19a462cb3af3f54d2b8acbba26e1433125d6
+ size 4309802
test_results.json ADDED
@@ -0,0 +1,11 @@
+ {
+ "test_gen_len": 86.2,
+ "test_loss": 1.6419,
+ "test_n_objs": 500,
+ "test_rouge1": 28.8482,
+ "test_rouge2": 9.4584,
+ "test_rougeL": 20.1697,
+ "test_rougeLsum": 24.0384,
+ "test_runtime": 529.4182,
+ "test_samples_per_second": 0.944
+ }
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 0, "additional_special_tokens": null, "special_tokens_map_file": "/home/patrick/.cache/torch/transformers/685ac0ca8568ec593a48b61b0a3c272beee9bc194a3c7241d15dcadb5f875e53.f76030f3ec1b96a8199b2593390c610e76ca8028ef3d24680000619ffb646276", "name_or_path": "google/mt5-base"}
train_results.json ADDED
@@ -0,0 +1,6 @@
+ {
+ "epoch": 1.0,
+ "train_n_objs": 1360000,
+ "train_runtime": 228303.6105,
+ "train_samples_per_second": 0.745
+ }
trainer_state.json ADDED
@@ -0,0 +1,2075 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 1.0,
5
+ "global_step": 170000,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.0,
12
+ "learning_rate": 0.0009970588235294119,
13
+ "loss": 5.4687,
14
+ "step": 500
15
+ },
16
+ {
17
+ "epoch": 0.01,
18
+ "learning_rate": 0.0009941176470588235,
19
+ "loss": 2.9773,
20
+ "step": 1000
21
+ },
22
+ {
23
+ "epoch": 0.01,
24
+ "learning_rate": 0.0009911764705882353,
25
+ "loss": 2.803,
26
+ "step": 1500
27
+ },
28
+ {
29
+ "epoch": 0.01,
30
+ "learning_rate": 0.0009882352941176472,
31
+ "loss": 2.6892,
32
+ "step": 2000
33
+ },
34
+ {
35
+ "epoch": 0.01,
36
+ "learning_rate": 0.0009852941176470588,
37
+ "loss": 2.6263,
38
+ "step": 2500
39
+ },
40
+ {
41
+ "epoch": 0.02,
42
+ "learning_rate": 0.0009823529411764707,
43
+ "loss": 2.5926,
44
+ "step": 3000
45
+ },
46
+ {
47
+ "epoch": 0.02,
48
+ "learning_rate": 0.0009794117647058823,
49
+ "loss": 2.5264,
50
+ "step": 3500
51
+ },
52
+ {
53
+ "epoch": 0.02,
54
+ "learning_rate": 0.0009764705882352941,
55
+ "loss": 2.5018,
56
+ "step": 4000
57
+ },
58
+ {
59
+ "epoch": 0.03,
60
+ "learning_rate": 0.0009735294117647059,
61
+ "loss": 2.4978,
62
+ "step": 4500
63
+ },
64
+ {
65
+ "epoch": 0.03,
66
+ "learning_rate": 0.0009705882352941176,
67
+ "loss": 2.4435,
68
+ "step": 5000
69
+ },
70
+ {
71
+ "epoch": 0.03,
72
+ "learning_rate": 0.0009676470588235295,
73
+ "loss": 2.3768,
74
+ "step": 5500
75
+ },
76
+ {
77
+ "epoch": 0.04,
78
+ "learning_rate": 0.0009647058823529412,
79
+ "loss": 2.382,
80
+ "step": 6000
81
+ },
82
+ {
83
+ "epoch": 0.04,
84
+ "learning_rate": 0.0009617647058823529,
85
+ "loss": 2.4059,
86
+ "step": 6500
87
+ },
88
+ {
89
+ "epoch": 0.04,
90
+ "learning_rate": 0.0009588235294117648,
91
+ "loss": 2.3918,
92
+ "step": 7000
93
+ },
94
+ {
95
+ "epoch": 0.04,
96
+ "learning_rate": 0.0009558823529411765,
97
+ "loss": 2.3325,
98
+ "step": 7500
99
+ },
100
+ {
101
+ "epoch": 0.05,
102
+ "learning_rate": 0.0009529411764705882,
103
+ "loss": 2.3217,
104
+ "step": 8000
105
+ },
106
+ {
107
+ "epoch": 0.05,
108
+ "learning_rate": 0.00095,
109
+ "loss": 2.2997,
110
+ "step": 8500
111
+ },
112
+ {
113
+ "epoch": 0.05,
114
+ "learning_rate": 0.0009470588235294117,
115
+ "loss": 2.2969,
116
+ "step": 9000
117
+ },
118
+ {
119
+ "epoch": 0.06,
120
+ "learning_rate": 0.0009441176470588235,
121
+ "loss": 2.3028,
122
+ "step": 9500
123
+ },
124
+ {
125
+ "epoch": 0.06,
126
+ "learning_rate": 0.0009411764705882353,
127
+ "loss": 2.2885,
128
+ "step": 10000
129
+ },
130
+ {
131
+ "epoch": 0.06,
132
+ "learning_rate": 0.0009382352941176471,
133
+ "loss": 2.3032,
134
+ "step": 10500
135
+ },
136
+ {
137
+ "epoch": 0.06,
138
+ "learning_rate": 0.0009352941176470589,
139
+ "loss": 2.2727,
140
+ "step": 11000
141
+ },
142
+ {
143
+ "epoch": 0.07,
144
+ "learning_rate": 0.0009323529411764706,
145
+ "loss": 2.2315,
146
+ "step": 11500
147
+ },
148
+ {
149
+ "epoch": 0.07,
150
+ "learning_rate": 0.0009294117647058824,
151
+ "loss": 2.2389,
152
+ "step": 12000
153
+ },
154
+ {
155
+ "epoch": 0.07,
156
+ "learning_rate": 0.0009264705882352942,
157
+ "loss": 2.2585,
158
+ "step": 12500
159
+ },
160
+ {
161
+ "epoch": 0.08,
162
+ "learning_rate": 0.000923529411764706,
163
+ "loss": 2.2445,
164
+ "step": 13000
165
+ },
166
+ {
167
+ "epoch": 0.08,
168
+ "learning_rate": 0.0009205882352941176,
169
+ "loss": 2.1763,
170
+ "step": 13500
171
+ },
172
+ {
173
+ "epoch": 0.08,
174
+ "learning_rate": 0.0009176470588235294,
175
+ "loss": 2.2386,
176
+ "step": 14000
177
+ },
178
+ {
179
+ "epoch": 0.09,
180
+ "learning_rate": 0.0009147058823529412,
181
+ "loss": 2.2169,
182
+ "step": 14500
183
+ },
184
+ {
185
+ "epoch": 0.09,
186
+ "learning_rate": 0.0009117647058823529,
187
+ "loss": 2.2082,
188
+ "step": 15000
189
+ },
190
+ {
191
+ "epoch": 0.09,
192
+ "learning_rate": 0.0009088235294117648,
193
+ "loss": 2.2337,
194
+ "step": 15500
195
+ },
196
+ {
197
+ "epoch": 0.09,
198
+ "learning_rate": 0.0009058823529411765,
199
+ "loss": 2.1778,
200
+ "step": 16000
201
+ },
202
+ {
203
+ "epoch": 0.1,
204
+ "learning_rate": 0.0009029411764705882,
205
+ "loss": 2.2025,
206
+ "step": 16500
207
+ },
208
+ {
209
+ "epoch": 0.1,
210
+ "learning_rate": 0.0009000000000000001,
211
+ "loss": 2.1734,
212
+ "step": 17000
213
+ },
214
+ {
215
+ "epoch": 0.1,
216
+ "learning_rate": 0.0008970588235294118,
217
+ "loss": 2.1567,
218
+ "step": 17500
219
+ },
220
+ {
221
+ "epoch": 0.11,
222
+ "learning_rate": 0.0008941176470588236,
223
+ "loss": 2.1574,
224
+ "step": 18000
225
+ },
226
+ {
227
+ "epoch": 0.11,
228
+ "learning_rate": 0.0008911764705882354,
229
+ "loss": 2.1401,
230
+ "step": 18500
231
+ },
232
+ {
233
+ "epoch": 0.11,
234
+ "learning_rate": 0.000888235294117647,
235
+ "loss": 2.1853,
236
+ "step": 19000
237
+ },
238
+ {
239
+ "epoch": 0.11,
240
+ "learning_rate": 0.0008852941176470588,
241
+ "loss": 2.1417,
242
+ "step": 19500
243
+ },
244
+ {
245
+ "epoch": 0.12,
246
+ "learning_rate": 0.0008823529411764706,
247
+ "loss": 2.1332,
248
+ "step": 20000
249
+ },
250
+ {
251
+ "epoch": 0.12,
252
+ "learning_rate": 0.0008794117647058824,
253
+ "loss": 2.12,
254
+ "step": 20500
255
+ },
256
+ {
257
+ "epoch": 0.12,
258
+ "learning_rate": 0.0008764705882352941,
259
+ "loss": 2.1348,
260
+ "step": 21000
261
+ },
262
+ {
263
+ "epoch": 0.13,
264
+ "learning_rate": 0.0008735294117647059,
265
+ "loss": 2.1464,
266
+ "step": 21500
267
+ },
268
+ {
269
+ "epoch": 0.13,
270
+ "learning_rate": 0.0008705882352941177,
271
+ "loss": 2.1532,
272
+ "step": 22000
273
+ },
274
+ {
275
+ "epoch": 0.13,
276
+ "learning_rate": 0.0008676470588235294,
277
+ "loss": 2.1142,
278
+ "step": 22500
279
+ },
280
+ {
281
+ "epoch": 0.14,
282
+ "learning_rate": 0.0008647058823529413,
283
+ "loss": 2.0977,
284
+ "step": 23000
285
+ },
286
+ {
287
+ "epoch": 0.14,
288
+ "learning_rate": 0.000861764705882353,
289
+ "loss": 2.1114,
290
+ "step": 23500
291
+ },
292
+ {
293
+ "epoch": 0.14,
294
+ "learning_rate": 0.0008588235294117646,
295
+ "loss": 2.079,
296
+ "step": 24000
297
+ },
298
+ {
299
+ "epoch": 0.14,
300
+ "learning_rate": 0.0008558823529411765,
301
+ "loss": 2.1021,
302
+ "step": 24500
303
+ },
304
+ {
305
+ "epoch": 0.15,
306
+ "learning_rate": 0.0008529411764705882,
307
+ "loss": 2.1075,
308
+ "step": 25000
309
+ },
310
+ {
311
+ "epoch": 0.15,
312
+ "learning_rate": 0.00085,
313
+ "loss": 2.0557,
314
+ "step": 25500
315
+ },
316
+ {
317
+ "epoch": 0.15,
318
+ "learning_rate": 0.0008470588235294118,
319
+ "loss": 2.0445,
320
+ "step": 26000
321
+ },
322
+ {
323
+ "epoch": 0.16,
324
+ "learning_rate": 0.0008441176470588235,
325
+ "loss": 2.072,
326
+ "step": 26500
327
+ },
328
+ {
329
+ "epoch": 0.16,
330
+ "learning_rate": 0.0008411764705882353,
331
+ "loss": 2.0684,
332
+ "step": 27000
333
+ },
334
+ {
335
+ "epoch": 0.16,
336
+ "learning_rate": 0.0008382352941176471,
337
+ "loss": 2.06,
338
+ "step": 27500
339
+ },
340
+ {
341
+ "epoch": 0.16,
342
+ "learning_rate": 0.0008352941176470589,
343
+ "loss": 2.0768,
344
+ "step": 28000
345
+ },
346
+ {
347
+ "epoch": 0.17,
348
+ "learning_rate": 0.0008323529411764706,
349
+ "loss": 2.1055,
350
+ "step": 28500
351
+ },
352
+ {
353
+ "epoch": 0.17,
354
+ "learning_rate": 0.0008294117647058824,
355
+ "loss": 2.0504,
356
+ "step": 29000
357
+ },
358
+ {
359
+ "epoch": 0.17,
360
+ "learning_rate": 0.0008264705882352941,
361
+ "loss": 2.0279,
362
+ "step": 29500
363
+ },
364
+ {
365
+ "epoch": 0.18,
366
+ "learning_rate": 0.0008235294117647058,
367
+ "loss": 2.0852,
368
+ "step": 30000
369
+ },
370
+ {
371
+ "epoch": 0.18,
372
+ "learning_rate": 0.0008205882352941177,
373
+ "loss": 2.0934,
374
+ "step": 30500
375
+ },
376
+ {
377
+ "epoch": 0.18,
378
+ "learning_rate": 0.0008176470588235294,
379
+ "loss": 2.0453,
380
+ "step": 31000
381
+ },
382
+ {
383
+ "epoch": 0.19,
384
+ "learning_rate": 0.0008147058823529411,
385
+ "loss": 2.0469,
386
+ "step": 31500
387
+ },
388
+ {
389
+ "epoch": 0.19,
390
+ "learning_rate": 0.000811764705882353,
391
+ "loss": 2.6994,
392
+ "step": 32000
393
+ },
394
+ {
395
+ "epoch": 0.19,
396
+ "learning_rate": 0.0008088235294117647,
397
+ "loss": 2.0459,
398
+ "step": 32500
399
+ },
400
+ {
401
+ "epoch": 0.19,
402
+ "learning_rate": 0.0008058823529411766,
403
+ "loss": 2.0367,
404
+ "step": 33000
405
+ },
406
+ {
407
+ "epoch": 0.2,
408
+ "learning_rate": 0.0008029411764705883,
409
+ "loss": 2.0672,
410
+ "step": 33500
411
+ },
412
+ {
413
+ "epoch": 0.2,
414
+ "learning_rate": 0.0008,
415
+ "loss": 2.0765,
416
+ "step": 34000
417
+ },
418
+ {
419
+ "epoch": 0.2,
420
+ "learning_rate": 0.0007970588235294119,
421
+ "loss": 2.0736,
422
+ "step": 34500
423
+ },
424
+ {
425
+ "epoch": 0.21,
426
+ "learning_rate": 0.0007941176470588235,
427
+ "loss": 2.0491,
428
+ "step": 35000
429
+ },
430
+ {
431
+ "epoch": 0.21,
432
+ "learning_rate": 0.0007911764705882353,
433
+ "loss": 2.0626,
434
+ "step": 35500
435
+ },
436
+ {
437
+ "epoch": 0.21,
438
+ "learning_rate": 0.0007882352941176471,
439
+ "loss": 2.034,
440
+ "step": 36000
441
+ },
442
+ {
443
+ "epoch": 0.21,
444
+ "learning_rate": 0.0007852941176470588,
445
+ "loss": 2.0163,
446
+ "step": 36500
447
+ },
448
+ {
449
+ "epoch": 0.22,
450
+ "learning_rate": 0.0007823529411764706,
451
+ "loss": 2.0318,
452
+ "step": 37000
453
+ },
454
+ {
455
+ "epoch": 0.22,
456
+ "learning_rate": 0.0007794117647058824,
457
+ "loss": 2.0477,
458
+ "step": 37500
459
+ },
460
+ {
461
+ "epoch": 0.22,
462
+ "learning_rate": 0.0007764705882352942,
463
+ "loss": 2.0535,
464
+ "step": 38000
465
+ },
466
+ {
467
+ "epoch": 0.23,
468
+ "learning_rate": 0.0007735294117647059,
469
+ "loss": 1.9894,
470
+ "step": 38500
471
+ },
472
+ {
473
+ "epoch": 0.23,
474
+ "learning_rate": 0.0007705882352941177,
475
+ "loss": 2.0341,
476
+ "step": 39000
477
+ },
478
+ {
479
+ "epoch": 0.23,
480
+ "learning_rate": 0.0007676470588235295,
481
+ "loss": 1.9852,
482
+ "step": 39500
483
+ },
484
+ {
485
+ "epoch": 0.24,
486
+ "learning_rate": 0.0007647058823529411,
487
+ "loss": 1.9756,
488
+ "step": 40000
489
+ },
490
+ {
491
+ "epoch": 0.24,
492
+ "learning_rate": 0.000761764705882353,
493
+ "loss": 2.0002,
494
+ "step": 40500
495
+ },
496
+ {
497
+ "epoch": 0.24,
498
+ "learning_rate": 0.0007588235294117647,
499
+ "loss": 2.0056,
500
+ "step": 41000
501
+ },
502
+ {
503
+ "epoch": 0.24,
504
+ "learning_rate": 0.0007558823529411764,
505
+ "loss": 2.0109,
506
+ "step": 41500
507
+ },
508
+ {
509
+ "epoch": 0.25,
510
+ "learning_rate": 0.0007529411764705883,
511
+ "loss": 2.0476,
512
+ "step": 42000
513
+ },
514
+ {
515
+ "epoch": 0.25,
516
+ "learning_rate": 0.00075,
517
+ "loss": 2.0153,
518
+ "step": 42500
519
+ },
520
+ {
521
+ "epoch": 0.25,
522
+ "learning_rate": 0.0007470588235294118,
523
+ "loss": 1.983,
524
+ "step": 43000
525
+ },
526
+ {
527
+ "epoch": 0.26,
528
+ "learning_rate": 0.0007441176470588236,
529
+ "loss": 1.9831,
530
+ "step": 43500
531
+ },
532
+ {
533
+ "epoch": 0.26,
534
+ "learning_rate": 0.0007411764705882353,
535
+ "loss": 1.9843,
536
+ "step": 44000
537
+ },
538
+ {
539
+ "epoch": 0.26,
540
+ "learning_rate": 0.0007382352941176471,
541
+ "loss": 1.9926,
542
+ "step": 44500
543
+ },
544
+ {
545
+ "epoch": 0.26,
546
+ "learning_rate": 0.0007352941176470589,
547
+ "loss": 1.9853,
548
+ "step": 45000
549
+ },
550
+ {
551
+ "epoch": 0.27,
552
+ "learning_rate": 0.0007323529411764706,
553
+ "loss": 1.9806,
554
+ "step": 45500
555
+ },
556
+ {
557
+ "epoch": 0.27,
558
+ "learning_rate": 0.0007294117647058823,
559
+ "loss": 1.9876,
560
+ "step": 46000
561
+ },
562
+ {
563
+ "epoch": 0.27,
564
+ "learning_rate": 0.0007264705882352941,
565
+ "loss": 1.9705,
566
+ "step": 46500
567
+ },
568
+ {
569
+ "epoch": 0.28,
570
+ "learning_rate": 0.0007235294117647059,
571
+ "loss": 1.9724,
572
+ "step": 47000
573
+ },
574
+ {
575
+ "epoch": 0.28,
576
+ "learning_rate": 0.0007205882352941176,
577
+ "loss": 2.0018,
578
+ "step": 47500
579
+ },
580
+ {
581
+ "epoch": 0.28,
582
+ "learning_rate": 0.0007176470588235295,
583
+ "loss": 1.9356,
584
+ "step": 48000
585
+ },
586
+ {
587
+ "epoch": 0.29,
588
+ "learning_rate": 0.0007147058823529412,
589
+ "loss": 1.966,
590
+ "step": 48500
591
+ },
592
+ {
593
+ "epoch": 0.29,
594
+ "learning_rate": 0.0007117647058823529,
595
+ "loss": 1.9758,
596
+ "step": 49000
597
+ },
598
+ {
599
+ "epoch": 0.29,
600
+ "learning_rate": 0.0007088235294117648,
601
+ "loss": 1.9851,
602
+ "step": 49500
603
+ },
604
+ {
605
+ "epoch": 0.29,
606
+ "learning_rate": 0.0007058823529411765,
607
+ "loss": 1.9648,
608
+ "step": 50000
609
+ },
610
+ {
611
+ "epoch": 0.3,
612
+ "learning_rate": 0.0007029411764705881,
613
+ "loss": 1.9601,
614
+ "step": 50500
615
+ },
616
+ {
617
+ "epoch": 0.3,
618
+ "learning_rate": 0.0007,
619
+ "loss": 1.9646,
620
+ "step": 51000
621
+ },
622
+ {
623
+ "epoch": 0.3,
624
+ "learning_rate": 0.0006970588235294117,
625
+ "loss": 1.9551,
626
+ "step": 51500
627
+ },
628
+ {
629
+ "epoch": 0.31,
630
+ "learning_rate": 0.0006941176470588235,
631
+ "loss": 1.9455,
632
+ "step": 52000
633
+ },
634
+ {
635
+ "epoch": 0.31,
636
+ "learning_rate": 0.0006911764705882353,
637
+ "loss": 1.9527,
638
+ "step": 52500
639
+ },
640
+ {
641
+ "epoch": 0.31,
642
+ "learning_rate": 0.000688235294117647,
643
+ "loss": 1.9583,
644
+ "step": 53000
645
+ },
646
+ {
647
+ "epoch": 0.31,
648
+ "learning_rate": 0.0006852941176470589,
649
+ "loss": 1.9525,
650
+ "step": 53500
651
+ },
652
+ {
653
+ "epoch": 0.32,
654
+ "learning_rate": 0.0006823529411764706,
655
+ "loss": 1.9276,
656
+ "step": 54000
657
+ },
658
+ {
659
+ "epoch": 0.32,
660
+ "learning_rate": 0.0006794117647058824,
661
+ "loss": 1.9626,
662
+ "step": 54500
663
+ },
664
+ {
665
+ "epoch": 0.32,
666
+ "learning_rate": 0.0006764705882352942,
667
+ "loss": 1.8995,
668
+ "step": 55000
669
+ },
670
+ {
671
+ "epoch": 0.33,
672
+ "learning_rate": 0.000673529411764706,
673
+ "loss": 1.9457,
674
+ "step": 55500
675
+ },
676
+ {
677
+ "epoch": 0.33,
678
+ "learning_rate": 0.0006705882352941176,
679
+ "loss": 1.9706,
680
+ "step": 56000
681
+ },
682
+ {
683
+ "epoch": 0.33,
684
+ "learning_rate": 0.0006676470588235294,
685
+ "loss": 1.9671,
686
+ "step": 56500
687
+ },
688
+ {
689
+ "epoch": 0.34,
690
+ "learning_rate": 0.0006647058823529412,
691
+ "loss": 1.9613,
692
+ "step": 57000
693
+ },
694
+ {
695
+ "epoch": 0.34,
696
+ "learning_rate": 0.0006617647058823529,
697
+ "loss": 1.9511,
698
+ "step": 57500
699
+ },
700
+ {
701
+ "epoch": 0.34,
702
+ "learning_rate": 0.0006588235294117648,
703
+ "loss": 1.9437,
704
+ "step": 58000
705
+ },
706
+ {
707
+ "epoch": 0.34,
708
+ "learning_rate": 0.0006558823529411765,
709
+ "loss": 1.9381,
710
+ "step": 58500
711
+ },
712
+ {
713
+ "epoch": 0.35,
714
+ "learning_rate": 0.0006529411764705882,
715
+ "loss": 1.9205,
716
+ "step": 59000
717
+ },
718
+ {
719
+ "epoch": 0.35,
720
+ "learning_rate": 0.0006500000000000001,
721
+ "loss": 1.9415,
722
+ "step": 59500
723
+ },
724
+ {
725
+ "epoch": 0.35,
726
+ "learning_rate": 0.0006470588235294118,
727
+ "loss": 1.9307,
728
+ "step": 60000
729
+ },
730
+ {
731
+ "epoch": 0.36,
732
+ "learning_rate": 0.0006441176470588236,
733
+ "loss": 1.9616,
734
+ "step": 60500
735
+ },
736
+ {
737
+ "epoch": 0.36,
738
+ "learning_rate": 0.0006411764705882354,
739
+ "loss": 1.9267,
740
+ "step": 61000
741
+ },
742
+ {
743
+ "epoch": 0.36,
744
+ "learning_rate": 0.000638235294117647,
745
+ "loss": 1.911,
746
+ "step": 61500
747
+ },
748
+ {
749
+ "epoch": 0.36,
750
+ "learning_rate": 0.0006352941176470588,
751
+ "loss": 1.9316,
752
+ "step": 62000
753
+ },
754
+ {
755
+ "epoch": 0.37,
756
+ "learning_rate": 0.0006323529411764706,
757
+ "loss": 1.9174,
758
+ "step": 62500
759
+ },
760
+ {
761
+ "epoch": 0.37,
762
+ "learning_rate": 0.0006294117647058824,
763
+ "loss": 1.9185,
764
+ "step": 63000
765
+ },
766
+ {
767
+ "epoch": 0.37,
768
+ "learning_rate": 0.0006264705882352941,
769
+ "loss": 1.9067,
770
+ "step": 63500
771
+ },
772
+ {
773
+ "epoch": 0.38,
774
+ "learning_rate": 0.0006235294117647059,
775
+ "loss": 1.9071,
776
+ "step": 64000
777
+ },
778
+ {
779
+ "epoch": 0.38,
780
+ "learning_rate": 0.0006205882352941177,
781
+ "loss": 1.9217,
782
+ "step": 64500
783
+ },
784
+ {
785
+ "epoch": 0.38,
786
+ "learning_rate": 0.0006176470588235294,
787
+ "loss": 1.9262,
788
+ "step": 65000
789
+ },
790
+ {
791
+ "epoch": 0.39,
792
+ "learning_rate": 0.0006147058823529413,
793
+ "loss": 1.9182,
794
+ "step": 65500
795
+ },
796
+ {
797
+ "epoch": 0.39,
798
+ "learning_rate": 0.000611764705882353,
799
+ "loss": 1.9291,
800
+ "step": 66000
801
+ },
802
+ {
803
+ "epoch": 0.39,
804
+ "learning_rate": 0.0006088235294117646,
805
+ "loss": 1.9115,
806
+ "step": 66500
807
+ },
808
+ {
809
+ "epoch": 0.39,
810
+ "learning_rate": 0.0006058823529411765,
811
+ "loss": 1.8964,
812
+ "step": 67000
813
+ },
814
+ {
815
+ "epoch": 0.4,
816
+ "learning_rate": 0.0006029411764705882,
817
+ "loss": 1.9267,
818
+ "step": 67500
819
+ },
820
+ {
821
+ "epoch": 0.4,
822
+ "learning_rate": 0.0006,
823
+ "loss": 1.8993,
824
+ "step": 68000
825
+ },
826
+ {
827
+ "epoch": 0.4,
828
+ "learning_rate": 0.0005970588235294118,
829
+ "loss": 1.9361,
830
+ "step": 68500
831
+ },
832
+ {
833
+ "epoch": 0.41,
834
+ "learning_rate": 0.0005941176470588235,
835
+ "loss": 1.8742,
836
+ "step": 69000
837
+ },
838
+ {
839
+ "epoch": 0.41,
840
+ "learning_rate": 0.0005911764705882353,
841
+ "loss": 1.9027,
842
+ "step": 69500
843
+ },
844
+ {
845
+ "epoch": 0.41,
846
+ "learning_rate": 0.0005882352941176471,
847
+ "loss": 1.9098,
848
+ "step": 70000
849
+ },
850
+ {
851
+ "epoch": 0.41,
852
+ "learning_rate": 0.0005852941176470589,
853
+ "loss": 1.8969,
854
+ "step": 70500
855
+ },
856
+ {
857
+ "epoch": 0.42,
858
+ "learning_rate": 0.0005823529411764706,
859
+ "loss": 1.9119,
860
+ "step": 71000
861
+ },
862
+ {
863
+ "epoch": 0.42,
864
+ "learning_rate": 0.0005794117647058824,
865
+ "loss": 1.9021,
866
+ "step": 71500
867
+ },
868
+ {
869
+ "epoch": 0.42,
870
+ "learning_rate": 0.0005764705882352941,
871
+ "loss": 1.8577,
872
+ "step": 72000
873
+ },
874
+ {
875
+ "epoch": 0.43,
876
+ "learning_rate": 0.0005735294117647058,
877
+ "loss": 1.892,
878
+ "step": 72500
879
+ },
880
+ {
881
+ "epoch": 0.43,
882
+ "learning_rate": 0.0005705882352941177,
883
+ "loss": 1.8874,
884
+ "step": 73000
885
+ },
886
+ {
887
+ "epoch": 0.43,
888
+ "learning_rate": 0.0005676470588235294,
889
+ "loss": 1.8978,
890
+ "step": 73500
891
+ },
892
+ {
893
+ "epoch": 0.44,
894
+ "learning_rate": 0.0005647058823529411,
895
+ "loss": 1.8824,
896
+ "step": 74000
897
+ },
898
+ {
899
+ "epoch": 0.44,
900
+ "learning_rate": 0.000561764705882353,
901
+ "loss": 1.9075,
902
+ "step": 74500
903
+ },
904
+ {
905
+ "epoch": 0.44,
906
+ "learning_rate": 0.0005588235294117647,
907
+ "loss": 1.891,
908
+ "step": 75000
909
+ },
910
+ {
911
+ "epoch": 0.44,
912
+ "learning_rate": 0.0005558823529411766,
913
+ "loss": 1.8859,
914
+ "step": 75500
915
+ },
916
+ {
917
+ "epoch": 0.45,
918
+ "learning_rate": 0.0005529411764705883,
919
+ "loss": 1.8606,
920
+ "step": 76000
921
+ },
922
+ {
923
+ "epoch": 0.45,
924
+ "learning_rate": 0.00055,
925
+ "loss": 1.9076,
926
+ "step": 76500
927
+ },
928
+ {
929
+ "epoch": 0.45,
930
+ "learning_rate": 0.0005470588235294119,
931
+ "loss": 1.8524,
932
+ "step": 77000
933
+ },
934
+ {
935
+ "epoch": 0.46,
936
+ "learning_rate": 0.0005441176470588235,
937
+ "loss": 1.8389,
938
+ "step": 77500
939
+ },
940
+ {
941
+ "epoch": 0.46,
942
+ "learning_rate": 0.0005411764705882352,
943
+ "loss": 1.8674,
944
+ "step": 78000
945
+ },
946
+ {
947
+ "epoch": 0.46,
948
+ "learning_rate": 0.0005382352941176471,
949
+ "loss": 1.9254,
950
+ "step": 78500
951
+ },
952
+ {
953
+ "epoch": 0.46,
954
+ "learning_rate": 0.0005352941176470588,
955
+ "loss": 1.9068,
956
+ "step": 79000
957
+ },
958
+ {
959
+ "epoch": 0.47,
960
+ "learning_rate": 0.0005323529411764706,
961
+ "loss": 1.8746,
962
+ "step": 79500
963
+ },
964
+ {
965
+ "epoch": 0.47,
966
+ "learning_rate": 0.0005294117647058824,
967
+ "loss": 1.8625,
968
+ "step": 80000
969
+ },
970
+ {
971
+ "epoch": 0.47,
972
+ "learning_rate": 0.0005264705882352942,
973
+ "loss": 1.7992,
974
+ "step": 80500
975
+ },
976
+ {
977
+ "epoch": 0.48,
978
+ "learning_rate": 0.0005235294117647059,
979
+ "loss": 1.8703,
980
+ "step": 81000
981
+ },
982
+ {
983
+ "epoch": 0.48,
984
+ "learning_rate": 0.0005205882352941177,
985
+ "loss": 1.8678,
986
+ "step": 81500
987
+ },
988
+ {
989
+ "epoch": 0.48,
990
+ "learning_rate": 0.0005176470588235295,
991
+ "loss": 1.8611,
992
+ "step": 82000
993
+ },
994
+ {
995
+ "epoch": 0.49,
996
+ "learning_rate": 0.0005147058823529411,
997
+ "loss": 1.8053,
998
+ "step": 82500
999
+ },
1000
+ {
1001
+ "epoch": 0.49,
1002
+ "learning_rate": 0.000511764705882353,
1003
+ "loss": 1.8697,
1004
+ "step": 83000
1005
+ },
1006
+ {
1007
+ "epoch": 0.49,
1008
+ "learning_rate": 0.0005088235294117647,
1009
+ "loss": 1.8965,
1010
+ "step": 83500
1011
+ },
1012
+ {
1013
+ "epoch": 0.49,
1014
+ "learning_rate": 0.0005058823529411764,
1015
+ "loss": 1.8519,
1016
+ "step": 84000
1017
+ },
1018
+ {
1019
+ "epoch": 0.5,
1020
+ "learning_rate": 0.0005029411764705883,
1021
+ "loss": 1.8454,
1022
+ "step": 84500
1023
+ },
1024
+ {
1025
+ "epoch": 0.5,
1026
+ "learning_rate": 0.0005,
1027
+ "loss": 1.8719,
1028
+ "step": 85000
1029
+ },
1030
+ {
1031
+ "epoch": 0.5,
1032
+ "learning_rate": 0.0004970588235294117,
1033
+ "loss": 1.876,
1034
+ "step": 85500
1035
+ },
1036
+ {
1037
+ "epoch": 0.51,
1038
+ "learning_rate": 0.0004941176470588236,
1039
+ "loss": 1.8438,
1040
+ "step": 86000
1041
+ },
1042
+ {
1043
+ "epoch": 0.51,
1044
+ "learning_rate": 0.0004911764705882353,
1045
+ "loss": 1.8712,
1046
+ "step": 86500
1047
+ },
1048
+ {
1049
+ "epoch": 0.51,
1050
+ "learning_rate": 0.00048823529411764707,
1051
+ "loss": 1.8232,
1052
+ "step": 87000
1053
+ },
1054
+ {
1055
+ "epoch": 0.51,
1056
+ "learning_rate": 0.0004852941176470588,
1057
+ "loss": 1.8492,
1058
+ "step": 87500
1059
+ },
1060
+ {
1061
+ "epoch": 0.52,
1062
+ "learning_rate": 0.0004823529411764706,
1063
+ "loss": 1.8605,
1064
+ "step": 88000
1065
+ },
1066
+ {
1067
+ "epoch": 0.52,
1068
+ "learning_rate": 0.0004794117647058824,
1069
+ "loss": 1.8502,
1070
+ "step": 88500
1071
+ },
1072
+ {
1073
+ "epoch": 0.52,
1074
+ "learning_rate": 0.0004764705882352941,
1075
+ "loss": 1.8803,
1076
+ "step": 89000
1077
+ },
1078
+ {
1079
+ "epoch": 0.53,
1080
+ "learning_rate": 0.00047352941176470587,
1081
+ "loss": 1.8466,
1082
+ "step": 89500
1083
+ },
1084
+ {
1085
+ "epoch": 0.53,
1086
+ "learning_rate": 0.00047058823529411766,
1087
+ "loss": 1.8823,
1088
+ "step": 90000
1089
+ },
1090
+ {
1091
+ "epoch": 0.53,
1092
+ "learning_rate": 0.00046764705882352945,
1093
+ "loss": 1.8624,
1094
+ "step": 90500
1095
+ },
1096
+ {
1097
+ "epoch": 0.54,
1098
+ "learning_rate": 0.0004647058823529412,
1099
+ "loss": 1.8055,
1100
+ "step": 91000
1101
+ },
1102
+ {
1103
+ "epoch": 0.54,
1104
+ "learning_rate": 0.000461764705882353,
1105
+ "loss": 1.8456,
1106
+ "step": 91500
1107
+ },
1108
+ {
1109
+ "epoch": 0.54,
1110
+ "learning_rate": 0.0004588235294117647,
1111
+ "loss": 1.8181,
1112
+ "step": 92000
1113
+ },
1114
+ {
1115
+ "epoch": 0.54,
1116
+ "learning_rate": 0.00045588235294117646,
1117
+ "loss": 1.8324,
1118
+ "step": 92500
1119
+ },
1120
+ {
1121
+ "epoch": 0.55,
1122
+ "learning_rate": 0.00045294117647058825,
1123
+ "loss": 1.8439,
1124
+ "step": 93000
1125
+ },
1126
+ {
1127
+ "epoch": 0.55,
1128
+ "learning_rate": 0.00045000000000000004,
1129
+ "loss": 1.8487,
1130
+ "step": 93500
1131
+ },
1132
+ {
1133
+ "epoch": 0.55,
1134
+ "learning_rate": 0.0004470588235294118,
1135
+ "loss": 1.8355,
1136
+ "step": 94000
1137
+ },
1138
+ {
1139
+ "epoch": 0.56,
1140
+ "learning_rate": 0.0004441176470588235,
1141
+ "loss": 1.8536,
1142
+ "step": 94500
1143
+ },
1144
+ {
1145
+ "epoch": 0.56,
1146
+ "learning_rate": 0.0004411764705882353,
1147
+ "loss": 1.8101,
1148
+ "step": 95000
1149
+ },
1150
+ {
1151
+ "epoch": 0.56,
1152
+ "learning_rate": 0.00043823529411764705,
1153
+ "loss": 1.7965,
1154
+ "step": 95500
1155
+ },
1156
+ {
1157
+ "epoch": 0.56,
1158
+ "learning_rate": 0.00043529411764705884,
1159
+ "loss": 1.8239,
1160
+ "step": 96000
1161
+ },
1162
+ {
1163
+ "epoch": 0.57,
1164
+ "learning_rate": 0.00043235294117647063,
1165
+ "loss": 1.8155,
1166
+ "step": 96500
1167
+ },
1168
+ {
1169
+ "epoch": 0.57,
1170
+ "learning_rate": 0.0004294117647058823,
1171
+ "loss": 1.831,
1172
+ "step": 97000
1173
+ },
1174
+ {
1175
+ "epoch": 0.57,
1176
+ "learning_rate": 0.0004264705882352941,
1177
+ "loss": 1.8305,
1178
+ "step": 97500
1179
+ },
1180
+ {
1181
+ "epoch": 0.58,
1182
+ "learning_rate": 0.0004235294117647059,
1183
+ "loss": 1.824,
1184
+ "step": 98000
1185
+ },
1186
+ {
1187
+ "epoch": 0.58,
1188
+ "learning_rate": 0.00042058823529411764,
1189
+ "loss": 1.8162,
1190
+ "step": 98500
1191
+ },
1192
+ {
1193
+ "epoch": 0.58,
1194
+ "learning_rate": 0.00041764705882352943,
1195
+ "loss": 1.8092,
1196
+ "step": 99000
1197
+ },
1198
+ {
1199
+ "epoch": 0.59,
1200
+ "learning_rate": 0.0004147058823529412,
1201
+ "loss": 1.7905,
1202
+ "step": 99500
1203
+ },
1204
+ {
1205
+ "epoch": 0.59,
1206
+ "learning_rate": 0.0004117647058823529,
1207
+ "loss": 1.8432,
1208
+ "step": 100000
1209
+ },
1210
+ {
1211
+ "epoch": 0.59,
1212
+ "eval_gen_len": 85.6,
1213
+ "eval_loss": 1.6427801847457886,
1214
+ "eval_rouge1": 29.1215,
1215
+ "eval_rouge2": 10.0431,
1216
+ "eval_rougeL": 20.203,
1217
+ "eval_rougeLsum": 24.5992,
1218
+ "eval_runtime": 107.8904,
1219
+ "eval_samples_per_second": 0.927,
1220
+ "step": 100000
1221
+ },
1222
+ {
1223
+ "epoch": 0.59,
1224
+ "learning_rate": 0.0004088235294117647,
1225
+ "loss": 1.8176,
1226
+ "step": 100500
1227
+ },
1228
+ {
1229
+ "epoch": 0.59,
1230
+ "learning_rate": 0.0004058823529411765,
1231
+ "loss": 1.8058,
1232
+ "step": 101000
1233
+ },
1234
+ {
1235
+ "epoch": 0.6,
1236
+ "learning_rate": 0.0004029411764705883,
1237
+ "loss": 1.7967,
1238
+ "step": 101500
1239
+ },
1240
+ {
1241
+ "epoch": 0.6,
1242
+ "learning_rate": 0.0004,
1243
+ "loss": 1.7941,
1244
+ "step": 102000
1245
+ },
1246
+ {
1247
+ "epoch": 0.6,
1248
+ "learning_rate": 0.00039705882352941176,
1249
+ "loss": 1.8045,
1250
+ "step": 102500
1251
+ },
1252
+ {
1253
+ "epoch": 0.61,
1254
+ "learning_rate": 0.00039411764705882355,
1255
+ "loss": 1.7988,
1256
+ "step": 103000
1257
+ },
1258
+ {
1259
+ "epoch": 0.61,
1260
+ "learning_rate": 0.0003911764705882353,
1261
+ "loss": 1.8047,
1262
+ "step": 103500
1263
+ },
1264
+ {
1265
+ "epoch": 0.61,
1266
+ "learning_rate": 0.0003882352941176471,
1267
+ "loss": 1.817,
1268
+ "step": 104000
1269
+ },
1270
+ {
1271
+ "epoch": 0.61,
1272
+ "learning_rate": 0.00038529411764705887,
1273
+ "loss": 1.8081,
1274
+ "step": 104500
1275
+ },
1276
+ {
1277
+ "epoch": 0.62,
1278
+ "learning_rate": 0.00038235294117647055,
1279
+ "loss": 1.7954,
1280
+ "step": 105000
1281
+ },
1282
+ {
1283
+ "epoch": 0.62,
1284
+ "learning_rate": 0.00037941176470588235,
1285
+ "loss": 1.8404,
1286
+ "step": 105500
1287
+ },
1288
+ {
1289
+ "epoch": 0.62,
1290
+ "learning_rate": 0.00037647058823529414,
1291
+ "loss": 1.815,
1292
+ "step": 106000
1293
+ },
1294
+ {
1295
+ "epoch": 0.63,
1296
+ "learning_rate": 0.0003735294117647059,
1297
+ "loss": 1.8023,
1298
+ "step": 106500
1299
+ },
1300
+ {
1301
+ "epoch": 0.63,
1302
+ "learning_rate": 0.00037058823529411767,
1303
+ "loss": 1.7957,
1304
+ "step": 107000
1305
+ },
1306
+ {
1307
+ "epoch": 0.63,
1308
+ "learning_rate": 0.00036764705882352946,
1309
+ "loss": 1.7989,
1310
+ "step": 107500
1311
+ },
1312
+ {
1313
+ "epoch": 0.64,
1314
+ "learning_rate": 0.00036470588235294114,
1315
+ "loss": 1.8369,
1316
+ "step": 108000
1317
+ },
1318
+ {
1319
+ "epoch": 0.64,
1320
+ "learning_rate": 0.00036176470588235294,
1321
+ "loss": 1.816,
1322
+ "step": 108500
1323
+ },
1324
+ {
1325
+ "epoch": 0.64,
1326
+ "learning_rate": 0.00035882352941176473,
1327
+ "loss": 1.7874,
1328
+ "step": 109000
1329
+ },
1330
+ {
1331
+ "epoch": 0.64,
1332
+ "learning_rate": 0.00035588235294117647,
1333
+ "loss": 1.7832,
1334
+ "step": 109500
1335
+ },
1336
+ {
1337
+ "epoch": 0.65,
1338
+ "learning_rate": 0.00035294117647058826,
1339
+ "loss": 1.8078,
1340
+ "step": 110000
1341
+ },
1342
+ {
1343
+ "epoch": 0.65,
1344
+ "learning_rate": 0.00035,
1345
+ "loss": 1.7959,
1346
+ "step": 110500
1347
+ },
1348
+ {
1349
+ "epoch": 0.65,
1350
+ "learning_rate": 0.00034705882352941173,
1351
+ "loss": 1.7853,
1352
+ "step": 111000
1353
+ },
1354
+ {
1355
+ "epoch": 0.66,
1356
+ "learning_rate": 0.0003441176470588235,
1357
+ "loss": 1.8219,
1358
+ "step": 111500
1359
+ },
1360
+ {
1361
+ "epoch": 0.66,
1362
+ "learning_rate": 0.0003411764705882353,
1363
+ "loss": 1.8028,
1364
+ "step": 112000
1365
+ },
1366
+ {
1367
+ "epoch": 0.66,
1368
+ "learning_rate": 0.0003382352941176471,
1369
+ "loss": 1.8283,
1370
+ "step": 112500
1371
+ },
1372
+ {
1373
+ "epoch": 0.66,
1374
+ "learning_rate": 0.0003352941176470588,
1375
+ "loss": 1.7884,
1376
+ "step": 113000
1377
+ },
1378
+ {
1379
+ "epoch": 0.67,
1380
+ "learning_rate": 0.0003323529411764706,
1381
+ "loss": 1.7835,
1382
+ "step": 113500
1383
+ },
1384
+ {
1385
+ "epoch": 0.67,
1386
+ "learning_rate": 0.0003294117647058824,
1387
+ "loss": 1.7767,
1388
+ "step": 114000
1389
+ },
1390
+ {
1391
+ "epoch": 0.67,
1392
+ "learning_rate": 0.0003264705882352941,
1393
+ "loss": 1.8169,
1394
+ "step": 114500
1395
+ },
1396
+ {
1397
+ "epoch": 0.68,
1398
+ "learning_rate": 0.0003235294117647059,
1399
+ "loss": 1.8163,
1400
+ "step": 115000
1401
+ },
1402
+ {
1403
+ "epoch": 0.68,
1404
+ "learning_rate": 0.0003205882352941177,
1405
+ "loss": 1.7781,
1406
+ "step": 115500
1407
+ },
1408
+ {
1409
+ "epoch": 0.68,
1410
+ "learning_rate": 0.0003176470588235294,
1411
+ "loss": 1.7734,
1412
+ "step": 116000
1413
+ },
1414
+ {
1415
+ "epoch": 0.69,
1416
+ "learning_rate": 0.0003147058823529412,
1417
+ "loss": 1.7818,
1418
+ "step": 116500
1419
+ },
1420
+ {
1421
+ "epoch": 0.69,
1422
+ "learning_rate": 0.00031176470588235297,
1423
+ "loss": 1.7562,
1424
+ "step": 117000
1425
+ },
1426
+ {
1427
+ "epoch": 0.69,
1428
+ "learning_rate": 0.0003088235294117647,
1429
+ "loss": 1.7839,
1430
+ "step": 117500
1431
+ },
1432
+ {
1433
+ "epoch": 0.69,
1434
+ "learning_rate": 0.0003058823529411765,
1435
+ "loss": 1.7727,
1436
+ "step": 118000
1437
+ },
1438
+ {
1439
+ "epoch": 0.7,
1440
+ "learning_rate": 0.00030294117647058824,
1441
+ "loss": 1.7838,
1442
+ "step": 118500
1443
+ },
1444
+ {
1445
+ "epoch": 0.7,
1446
+ "learning_rate": 0.0003,
1447
+ "loss": 1.7936,
1448
+ "step": 119000
1449
+ },
1450
+ {
1451
+ "epoch": 0.7,
1452
+ "learning_rate": 0.00029705882352941177,
1453
+ "loss": 1.7998,
1454
+ "step": 119500
1455
+ },
1456
+ {
1457
+ "epoch": 0.71,
1458
+ "learning_rate": 0.00029411764705882356,
1459
+ "loss": 1.773,
1460
+ "step": 120000
1461
+ },
1462
+ {
1463
+ "epoch": 0.71,
1464
+ "learning_rate": 0.0002911764705882353,
1465
+ "loss": 1.7997,
1466
+ "step": 120500
1467
+ },
1468
+ {
1469
+ "epoch": 0.71,
1470
+ "learning_rate": 0.00028823529411764703,
1471
+ "loss": 1.7989,
1472
+ "step": 121000
1473
+ },
1474
+ {
1475
+ "epoch": 0.71,
1476
+ "learning_rate": 0.0002852941176470588,
1477
+ "loss": 1.773,
1478
+ "step": 121500
1479
+ },
1480
+ {
1481
+ "epoch": 0.72,
1482
+ "learning_rate": 0.00028235294117647056,
1483
+ "loss": 1.7351,
1484
+ "step": 122000
1485
+ },
1486
+ {
1487
+ "epoch": 0.72,
1488
+ "learning_rate": 0.00027941176470588236,
1489
+ "loss": 1.7558,
1490
+ "step": 122500
1491
+ },
1492
+ {
1493
+ "epoch": 0.72,
1494
+ "learning_rate": 0.00027647058823529415,
1495
+ "loss": 1.7618,
1496
+ "step": 123000
1497
+ },
1498
+ {
1499
+ "epoch": 0.73,
1500
+ "learning_rate": 0.00027352941176470594,
1501
+ "loss": 1.7601,
1502
+ "step": 123500
1503
+ },
1504
+ {
1505
+ "epoch": 0.73,
1506
+ "learning_rate": 0.0002705882352941176,
1507
+ "loss": 1.8319,
1508
+ "step": 124000
1509
+ },
1510
+ {
1511
+ "epoch": 0.73,
1512
+ "learning_rate": 0.0002676470588235294,
1513
+ "loss": 1.7765,
1514
+ "step": 124500
1515
+ },
1516
+ {
1517
+ "epoch": 0.74,
1518
+ "learning_rate": 0.0002647058823529412,
1519
+ "loss": 1.7838,
1520
+ "step": 125000
1521
+ },
1522
+ {
1523
+ "epoch": 0.74,
1524
+ "learning_rate": 0.00026176470588235295,
1525
+ "loss": 1.7566,
1526
+ "step": 125500
1527
+ },
1528
+ {
1529
+ "epoch": 0.74,
1530
+ "learning_rate": 0.00025882352941176474,
1531
+ "loss": 1.7626,
1532
+ "step": 126000
1533
+ },
1534
+ {
1535
+ "epoch": 0.74,
1536
+ "learning_rate": 0.0002558823529411765,
1537
+ "loss": 1.7678,
1538
+ "step": 126500
1539
+ },
1540
+ {
1541
+ "epoch": 0.75,
1542
+ "learning_rate": 0.0002529411764705882,
1543
+ "loss": 1.794,
1544
+ "step": 127000
1545
+ },
1546
+ {
1547
+ "epoch": 0.75,
1548
+ "learning_rate": 0.00025,
1549
+ "loss": 1.747,
1550
+ "step": 127500
1551
+ },
1552
+ {
1553
+ "epoch": 0.75,
1554
+ "learning_rate": 0.0002470588235294118,
1555
+ "loss": 1.7565,
1556
+ "step": 128000
1557
+ },
1558
+ {
1559
+ "epoch": 0.76,
1560
+ "learning_rate": 0.00024411764705882354,
1561
+ "loss": 1.7932,
1562
+ "step": 128500
1563
+ },
1564
+ {
1565
+ "epoch": 0.76,
1566
+ "learning_rate": 0.0002411764705882353,
1567
+ "loss": 1.7623,
1568
+ "step": 129000
1569
+ },
1570
+ {
1571
+ "epoch": 0.76,
1572
+ "learning_rate": 0.00023823529411764704,
1573
+ "loss": 1.7712,
1574
+ "step": 129500
1575
+ },
1576
+ {
1577
+ "epoch": 0.76,
1578
+ "learning_rate": 0.00023529411764705883,
1579
+ "loss": 1.7684,
1580
+ "step": 130000
1581
+ },
1582
+ {
1583
+ "epoch": 0.77,
1584
+ "learning_rate": 0.0002323529411764706,
1585
+ "loss": 1.7317,
1586
+ "step": 130500
1587
+ },
1588
+ {
1589
+ "epoch": 0.77,
1590
+ "learning_rate": 0.00022941176470588236,
1591
+ "loss": 1.7679,
1592
+ "step": 131000
1593
+ },
1594
+ {
1595
+ "epoch": 0.77,
1596
+ "learning_rate": 0.00022647058823529412,
1597
+ "loss": 1.7735,
1598
+ "step": 131500
1599
+ },
1600
+ {
1601
+ "epoch": 0.78,
1602
+ "learning_rate": 0.0002235294117647059,
1603
+ "loss": 1.7688,
1604
+ "step": 132000
1605
+ },
1606
+ {
1607
+ "epoch": 0.78,
1608
+ "learning_rate": 0.00022058823529411765,
1609
+ "loss": 1.7494,
1610
+ "step": 132500
1611
+ },
1612
+ {
1613
+ "epoch": 0.78,
1614
+ "learning_rate": 0.00021764705882352942,
1615
+ "loss": 1.7739,
1616
+ "step": 133000
1617
+ },
1618
+ {
1619
+ "epoch": 0.79,
1620
+ "learning_rate": 0.00021470588235294116,
1621
+ "loss": 1.7537,
1622
+ "step": 133500
1623
+ },
1624
+ {
1625
+ "epoch": 0.79,
1626
+ "learning_rate": 0.00021176470588235295,
1627
+ "loss": 1.7243,
1628
+ "step": 134000
1629
+ },
1630
+ {
1631
+ "epoch": 0.79,
1632
+ "learning_rate": 0.00020882352941176471,
1633
+ "loss": 1.7828,
1634
+ "step": 134500
1635
+ },
1636
+ {
1637
+ "epoch": 0.79,
1638
+ "learning_rate": 0.00020588235294117645,
1639
+ "loss": 1.783,
1640
+ "step": 135000
1641
+ },
1642
+ {
1643
+ "epoch": 0.8,
1644
+ "learning_rate": 0.00020294117647058824,
1645
+ "loss": 1.714,
1646
+ "step": 135500
1647
+ },
1648
+ {
1649
+ "epoch": 0.8,
1650
+ "learning_rate": 0.0002,
1651
+ "loss": 1.7523,
1652
+ "step": 136000
1653
+ },
1654
+ {
1655
+ "epoch": 0.8,
1656
+ "learning_rate": 0.00019705882352941177,
1657
+ "loss": 1.7409,
1658
+ "step": 136500
1659
+ },
1660
+ {
1661
+ "epoch": 0.81,
1662
+ "learning_rate": 0.00019411764705882354,
1663
+ "loss": 1.7641,
1664
+ "step": 137000
1665
+ },
1666
+ {
1667
+ "epoch": 0.81,
1668
+ "learning_rate": 0.00019117647058823528,
1669
+ "loss": 1.7501,
1670
+ "step": 137500
1671
+ },
1672
+ {
1673
+ "epoch": 0.81,
1674
+ "learning_rate": 0.00018823529411764707,
1675
+ "loss": 1.7367,
1676
+ "step": 138000
1677
+ },
1678
+ {
1679
+ "epoch": 0.81,
1680
+ "learning_rate": 0.00018529411764705883,
1681
+ "loss": 1.7362,
1682
+ "step": 138500
1683
+ },
1684
+ {
1685
+ "epoch": 0.82,
1686
+ "learning_rate": 0.00018235294117647057,
1687
+ "loss": 1.7564,
1688
+ "step": 139000
1689
+ },
1690
+ {
1691
+ "epoch": 0.82,
1692
+ "learning_rate": 0.00017941176470588236,
1693
+ "loss": 1.7527,
1694
+ "step": 139500
1695
+ },
1696
+ {
1697
+ "epoch": 0.82,
1698
+ "learning_rate": 0.00017647058823529413,
1699
+ "loss": 1.7404,
1700
+ "step": 140000
1701
+ },
1702
+ {
1703
+ "epoch": 0.83,
1704
+ "learning_rate": 0.00017352941176470587,
1705
+ "loss": 1.7694,
1706
+ "step": 140500
1707
+ },
1708
+ {
1709
+ "epoch": 0.83,
1710
+ "learning_rate": 0.00017058823529411766,
1711
+ "loss": 1.7443,
1712
+ "step": 141000
1713
+ },
1714
+ {
1715
+ "epoch": 0.83,
1716
+ "learning_rate": 0.0001676470588235294,
1717
+ "loss": 1.7413,
1718
+ "step": 141500
1719
+ },
1720
+ {
1721
+ "epoch": 0.84,
1722
+ "learning_rate": 0.0001647058823529412,
1723
+ "loss": 1.7572,
1724
+ "step": 142000
1725
+ },
1726
+ {
1727
+ "epoch": 0.84,
1728
+ "learning_rate": 0.00016176470588235295,
1729
+ "loss": 1.7442,
1730
+ "step": 142500
1731
+ },
1732
+ {
1733
+ "epoch": 0.84,
1734
+ "learning_rate": 0.0001588235294117647,
1735
+ "loss": 1.7608,
1736
+ "step": 143000
1737
+ },
1738
+ {
1739
+ "epoch": 0.84,
1740
+ "learning_rate": 0.00015588235294117648,
1741
+ "loss": 1.7467,
1742
+ "step": 143500
1743
+ },
1744
+ {
1745
+ "epoch": 0.85,
1746
+ "learning_rate": 0.00015294117647058825,
1747
+ "loss": 1.7496,
1748
+ "step": 144000
1749
+ },
1750
+ {
1751
+ "epoch": 0.85,
1752
+ "learning_rate": 0.00015,
1753
+ "loss": 1.7396,
1754
+ "step": 144500
1755
+ },
1756
+ {
1757
+ "epoch": 0.85,
1758
+ "learning_rate": 0.00014705882352941178,
1759
+ "loss": 1.7341,
1760
+ "step": 145000
1761
+ },
1762
+ {
1763
+ "epoch": 0.86,
1764
+ "learning_rate": 0.00014411764705882352,
1765
+ "loss": 1.7358,
1766
+ "step": 145500
1767
+ },
1768
+ {
1769
+ "epoch": 0.86,
1770
+ "learning_rate": 0.00014117647058823528,
1771
+ "loss": 1.7721,
1772
+ "step": 146000
1773
+ },
1774
+ {
1775
+ "epoch": 0.86,
1776
+ "learning_rate": 0.00013823529411764707,
1777
+ "loss": 1.7224,
1778
+ "step": 146500
1779
+ },
1780
+ {
1781
+ "epoch": 0.86,
1782
+ "learning_rate": 0.0001352941176470588,
1783
+ "loss": 1.7137,
1784
+ "step": 147000
1785
+ },
1786
+ {
1787
+ "epoch": 0.87,
1788
+ "learning_rate": 0.0001323529411764706,
1789
+ "loss": 1.7555,
1790
+ "step": 147500
1791
+ },
1792
+ {
1793
+ "epoch": 0.87,
1794
+ "learning_rate": 0.00012941176470588237,
1795
+ "loss": 1.7432,
1796
+ "step": 148000
1797
+ },
1798
+ {
1799
+ "epoch": 0.87,
1800
+ "learning_rate": 0.0001264705882352941,
1801
+ "loss": 1.7319,
1802
+ "step": 148500
1803
+ },
1804
+ {
1805
+ "epoch": 0.88,
1806
+ "learning_rate": 0.0001235294117647059,
1807
+ "loss": 1.725,
1808
+ "step": 149000
1809
+ },
1810
+ {
1811
+ "epoch": 0.88,
1812
+ "learning_rate": 0.00012058823529411765,
1813
+ "loss": 1.7655,
1814
+ "step": 149500
1815
+ },
1816
+ {
1817
+ "epoch": 0.88,
1818
+ "learning_rate": 0.00011764705882352942,
1819
+ "loss": 1.7358,
1820
+ "step": 150000
1821
+ },
1822
+ {
1823
+ "epoch": 0.89,
1824
+ "learning_rate": 0.00011470588235294118,
1825
+ "loss": 1.7214,
1826
+ "step": 150500
1827
+ },
1828
+ {
1829
+ "epoch": 0.89,
1830
+ "learning_rate": 0.00011176470588235294,
1831
+ "loss": 1.7413,
1832
+ "step": 151000
1833
+ },
1834
+ {
1835
+ "epoch": 0.89,
1836
+ "learning_rate": 0.00010882352941176471,
1837
+ "loss": 1.7069,
1838
+ "step": 151500
1839
+ },
1840
+ {
1841
+ "epoch": 0.89,
1842
+ "learning_rate": 0.00010588235294117647,
1843
+ "loss": 1.7282,
1844
+ "step": 152000
1845
+ },
1846
+ {
1847
+ "epoch": 0.9,
1848
+ "learning_rate": 0.00010294117647058823,
1849
+ "loss": 1.7121,
1850
+ "step": 152500
1851
+ },
1852
+ {
1853
+ "epoch": 0.9,
1854
+ "learning_rate": 0.0001,
1855
+ "loss": 1.7227,
1856
+ "step": 153000
1857
+ },
1858
+ {
1859
+ "epoch": 0.9,
1860
+ "learning_rate": 9.705882352941177e-05,
1861
+ "loss": 1.7391,
1862
+ "step": 153500
1863
+ },
1864
+ {
1865
+ "epoch": 0.91,
1866
+ "learning_rate": 9.411764705882353e-05,
1867
+ "loss": 1.7422,
1868
+ "step": 154000
1869
+ },
1870
+ {
1871
+ "epoch": 0.91,
1872
+ "learning_rate": 9.117647058823529e-05,
1873
+ "loss": 1.7154,
1874
+ "step": 154500
1875
+ },
1876
+ {
1877
+ "epoch": 0.91,
1878
+ "learning_rate": 8.823529411764706e-05,
1879
+ "loss": 1.7419,
1880
+ "step": 155000
1881
+ },
1882
+ {
1883
+ "epoch": 0.91,
1884
+ "learning_rate": 8.529411764705883e-05,
1885
+ "loss": 1.7608,
1886
+ "step": 155500
1887
+ },
1888
+ {
1889
+ "epoch": 0.92,
1890
+ "learning_rate": 8.23529411764706e-05,
1891
+ "loss": 1.7154,
1892
+ "step": 156000
1893
+ },
1894
+ {
1895
+ "epoch": 0.92,
1896
+ "learning_rate": 7.941176470588235e-05,
1897
+ "loss": 1.7082,
1898
+ "step": 156500
1899
+ },
1900
+ {
1901
+ "epoch": 0.92,
1902
+ "learning_rate": 7.647058823529412e-05,
1903
+ "loss": 1.7347,
1904
+ "step": 157000
1905
+ },
1906
+ {
1907
+ "epoch": 0.93,
1908
+ "learning_rate": 7.352941176470589e-05,
1909
+ "loss": 1.7054,
1910
+ "step": 157500
1911
+ },
1912
+ {
1913
+ "epoch": 0.93,
1914
+ "learning_rate": 7.058823529411764e-05,
1915
+ "loss": 1.7326,
1916
+ "step": 158000
1917
+ },
1918
+ {
1919
+ "epoch": 0.93,
1920
+ "learning_rate": 6.76470588235294e-05,
1921
+ "loss": 1.7224,
1922
+ "step": 158500
1923
+ },
1924
+ {
1925
+ "epoch": 0.94,
1926
+ "learning_rate": 6.470588235294118e-05,
1927
+ "loss": 1.7362,
1928
+ "step": 159000
1929
+ },
1930
+ {
1931
+ "epoch": 0.94,
1932
+ "learning_rate": 6.176470588235295e-05,
1933
+ "loss": 1.7159,
1934
+ "step": 159500
1935
+ },
1936
+ {
1937
+ "epoch": 0.94,
1938
+ "learning_rate": 5.882352941176471e-05,
1939
+ "loss": 1.73,
1940
+ "step": 160000
1941
+ },
1942
+ {
1943
+ "epoch": 0.94,
1944
+ "learning_rate": 5.588235294117647e-05,
1945
+ "loss": 1.7523,
1946
+ "step": 160500
1947
+ },
1948
+ {
1949
+ "epoch": 0.95,
1950
+ "learning_rate": 5.294117647058824e-05,
1951
+ "loss": 1.7315,
1952
+ "step": 161000
1953
+ },
1954
+ {
1955
+ "epoch": 0.95,
1956
+ "learning_rate": 5e-05,
1957
+ "loss": 1.709,
1958
+ "step": 161500
1959
+ },
1960
+ {
1961
+ "epoch": 0.95,
1962
+ "learning_rate": 4.705882352941177e-05,
1963
+ "loss": 1.7341,
1964
+ "step": 162000
1965
+ },
1966
+ {
1967
+ "epoch": 0.96,
1968
+ "learning_rate": 4.411764705882353e-05,
1969
+ "loss": 1.7186,
1970
+ "step": 162500
1971
+ },
1972
+ {
1973
+ "epoch": 0.96,
1974
+ "learning_rate": 4.11764705882353e-05,
1975
+ "loss": 1.719,
1976
+ "step": 163000
1977
+ },
1978
+ {
1979
+ "epoch": 0.96,
1980
+ "learning_rate": 3.823529411764706e-05,
1981
+ "loss": 1.7115,
1982
+ "step": 163500
1983
+ },
1984
+ {
1985
+ "epoch": 0.96,
1986
+ "learning_rate": 3.529411764705882e-05,
1987
+ "loss": 1.7036,
1988
+ "step": 164000
1989
+ },
1990
+ {
1991
+ "epoch": 0.97,
1992
+ "learning_rate": 3.235294117647059e-05,
1993
+ "loss": 1.7147,
1994
+ "step": 164500
1995
+ },
1996
+ {
1997
+ "epoch": 0.97,
1998
+ "learning_rate": 2.9411764705882354e-05,
1999
+ "loss": 1.7347,
2000
+ "step": 165000
2001
+ },
2002
+ {
2003
+ "epoch": 0.97,
2004
+ "learning_rate": 2.647058823529412e-05,
2005
+ "loss": 1.7334,
2006
+ "step": 165500
2007
+ },
2008
+ {
2009
+ "epoch": 0.98,
2010
+ "learning_rate": 2.3529411764705884e-05,
2011
+ "loss": 1.7337,
2012
+ "step": 166000
2013
+ },
2014
+ {
2015
+ "epoch": 0.98,
2016
+ "learning_rate": 2.058823529411765e-05,
2017
+ "loss": 1.7266,
2018
+ "step": 166500
2019
+ },
2020
+ {
2021
+ "epoch": 0.98,
2022
+ "learning_rate": 1.764705882352941e-05,
2023
+ "loss": 1.6958,
2024
+ "step": 167000
2025
+ },
2026
+ {
2027
+ "epoch": 0.99,
2028
+ "learning_rate": 1.4705882352941177e-05,
2029
+ "loss": 1.7052,
2030
+ "step": 167500
2031
+ },
2032
+ {
2033
+ "epoch": 0.99,
2034
+ "learning_rate": 1.1764705882352942e-05,
2035
+ "loss": 1.6985,
2036
+ "step": 168000
2037
+ },
2038
+ {
2039
+ "epoch": 0.99,
2040
+ "learning_rate": 8.823529411764705e-06,
2041
+ "loss": 1.6896,
2042
+ "step": 168500
2043
+ },
2044
+ {
2045
+ "epoch": 0.99,
2046
+ "learning_rate": 5.882352941176471e-06,
2047
+ "loss": 1.7455,
2048
+ "step": 169000
2049
+ },
2050
+ {
2051
+ "epoch": 1.0,
2052
+ "learning_rate": 2.9411764705882355e-06,
2053
+ "loss": 1.7554,
2054
+ "step": 169500
2055
+ },
2056
+ {
2057
+ "epoch": 1.0,
2058
+ "learning_rate": 0.0,
2059
+ "loss": 1.6962,
2060
+ "step": 170000
2061
+ },
2062
+ {
2063
+ "epoch": 1.0,
2064
+ "step": 170000,
2065
+ "total_flos": 3421528306437104640,
2066
+ "train_runtime": 228303.6105,
2067
+ "train_samples_per_second": 0.745
2068
+ }
2069
+ ],
2070
+ "max_steps": 170000,
2071
+ "num_train_epochs": 1,
2072
+ "total_flos": 3421528306437104640,
2073
+ "trial_name": null,
2074
+ "trial_params": null
2075
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f2aea9ccbcaaea512b3a6a87a639701e96496f93731782c408a105e3c849e785
+ size 2479
val_results.json ADDED
@@ -0,0 +1,12 @@
+ {
+ "epoch": 1.0,
+ "val_gen_len": 86.9,
+ "val_loss": 1.5501,
+ "val_n_objs": 100,
+ "val_rouge1": 28.9281,
+ "val_rouge2": 9.8583,
+ "val_rougeL": 20.7577,
+ "val_rougeLsum": 24.4792,
+ "val_runtime": 108.2772,
+ "val_samples_per_second": 0.924
+ }