declare-lab-sutd committed on
Commit
fabdab4
·
1 Parent(s): 98bbf60

Upload DIALECT

README.md ADDED
@@ -0,0 +1,124 @@
1
+ ---
2
+ license: mit
3
+
4
+ widget:
5
+ - text: "What is or could be the cause of target? <sep> target: Thanks. Will I be able to take a retest ? <sep> context: A: Did I do well on my test ?, <utt> B: Do you want to know the honest answer ?, <utt> A: Why wouldn't I want to know ?, <utt> B: You had pretty bad scores ., <utt> A: Exactly what do you mean by bad ?, <utt> B: You failed ., <utt> A: How'd I fail it ?, <utt> B: There are a couple of reasons why you didn't pass ., <utt> A: What did I do wrong ?, <utt> B: To sum it all up , you really just don't know how to drive ., <utt> A: Thanks. Will I be able to take a retest ?, <utt> B: Sure you can , in about two and a half weeks . "
6
+ example_title: "Cause 1"
7
+ - text: "What is or could be the cause of target? <sep> target: But she did and made me disappointed . <sep> context: A: David , why didn't you clean the room ?, <utt> B: I'm not in the mood ., <utt> A: Why are you feeling depressed ?, <utt> B: I was told my girlfriend was speaking ill of me. That's a real let-down ., <utt> A: I don t think she will do such a thing ., <utt> B: But she did and made me disappointed ., <utt> A: Oh , cheer up . A girlfriend is not everything ., <utt> B: But she means a lot to me ., <utt> A: Then forgive her mistake ., <utt> B: Oh . I just can't forget it "
8
+ example_title: "Cause 2"
9
+ - text: "What subsequent event happens or could happen following the target? <sep> target: Oh . I just can't forget it .<sep> context: A: David , why didn't you clean the room ?, <utt> B: I'm not in the mood ., <utt> A: Why are you feeling depressed ?, <utt> B: I was told my girlfriend was speaking ill of me. That \u2019 s a real let-down ., <utt> A: I don t think she will do such a thing ., <utt> B: But she did and made me disappointed ., <utt> A: Oh , cheer up . A girlfriend is not everything ., <utt> B: But she means a lot to me ., <utt> A: Then forgive her mistake ., <utt> B: Oh . I just can't forget it "
10
+ example_title: "Subsequent Event 1"
11
+ - text: "What subsequent event happens or could happen following the target? <sep> target: Sure you can , in about two and a half weeks . <sep> context: A: Did I do well on my test ?, <utt> B: Do you want to know the honest answer ?, <utt> A: Why wouldn't I want to know ?, <utt> B: You had pretty bad scores ., <utt> A: Exactly what do you mean by bad ?, <utt> B: You failed ., <utt> A: How'd I fail it ?, <utt> B: There are a couple of reasons why you didn't pass ., <utt> A: What did I do wrong ?, <utt> B: To sum it all up , you really just don't know how to drive ., <utt> A: Thanks. Will I be able to take a retest ?, <utt> B: Sure you can , in about two and a half weeks . "
12
+ example_title: "Subsequent Event 2"
13
+ - text: "What is the possible emotional reaction of the listener in response to target? <sep> target: Oh . I just can't forget it .<sep> context: A: David , why didn't you clean the room ?, <utt> B: I'm not in the mood ., <utt> A: Why are you feeling depressed ?, <utt> B: I was told my girlfriend was speaking ill of me. That \u2019 s a real let-down ., <utt> A: I don t think she will do such a thing ., <utt> B: But she did and made me disappointed ., <utt> A: Oh , cheer up . A girlfriend is not everything ., <utt> B: But she means a lot to me ., <utt> A: Then forgive her mistake ., <utt> B: Oh . I just can't forget it "
14
+ example_title: "Emotional Reaction"
15
+ - text: "What is or could be the motivation of target? <sep> target: Sure you can , in about two and a half weeks . <sep> context: A: Did I do well on my test ?, <utt> B: Do you want to know the honest answer ?, <utt> A: Why wouldn't I want to know ?, <utt> B: You had pretty bad scores ., <utt> A: Exactly what do you mean by bad ?, <utt> B: You failed ., <utt> A: How'd I fail it ?, <utt> B: There are a couple of reasons why you didn't pass ., <utt> A: What did I do wrong ?, <utt> B: To sum it all up , you really just don't know how to drive ., <utt> A: Thanks. Will I be able to take a retest ?, <utt> B: Sure you can , in about two and a half weeks . "
16
+ example_title: "Motivation"
17
+ ---
18
+
19
+ ## DIALogue-level Commonsense Transformer (DIALeCT)
20
+ The pretrained checkpoint for the paper [Multiview Contextual Commonsense Inference: A New Dataset and Task](https://arxiv.org/abs/2210.02890).
21
+
22
+ The model is initialized from the [T5-large](https://huggingface.co/t5-large) checkpoint and then trained further.
23
+
24
+ ![model image](https://drive.google.com/uc?export=download&id=14RIbxgXhREdu5xZiKn5D-UUzaQLDNLqf)
25
+
26
+
27
+ ## Datasets
28
+ The dataset used to pretrain the model can be obtained from the [CICERO repo](https://github.com/declare-lab/CICERO) by following the instructions there. Contextualized Commonsense Inference in Dialogues v2 (CICEROv2) consists of annotated commonsense inferences, such as causes and emotional reactions. The dialogues are drawn from multiple datasets.
29
+ | Dataset | #Dialogues| #Instances|
30
+ | -------- | ----- | --------- |
31
+ | DailyDialog| 1118| 3973|
32
+ | MuTual| 1011 | 3384|
33
+ | Dream| 250 | 994|
34
+
35
+ ### Examples
36
+ Some examples of generated results from the pretrained model (the zero-shot setting).
37
+
38
+ **Subsequent Event**
39
+ ```
40
+ What is or could be the subsequent event of the target? <sep>
41
+ target: Oh . I just can't forget it .<sep>
42
+ context: A: David , why didn't you clean the room ?, <utt>
43
+ B: I'm not in the mood ., <utt>
44
+ A: Why are you feeling depressed ?, <utt>
45
+ B: I was told my girlfriend was speaking ill of me. That ’ s a real let-down ., <utt>
46
+ A: I don t think she will do such a thing ., <utt>
47
+ B: But she did and made me disappointed ., <utt>
48
+ A: Oh , cheer up . A girlfriend is not everything ., <utt>
49
+ B: But she means a lot to me ., <utt>
50
+ A: Then forgive her mistake ., <utt>
51
+ B: Oh . I just can't forget it
52
+ ```
53
+ Predicted subsequent event:
54
+ ```
55
+ David's girlfriend apologized to david for her mistake.
56
+ ```
57
+
58
+ **Cause**
59
+ ```
60
+ What is or could be the cause of target? <sep>
61
+ target: Thanks. Will I be able to take a retest ? <sep>
62
+ context: A: Did I do well on my test ?, <utt>
63
+ B: Do you want to know the honest answer ?, <utt>
64
+ A: Why wouldn't I want to know ?, <utt>
65
+ B: You had pretty bad scores ., <utt>
66
+ A: Exactly what do you mean by bad ?, <utt>
67
+ B: You failed ., <utt>
68
+ A: How'd I fail it ?, <utt>
69
+ B: There are a couple of reasons why you didn't pass ., <utt>
70
+ A: What did I do wrong ?, <utt>
71
+ B: To sum it all up , you really just don't know how to drive ., <utt>
72
+ A: Thanks. Will I be able to take a retest ?, <utt>
73
+ B: Sure you can , in about two and a half weeks .
74
+ ```
75
+ Predicted cause:
76
+ ```
77
+ The speaker has failed the driving test.
78
+ ```
79
+
80
+ **Emotional Reaction**
81
+ ```
82
+ What is the possible emotional reaction of the listener in response to target? <sep>
83
+ target: Oh . I just can't forget it .<sep>
84
+ context: A: David , why didn't you clean the room ?, <utt>
85
+ B: I'm not in the mood ., <utt>
86
+ A: Why are you feeling depressed ?, <utt>
87
+ B: I was told my girlfriend was speaking ill of me. That ’ s a real let-down ., <utt>
88
+ A: I don t think she will do such a thing ., <utt>
89
+ B: But she did and made me disappointed ., <utt>
90
+ A: Oh , cheer up . A girlfriend is not everything ., <utt>
91
+ B: But she means a lot to me ., <utt>
92
+ A: Then forgive her mistake ., <utt>
93
+ B: Oh . I just can't forget it
94
+ ```
95
+ Predicted emotional reaction:
96
+ ```
97
+ The listener is hopeful that david will forgive his girlfriend for her mistake.
98
+ ```
99
+
100
+ ## Inference:
101
+ The input text should be formatted as follows:
102
+
103
+ ```
104
+ Question <sep> target: target_utt <sep> context: A: utterance 1 <utt> B: utterance 2 <utt> A: utterance 3 <utt> B: utterance 4
105
+ ```
106
+ Question: The question against which we want to make the inference.
107
+
108
+ A and B are speaker identifiers.
109
+
110
+ The ```target_utt``` should be any one of ```utterance 1, utterance 2, utterance 3, or utterance 4```. Do not include the speaker identifier in the ```target_utt```.
111
+
112
+ Some sample inputs are provided as examples in the Hosted inference API box.
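The input template described in the Inference section can be assembled with a small helper. The sketch below is illustrative only: the function name `build_dialect_input` and the `utt_sep` parameter are hypothetical and not part of the released code. It follows the template as written (utterances joined by ` <utt> `); the widget examples in the model card metadata additionally place a comma before `<utt>`, which can be reproduced by passing `utt_sep=", <utt> "`.

```python
from typing import List, Tuple


def build_dialect_input(
    question: str,
    target_utt: str,
    turns: List[Tuple[str, str]],
    utt_sep: str = " <utt> ",  # hypothetical knob; the README template uses " <utt> "
) -> str:
    """Format one request as: Question <sep> target: ... <sep> context: ..."""
    utterances = [utt for _, utt in turns]
    # The target must be one of the context utterances, without a speaker prefix.
    if target_utt not in utterances:
        raise ValueError("target_utt must be one of the context utterances")
    context = utt_sep.join(f"{speaker}: {utt}" for speaker, utt in turns)
    return f"{question} <sep> target: {target_utt} <sep> context: {context}"


example = build_dialect_input(
    question="What is or could be the cause of target?",
    target_utt="You failed .",
    turns=[
        ("A", "Exactly what do you mean by bad ?"),
        ("B", "You failed ."),
        ("A", "How'd I fail it ?"),
    ],
)
print(example)
```

The resulting string is what would be passed to the tokenizer of a T5-style sequence-to-sequence model. Loading the checkpoint itself is left out here because the repository identifier is not stated in this commit.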
113
+
114
+ ## BibTeX entry and citation info
115
+ If you use the model, you can cite:
116
+ ```bibtex
117
+ @article{Shen2022MultiviewCC,
118
+ title={Multiview Contextual Commonsense Inference: A New Dataset and Task},
119
+ author={Siqi Shen and Deepanway Ghosal and Navonil Majumder and Henry Lim and Rada Mihalcea and Soujanya Poria},
120
+ journal={ArXiv},
121
+ year={2022},
122
+ volume={abs/2210.02890}
123
+ }
124
+ ```
config.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "_name_or_path": "t5-large",
3
+ "architectures": [
4
+ "T5ForConditionalGeneration"
5
+ ],
6
+ "d_ff": 4096,
7
+ "d_kv": 64,
8
+ "d_model": 1024,
9
+ "decoder_start_token_id": 0,
10
+ "dropout_rate": 0.1,
11
+ "eos_token_id": 1,
12
+ "feed_forward_proj": "relu",
13
+ "initializer_factor": 1.0,
14
+ "is_encoder_decoder": true,
15
+ "layer_norm_epsilon": 1e-06,
16
+ "model_type": "t5",
17
+ "n_positions": 512,
18
+ "num_decoder_layers": 24,
19
+ "num_heads": 16,
20
+ "num_layers": 24,
21
+ "output_past": true,
22
+ "pad_token_id": 0,
23
+ "relative_attention_num_buckets": 32,
24
+ "task_specific_params": {
25
+ "summarization": {
26
+ "early_stopping": true,
27
+ "length_penalty": 2.0,
28
+ "max_length": 200,
29
+ "min_length": 30,
30
+ "no_repeat_ngram_size": 3,
31
+ "num_beams": 4,
32
+ "prefix": "summarize: "
33
+ },
34
+ "translation_en_to_de": {
35
+ "early_stopping": true,
36
+ "max_length": 300,
37
+ "num_beams": 4,
38
+ "prefix": "translate English to German: "
39
+ },
40
+ "translation_en_to_fr": {
41
+ "early_stopping": true,
42
+ "max_length": 300,
43
+ "num_beams": 4,
44
+ "prefix": "translate English to French: "
45
+ },
46
+ "translation_en_to_ro": {
47
+ "early_stopping": true,
48
+ "max_length": 300,
49
+ "num_beams": 4,
50
+ "prefix": "translate English to Romanian: "
51
+ }
52
+ },
53
+ "torch_dtype": "float32",
54
+ "transformers_version": "4.17.0",
55
+ "use_cache": true,
56
+ "vocab_size": 32100
57
+ }
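As a sanity check on the configuration above, the sketch below estimates the parameter count from the listed dimensions. The layer layout is an assumption about the standard T5 architecture and is not stated in this commit: bias-free projections, scale-only layer norms, a relative attention bias table only in the first self-attention layer of each stack, and an embedding matrix shared with the output head. The function name `estimate_t5_params` is hypothetical.

```python
import json

# Subset of the config.json values shown above.
config = json.loads("""
{"d_ff": 4096, "d_kv": 64, "d_model": 1024, "num_heads": 16,
 "num_layers": 24, "num_decoder_layers": 24,
 "relative_attention_num_buckets": 32, "vocab_size": 32100}
""")


def estimate_t5_params(cfg: dict) -> int:
    """Rough parameter count under the assumptions listed in the lead-in."""
    d_model, d_ff = cfg["d_model"], cfg["d_ff"]
    inner = cfg["num_heads"] * cfg["d_kv"]   # attention inner dimension
    attn = 4 * d_model * inner               # q, k, v, o projections (no biases)
    ffn = 2 * d_model * d_ff                 # wi and wo (no biases)
    norm = d_model                           # scale-only layer norm
    rel_bias = cfg["relative_attention_num_buckets"] * cfg["num_heads"]
    encoder = cfg["num_layers"] * (attn + ffn + 2 * norm) + rel_bias + norm
    decoder = cfg["num_decoder_layers"] * (2 * attn + ffn + 3 * norm) + rel_bias + norm
    embedding = cfg["vocab_size"] * d_model  # assumed tied with the output head
    return encoder + decoder + embedding


n_params = estimate_t5_params(config)
fp32_bytes = 4 * n_params  # "torch_dtype": "float32" means 4 bytes per parameter
print(n_params, fp32_bytes)
```

If these assumptions hold, the float32 footprint comes out within a fraction of a percent of the 2950790023-byte `pytorch_model.bin` pointer listed later in this commit, with the remainder plausibly being serialization overhead.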
gitattributes.txt ADDED
@@ -0,0 +1,32 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ftz filter=lfs diff=lfs merge=lfs -text
6
+ *.gz filter=lfs diff=lfs merge=lfs -text
7
+ *.h5 filter=lfs diff=lfs merge=lfs -text
8
+ *.joblib filter=lfs diff=lfs merge=lfs -text
9
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
10
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
11
+ *.model filter=lfs diff=lfs merge=lfs -text
12
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
13
+ *.npy filter=lfs diff=lfs merge=lfs -text
14
+ *.npz filter=lfs diff=lfs merge=lfs -text
15
+ *.onnx filter=lfs diff=lfs merge=lfs -text
16
+ *.ot filter=lfs diff=lfs merge=lfs -text
17
+ *.parquet filter=lfs diff=lfs merge=lfs -text
18
+ *.pb filter=lfs diff=lfs merge=lfs -text
19
+ *.pickle filter=lfs diff=lfs merge=lfs -text
20
+ *.pkl filter=lfs diff=lfs merge=lfs -text
21
+ *.pt filter=lfs diff=lfs merge=lfs -text
22
+ *.pth filter=lfs diff=lfs merge=lfs -text
23
+ *.rar filter=lfs diff=lfs merge=lfs -text
24
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
25
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
26
+ *.tflite filter=lfs diff=lfs merge=lfs -text
27
+ *.tgz filter=lfs diff=lfs merge=lfs -text
28
+ *.wasm filter=lfs diff=lfs merge=lfs -text
29
+ *.xz filter=lfs diff=lfs merge=lfs -text
30
+ *.zip filter=lfs diff=lfs merge=lfs -text
31
+ *.zst filter=lfs diff=lfs merge=lfs -text
32
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e4af74482a53cd21b759e7be6d64a671dcf01119f7588761aee10e096a67ecdd
3
+ size 5353884
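The binary files in this commit are stored as Git LFS pointers: three `key value` lines giving the spec version, the content hash, and the size in bytes. A minimal sketch of reading such a pointer follows. The helper name `parse_lfs_pointer` is hypothetical, and the parser handles only the three keys seen here.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents for optimizer.pt, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e4af74482a53cd21b759e7be6d64a671dcf01119f7588761aee10e096a67ecdd\n"
    "size 5353884\n"
)
print(pointer["hash_algo"], pointer["size"])
```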
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3d3dfc3ef553602757800b7a71368f7a98483c17a2bf5757ab36319021d1847a
3
+ size 2950790023
rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0426f98a6f3e83788a3c20a56b703e690d1b9f704c348a8b5e4de89590384588
3
+ size 14503
rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3459dbb8f9c8943d47220d26307cc4ef700fd12b55975294e783c533c39b0469
3
+ size 14503
scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e9143d40b5b381261adcb12d94cf28327a88dea56b2c4935acd8ce43b9edc08c
3
+ size 623
special_tokens_map.json ADDED
@@ -0,0 +1 @@
1
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"]}
spiece.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
3
+ size 791656
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1 @@
1
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 100, "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"], "model_max_length": 512, "special_tokens_map_file": null, "name_or_path": "t5-large", "tokenizer_class": "T5Tokenizer"}
trainer_state.json ADDED
@@ -0,0 +1,1036 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 2.8414472437961735,
5
+ "global_step": 75000,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.02,
12
+ "learning_rate": 9.936856727915642e-06,
13
+ "loss": 1.4639,
14
+ "step": 500
15
+ },
16
+ {
17
+ "epoch": 0.04,
18
+ "learning_rate": 9.873713455831283e-06,
19
+ "loss": 1.1219,
20
+ "step": 1000
21
+ },
22
+ {
23
+ "epoch": 0.06,
24
+ "learning_rate": 9.810570183746922e-06,
25
+ "loss": 1.0672,
26
+ "step": 1500
27
+ },
28
+ {
29
+ "epoch": 0.08,
30
+ "learning_rate": 9.747426911662563e-06,
31
+ "loss": 0.9959,
32
+ "step": 2000
33
+ },
34
+ {
35
+ "epoch": 0.09,
36
+ "learning_rate": 9.684283639578204e-06,
37
+ "loss": 0.9887,
38
+ "step": 2500
39
+ },
40
+ {
41
+ "epoch": 0.11,
42
+ "learning_rate": 9.621140367493844e-06,
43
+ "loss": 0.9706,
44
+ "step": 3000
45
+ },
46
+ {
47
+ "epoch": 0.13,
48
+ "learning_rate": 9.557997095409485e-06,
49
+ "loss": 0.9167,
50
+ "step": 3500
51
+ },
52
+ {
53
+ "epoch": 0.15,
54
+ "learning_rate": 9.494853823325125e-06,
55
+ "loss": 0.9241,
56
+ "step": 4000
57
+ },
58
+ {
59
+ "epoch": 0.17,
60
+ "learning_rate": 9.431710551240765e-06,
61
+ "loss": 0.9152,
62
+ "step": 4500
63
+ },
64
+ {
65
+ "epoch": 0.19,
66
+ "learning_rate": 9.368567279156406e-06,
67
+ "loss": 0.897,
68
+ "step": 5000
69
+ },
70
+ {
71
+ "epoch": 0.19,
72
+ "eval_loss": 1.8563896417617798,
73
+ "eval_runtime": 4.2841,
74
+ "eval_samples_per_second": 116.71,
75
+ "eval_steps_per_second": 14.706,
76
+ "step": 5000
77
+ },
78
+ {
79
+ "epoch": 0.21,
80
+ "learning_rate": 9.305424007072047e-06,
81
+ "loss": 0.879,
82
+ "step": 5500
83
+ },
84
+ {
85
+ "epoch": 0.23,
86
+ "learning_rate": 9.242280734987688e-06,
87
+ "loss": 0.8759,
88
+ "step": 6000
89
+ },
90
+ {
91
+ "epoch": 0.25,
92
+ "learning_rate": 9.179137462903327e-06,
93
+ "loss": 0.8799,
94
+ "step": 6500
95
+ },
96
+ {
97
+ "epoch": 0.27,
98
+ "learning_rate": 9.115994190818968e-06,
99
+ "loss": 0.8683,
100
+ "step": 7000
101
+ },
102
+ {
103
+ "epoch": 0.28,
104
+ "learning_rate": 9.052850918734609e-06,
105
+ "loss": 0.8598,
106
+ "step": 7500
107
+ },
108
+ {
109
+ "epoch": 0.3,
110
+ "learning_rate": 8.98970764665025e-06,
111
+ "loss": 0.8662,
112
+ "step": 8000
113
+ },
114
+ {
115
+ "epoch": 0.32,
116
+ "learning_rate": 8.92656437456589e-06,
117
+ "loss": 0.8528,
118
+ "step": 8500
119
+ },
120
+ {
121
+ "epoch": 0.34,
122
+ "learning_rate": 8.863421102481532e-06,
123
+ "loss": 0.8444,
124
+ "step": 9000
125
+ },
126
+ {
127
+ "epoch": 0.36,
128
+ "learning_rate": 8.800277830397171e-06,
129
+ "loss": 0.8632,
130
+ "step": 9500
131
+ },
132
+ {
133
+ "epoch": 0.38,
134
+ "learning_rate": 8.737134558312812e-06,
135
+ "loss": 0.852,
136
+ "step": 10000
137
+ },
138
+ {
139
+ "epoch": 0.38,
140
+ "eval_loss": 1.813950777053833,
141
+ "eval_runtime": 4.2949,
142
+ "eval_samples_per_second": 116.417,
143
+ "eval_steps_per_second": 14.669,
144
+ "step": 10000
145
+ },
146
+ {
147
+ "epoch": 0.4,
148
+ "learning_rate": 8.673991286228453e-06,
149
+ "loss": 0.8545,
150
+ "step": 10500
151
+ },
152
+ {
153
+ "epoch": 0.42,
154
+ "learning_rate": 8.610848014144094e-06,
155
+ "loss": 0.8312,
156
+ "step": 11000
157
+ },
158
+ {
159
+ "epoch": 0.44,
160
+ "learning_rate": 8.547704742059734e-06,
161
+ "loss": 0.84,
162
+ "step": 11500
163
+ },
164
+ {
165
+ "epoch": 0.45,
166
+ "learning_rate": 8.484561469975375e-06,
167
+ "loss": 0.8304,
168
+ "step": 12000
169
+ },
170
+ {
171
+ "epoch": 0.47,
172
+ "learning_rate": 8.421418197891016e-06,
173
+ "loss": 0.83,
174
+ "step": 12500
175
+ },
176
+ {
177
+ "epoch": 0.49,
178
+ "learning_rate": 8.358274925806657e-06,
179
+ "loss": 0.7981,
180
+ "step": 13000
181
+ },
182
+ {
183
+ "epoch": 0.51,
184
+ "learning_rate": 8.295131653722296e-06,
185
+ "loss": 0.8254,
186
+ "step": 13500
187
+ },
188
+ {
189
+ "epoch": 0.53,
190
+ "learning_rate": 8.231988381637937e-06,
191
+ "loss": 0.8275,
192
+ "step": 14000
193
+ },
194
+ {
195
+ "epoch": 0.55,
196
+ "learning_rate": 8.168845109553578e-06,
197
+ "loss": 0.8141,
198
+ "step": 14500
199
+ },
200
+ {
201
+ "epoch": 0.57,
202
+ "learning_rate": 8.105701837469219e-06,
203
+ "loss": 0.8096,
204
+ "step": 15000
205
+ },
206
+ {
207
+ "epoch": 0.57,
208
+ "eval_loss": 1.7879245281219482,
209
+ "eval_runtime": 4.3051,
210
+ "eval_samples_per_second": 116.142,
211
+ "eval_steps_per_second": 14.634,
212
+ "step": 15000
213
+ },
214
+ {
215
+ "epoch": 0.59,
216
+ "learning_rate": 8.04255856538486e-06,
217
+ "loss": 0.7986,
218
+ "step": 15500
219
+ },
220
+ {
221
+ "epoch": 0.61,
222
+ "learning_rate": 7.979415293300499e-06,
223
+ "loss": 0.8069,
224
+ "step": 16000
225
+ },
226
+ {
227
+ "epoch": 0.63,
228
+ "learning_rate": 7.91627202121614e-06,
229
+ "loss": 0.8343,
230
+ "step": 16500
231
+ },
232
+ {
233
+ "epoch": 0.64,
234
+ "learning_rate": 7.85312874913178e-06,
235
+ "loss": 0.8069,
236
+ "step": 17000
237
+ },
238
+ {
239
+ "epoch": 0.66,
240
+ "learning_rate": 7.789985477047422e-06,
241
+ "loss": 0.8089,
242
+ "step": 17500
243
+ },
244
+ {
245
+ "epoch": 0.68,
246
+ "learning_rate": 7.726842204963063e-06,
247
+ "loss": 0.8144,
248
+ "step": 18000
249
+ },
250
+ {
251
+ "epoch": 0.7,
252
+ "learning_rate": 7.663698932878702e-06,
253
+ "loss": 0.7886,
254
+ "step": 18500
255
+ },
256
+ {
257
+ "epoch": 0.72,
258
+ "learning_rate": 7.6005556607943435e-06,
259
+ "loss": 0.7968,
260
+ "step": 19000
261
+ },
262
+ {
263
+ "epoch": 0.74,
264
+ "learning_rate": 7.5374123887099835e-06,
265
+ "loss": 0.7836,
266
+ "step": 19500
267
+ },
268
+ {
269
+ "epoch": 0.76,
270
+ "learning_rate": 7.474269116625624e-06,
271
+ "loss": 0.8243,
272
+ "step": 20000
273
+ },
274
+ {
275
+ "epoch": 0.76,
276
+ "eval_loss": 1.7740414142608643,
277
+ "eval_runtime": 4.2937,
278
+ "eval_samples_per_second": 116.45,
279
+ "eval_steps_per_second": 14.673,
280
+ "step": 20000
281
+ },
282
+ {
283
+ "epoch": 0.78,
284
+ "learning_rate": 7.4111258445412644e-06,
285
+ "loss": 0.7716,
286
+ "step": 20500
287
+ },
288
+ {
289
+ "epoch": 0.8,
290
+ "learning_rate": 7.347982572456905e-06,
291
+ "loss": 0.7921,
292
+ "step": 21000
293
+ },
294
+ {
295
+ "epoch": 0.81,
296
+ "learning_rate": 7.284839300372546e-06,
297
+ "loss": 0.7943,
298
+ "step": 21500
299
+ },
300
+ {
301
+ "epoch": 0.83,
302
+ "learning_rate": 7.221696028288186e-06,
303
+ "loss": 0.7644,
304
+ "step": 22000
305
+ },
306
+ {
307
+ "epoch": 0.85,
308
+ "learning_rate": 7.158552756203827e-06,
309
+ "loss": 0.7841,
310
+ "step": 22500
311
+ },
312
+ {
313
+ "epoch": 0.87,
314
+ "learning_rate": 7.095409484119468e-06,
315
+ "loss": 0.7866,
316
+ "step": 23000
317
+ },
318
+ {
319
+ "epoch": 0.89,
320
+ "learning_rate": 7.032266212035108e-06,
321
+ "loss": 0.7967,
322
+ "step": 23500
323
+ },
324
+ {
325
+ "epoch": 0.91,
326
+ "learning_rate": 6.969122939950749e-06,
327
+ "loss": 0.7776,
328
+ "step": 24000
329
+ },
330
+ {
331
+ "epoch": 0.93,
332
+ "learning_rate": 6.905979667866389e-06,
333
+ "loss": 0.7704,
334
+ "step": 24500
335
+ },
336
+ {
337
+ "epoch": 0.95,
338
+ "learning_rate": 6.84283639578203e-06,
339
+ "loss": 0.7536,
340
+ "step": 25000
341
+ },
342
+ {
343
+ "epoch": 0.95,
344
+ "eval_loss": 1.7646362781524658,
345
+ "eval_runtime": 4.2984,
346
+ "eval_samples_per_second": 116.323,
347
+ "eval_steps_per_second": 14.657,
348
+ "step": 25000
349
+ },
350
+ {
351
+ "epoch": 0.97,
352
+ "learning_rate": 6.779693123697671e-06,
353
+ "loss": 0.7886,
354
+ "step": 25500
355
+ },
356
+ {
357
+ "epoch": 0.99,
358
+ "learning_rate": 6.716549851613311e-06,
359
+ "loss": 0.7812,
360
+ "step": 26000
361
+ },
362
+ {
363
+ "epoch": 1.0,
364
+ "learning_rate": 6.653406579528952e-06,
365
+ "loss": 0.7845,
366
+ "step": 26500
367
+ },
368
+ {
369
+ "epoch": 1.02,
370
+ "learning_rate": 6.590263307444592e-06,
371
+ "loss": 0.7634,
372
+ "step": 27000
373
+ },
374
+ {
375
+ "epoch": 1.04,
376
+ "learning_rate": 6.527120035360233e-06,
377
+ "loss": 0.7592,
378
+ "step": 27500
379
+ },
380
+ {
381
+ "epoch": 1.06,
382
+ "learning_rate": 6.4639767632758735e-06,
383
+ "loss": 0.7494,
384
+ "step": 28000
385
+ },
386
+ {
387
+ "epoch": 1.08,
388
+ "learning_rate": 6.4008334911915135e-06,
389
+ "loss": 0.749,
390
+ "step": 28500
391
+ },
392
+ {
393
+ "epoch": 1.1,
394
+ "learning_rate": 6.3376902191071544e-06,
395
+ "loss": 0.7592,
396
+ "step": 29000
397
+ },
398
+ {
399
+ "epoch": 1.12,
400
+ "learning_rate": 6.2745469470227945e-06,
401
+ "loss": 0.7355,
402
+ "step": 29500
403
+ },
404
+ {
405
+ "epoch": 1.14,
406
+ "learning_rate": 6.211403674938435e-06,
407
+ "loss": 0.7442,
408
+ "step": 30000
409
+ },
410
+ {
411
+ "epoch": 1.14,
412
+ "eval_loss": 1.762963891029358,
413
+ "eval_runtime": 4.2966,
414
+ "eval_samples_per_second": 116.372,
415
+ "eval_steps_per_second": 14.663,
416
+ "step": 30000
417
+ },
418
+ {
419
+ "epoch": 1.16,
420
+ "learning_rate": 6.148260402854076e-06,
421
+ "loss": 0.7629,
422
+ "step": 30500
423
+ },
424
+ {
425
+ "epoch": 1.17,
426
+ "learning_rate": 6.085117130769716e-06,
427
+ "loss": 0.7227,
428
+ "step": 31000
429
+ },
430
+ {
431
+ "epoch": 1.19,
432
+ "learning_rate": 6.021973858685357e-06,
433
+ "loss": 0.738,
434
+ "step": 31500
435
+ },
436
+ {
437
+ "epoch": 1.21,
438
+ "learning_rate": 5.958830586600998e-06,
439
+ "loss": 0.7293,
440
+ "step": 32000
441
+ },
442
+ {
443
+ "epoch": 1.23,
444
+ "learning_rate": 5.895687314516638e-06,
445
+ "loss": 0.7434,
446
+ "step": 32500
447
+ },
448
+ {
449
+ "epoch": 1.25,
450
+ "learning_rate": 5.832544042432279e-06,
451
+ "loss": 0.7548,
452
+ "step": 33000
453
+ },
454
+ {
455
+ "epoch": 1.27,
456
+ "learning_rate": 5.769400770347921e-06,
457
+ "loss": 0.7304,
458
+ "step": 33500
459
+ },
460
+ {
461
+ "epoch": 1.29,
462
+ "learning_rate": 5.706257498263561e-06,
463
+ "loss": 0.7331,
464
+ "step": 34000
465
+ },
466
+ {
467
+ "epoch": 1.31,
468
+ "learning_rate": 5.643114226179202e-06,
469
+ "loss": 0.7387,
470
+ "step": 34500
471
+ },
472
+ {
473
+ "epoch": 1.33,
474
+ "learning_rate": 5.5799709540948425e-06,
475
+ "loss": 0.7482,
476
+ "step": 35000
477
+ },
478
+ {
479
+ "epoch": 1.33,
480
+ "eval_loss": 1.7577964067459106,
481
+ "eval_runtime": 4.9786,
482
+ "eval_samples_per_second": 100.43,
483
+ "eval_steps_per_second": 12.654,
484
+ "step": 35000
485
+ },
486
+ {
487
+ "epoch": 1.34,
488
+ "learning_rate": 5.5168276820104826e-06,
489
+ "loss": 0.7177,
490
+ "step": 35500
491
+ },
492
+ {
493
+ "epoch": 1.36,
494
+ "learning_rate": 5.4536844099261234e-06,
495
+ "loss": 0.7227,
496
+ "step": 36000
497
+ },
498
+ {
499
+ "epoch": 1.38,
500
+ "learning_rate": 5.3905411378417635e-06,
501
+ "loss": 0.723,
502
+ "step": 36500
503
+ },
504
+ {
505
+ "epoch": 1.4,
506
+ "learning_rate": 5.327397865757404e-06,
507
+ "loss": 0.7255,
508
+ "step": 37000
509
+ },
510
+ {
511
+ "epoch": 1.42,
512
+ "learning_rate": 5.264254593673045e-06,
513
+ "loss": 0.7269,
514
+ "step": 37500
515
+ },
516
+ {
517
+ "epoch": 1.44,
518
+ "learning_rate": 5.201111321588685e-06,
519
+ "loss": 0.7528,
520
+ "step": 38000
521
+ },
522
+ {
523
+ "epoch": 1.46,
524
+ "learning_rate": 5.137968049504326e-06,
525
+ "loss": 0.7377,
526
+ "step": 38500
527
+ },
528
+ {
529
+ "epoch": 1.48,
530
+ "learning_rate": 5.074824777419966e-06,
531
+ "loss": 0.7317,
532
+ "step": 39000
533
+ },
534
+ {
535
+ "epoch": 1.5,
536
+ "learning_rate": 5.011681505335607e-06,
537
+ "loss": 0.7486,
538
+ "step": 39500
539
+ },
540
+ {
541
+ "epoch": 1.52,
542
+ "learning_rate": 4.948538233251248e-06,
543
+ "loss": 0.7126,
544
+ "step": 40000
545
+ },
546
+ {
547
+ "epoch": 1.52,
548
+ "eval_loss": 1.7534527778625488,
549
+ "eval_runtime": 4.2918,
550
+ "eval_samples_per_second": 116.502,
551
+ "eval_steps_per_second": 14.679,
552
+ "step": 40000
553
+ },
554
+ {
555
+ "epoch": 1.53,
556
+ "learning_rate": 4.885394961166888e-06,
557
+ "loss": 0.7211,
558
+ "step": 40500
559
+ },
560
+ {
561
+ "epoch": 1.55,
562
+ "learning_rate": 4.822251689082529e-06,
563
+ "loss": 0.7059,
564
+ "step": 41000
565
+ },
566
+ {
567
+ "epoch": 1.57,
568
+ "learning_rate": 4.75910841699817e-06,
569
+ "loss": 0.7331,
570
+ "step": 41500
571
+ },
572
+ {
573
+ "epoch": 1.59,
574
+ "learning_rate": 4.69596514491381e-06,
575
+ "loss": 0.7199,
576
+ "step": 42000
577
+ },
578
+ {
579
+ "epoch": 1.61,
580
+ "learning_rate": 4.632821872829451e-06,
581
+ "loss": 0.7026,
582
+ "step": 42500
583
+ },
584
+ {
585
+ "epoch": 1.63,
586
+ "learning_rate": 4.569678600745091e-06,
587
+ "loss": 0.7181,
588
+ "step": 43000
589
+ },
590
+ {
591
+ "epoch": 1.65,
592
+ "learning_rate": 4.506535328660732e-06,
593
+ "loss": 0.725,
594
+ "step": 43500
595
+ },
596
+ {
597
+ "epoch": 1.67,
598
+ "learning_rate": 4.4433920565763725e-06,
599
+ "loss": 0.7281,
600
+ "step": 44000
601
+ },
602
+ {
603
+ "epoch": 1.69,
604
+ "learning_rate": 4.380248784492013e-06,
605
+ "loss": 0.714,
606
+ "step": 44500
607
+ },
608
+ {
609
+ "epoch": 1.7,
610
+ "learning_rate": 4.3171055124076535e-06,
611
+ "loss": 0.7123,
612
+ "step": 45000
613
+ },
614
+ {
615
+ "epoch": 1.7,
616
+ "eval_loss": 1.7550365924835205,
617
+ "eval_runtime": 4.9628,
618
+ "eval_samples_per_second": 100.749,
619
+ "eval_steps_per_second": 12.694,
620
+ "step": 45000
621
+ },
622
+ {
623
+ "epoch": 1.72,
624
+ "learning_rate": 4.2539622403232935e-06,
625
+ "loss": 0.7039,
626
+ "step": 45500
627
+ },
628
+ {
629
+ "epoch": 1.74,
630
+ "learning_rate": 4.190818968238934e-06,
631
+ "loss": 0.7207,
632
+ "step": 46000
633
+ },
634
+ {
635
+ "epoch": 1.76,
636
+ "learning_rate": 4.127675696154575e-06,
637
+ "loss": 0.7245,
638
+ "step": 46500
639
+ },
640
+ {
641
+ "epoch": 1.78,
642
+ "learning_rate": 4.064532424070215e-06,
643
+ "loss": 0.7058,
644
+ "step": 47000
645
+ },
646
+ {
647
+ "epoch": 1.8,
648
+ "learning_rate": 4.001389151985856e-06,
649
+ "loss": 0.7216,
650
+ "step": 47500
651
+ },
652
+ {
653
+ "epoch": 1.82,
654
+ "learning_rate": 3.938245879901496e-06,
655
+ "loss": 0.7339,
656
+ "step": 48000
657
+ },
658
+ {
659
+ "epoch": 1.84,
660
+ "learning_rate": 3.875102607817137e-06,
661
+ "loss": 0.7354,
662
+ "step": 48500
663
+ },
664
+ {
665
+ "epoch": 1.86,
666
+ "learning_rate": 3.8119593357327776e-06,
667
+ "loss": 0.7281,
668
+ "step": 49000
669
+ },
670
+ {
671
+ "epoch": 1.88,
672
+ "learning_rate": 3.7488160636484185e-06,
673
+ "loss": 0.7297,
674
+ "step": 49500
675
+ },
676
+ {
677
+ "epoch": 1.89,
678
+ "learning_rate": 3.685672791564059e-06,
679
+ "loss": 0.7211,
680
+ "step": 50000
681
+ },
682
+ {
683
+ "epoch": 1.89,
684
+ "eval_loss": 1.7511944770812988,
685
+ "eval_runtime": 4.9613,
686
+ "eval_samples_per_second": 100.78,
687
+ "eval_steps_per_second": 12.698,
688
+ "step": 50000
689
+ },
690
+ {
691
+ "epoch": 1.91,
692
+ "learning_rate": 3.6225295194797e-06,
693
+ "loss": 0.7116,
694
+ "step": 50500
695
+ },
696
+ {
697
+ "epoch": 1.93,
698
+ "learning_rate": 3.5593862473953407e-06,
699
+ "loss": 0.7168,
700
+ "step": 51000
701
+ },
702
+ {
703
+ "epoch": 1.95,
704
+ "learning_rate": 3.496242975310981e-06,
705
+ "loss": 0.7073,
706
+ "step": 51500
707
+ },
708
+ {
709
+ "epoch": 1.97,
710
+ "learning_rate": 3.4330997032266216e-06,
711
+ "loss": 0.7085,
712
+ "step": 52000
713
+ },
714
+ {
715
+ "epoch": 1.99,
716
+ "learning_rate": 3.369956431142262e-06,
717
+ "loss": 0.7143,
718
+ "step": 52500
719
+ },
720
+ {
721
+ "epoch": 2.01,
722
+ "learning_rate": 3.306813159057903e-06,
723
+ "loss": 0.7134,
724
+ "step": 53000
725
+ },
726
+ {
727
+ "epoch": 2.03,
728
+ "learning_rate": 3.2436698869735434e-06,
729
+ "loss": 0.6899,
730
+ "step": 53500
731
+ },
732
+ {
733
+ "epoch": 2.05,
734
+ "learning_rate": 3.180526614889184e-06,
735
+ "loss": 0.6816,
736
+ "step": 54000
737
+ },
738
+ {
739
+ "epoch": 2.06,
740
+ "learning_rate": 3.1173833428048244e-06,
741
+ "loss": 0.7157,
742
+ "step": 54500
743
+ },
744
+ {
745
+ "epoch": 2.08,
746
+ "learning_rate": 3.054240070720465e-06,
747
+ "loss": 0.6791,
748
+ "step": 55000
749
+ },
750
+ {
751
+ "epoch": 2.08,
752
+ "eval_loss": 1.7588887214660645,
753
+ "eval_runtime": 4.9575,
754
+ "eval_samples_per_second": 100.856,
755
+ "eval_steps_per_second": 12.708,
756
+ "step": 55000
757
+ },
758
+ {
759
+ "epoch": 2.1,
760
+ "learning_rate": 2.9910967986361057e-06,
761
+ "loss": 0.693,
762
+ "step": 55500
763
+ },
764
+ {
765
+ "epoch": 2.12,
766
+ "learning_rate": 2.927953526551746e-06,
767
+ "loss": 0.6793,
768
+ "step": 56000
769
+ },
770
+ {
771
+ "epoch": 2.14,
772
+ "learning_rate": 2.8648102544673866e-06,
773
+ "loss": 0.6825,
774
+ "step": 56500
775
+ },
776
+ {
777
+ "epoch": 2.16,
778
+ "learning_rate": 2.801666982383027e-06,
779
+ "loss": 0.6897,
780
+ "step": 57000
781
+ },
782
+ {
783
+ "epoch": 2.18,
784
+ "learning_rate": 2.738523710298668e-06,
785
+ "loss": 0.6834,
786
+ "step": 57500
787
+ },
788
+ {
789
+ "epoch": 2.2,
790
+ "learning_rate": 2.6753804382143085e-06,
791
+ "loss": 0.6959,
792
+ "step": 58000
793
+ },
794
+ {
795
+ "epoch": 2.22,
796
+ "learning_rate": 2.612237166129949e-06,
797
+ "loss": 0.6977,
798
+ "step": 58500
799
+ },
800
+ {
801
+ "epoch": 2.24,
802
+ "learning_rate": 2.5490938940455894e-06,
803
+ "loss": 0.6881,
804
+ "step": 59000
805
+ },
806
+ {
807
+ "epoch": 2.25,
808
+ "learning_rate": 2.4859506219612303e-06,
809
+ "loss": 0.7066,
810
+ "step": 59500
811
+ },
812
+ {
813
+ "epoch": 2.27,
814
+ "learning_rate": 2.4228073498768707e-06,
815
+ "loss": 0.6731,
816
+ "step": 60000
817
+ },
818
+ {
819
+ "epoch": 2.27,
820
+ "eval_loss": 1.7606250047683716,
821
+ "eval_runtime": 4.9678,
822
+ "eval_samples_per_second": 100.649,
823
+ "eval_steps_per_second": 12.682,
824
+ "step": 60000
825
+ },
826
+ {
827
+ "epoch": 2.29,
828
+ "learning_rate": 2.3596640777925116e-06,
829
+ "loss": 0.6904,
830
+ "step": 60500
831
+ },
832
+ {
833
+ "epoch": 2.31,
834
+ "learning_rate": 2.296520805708152e-06,
835
+ "loss": 0.6873,
836
+ "step": 61000
837
+ },
838
+ {
839
+ "epoch": 2.33,
840
+ "learning_rate": 2.2333775336237925e-06,
841
+ "loss": 0.701,
842
+ "step": 61500
843
+ },
844
+ {
845
+ "epoch": 2.35,
846
+ "learning_rate": 2.170234261539433e-06,
847
+ "loss": 0.6964,
848
+ "step": 62000
849
+ },
850
+ {
851
+ "epoch": 2.37,
852
+ "learning_rate": 2.1070909894550735e-06,
853
+ "loss": 0.7201,
854
+ "step": 62500
855
+ },
856
+ {
857
+ "epoch": 2.39,
858
+ "learning_rate": 2.0439477173707144e-06,
859
+ "loss": 0.6866,
860
+ "step": 63000
861
+ },
862
+ {
863
+ "epoch": 2.41,
864
+ "learning_rate": 1.980804445286355e-06,
865
+ "loss": 0.6929,
866
+ "step": 63500
867
+ },
868
+ {
869
+ "epoch": 2.42,
870
+ "learning_rate": 1.9176611732019953e-06,
871
+ "loss": 0.6939,
872
+ "step": 64000
873
+ },
874
+ {
875
+ "epoch": 2.44,
876
+ "learning_rate": 1.8545179011176362e-06,
877
+ "loss": 0.701,
878
+ "step": 64500
879
+ },
880
+ {
881
+ "epoch": 2.46,
882
+ "learning_rate": 1.7913746290332768e-06,
883
+ "loss": 0.682,
884
+ "step": 65000
885
+ },
886
+ {
887
+ "epoch": 2.46,
888
+ "eval_loss": 1.7586958408355713,
889
+ "eval_runtime": 4.3201,
890
+ "eval_samples_per_second": 115.738,
891
+ "eval_steps_per_second": 14.583,
892
+ "step": 65000
893
+ },
894
+ {
895
+ "epoch": 2.48,
896
+ "learning_rate": 1.7282313569489173e-06,
897
+ "loss": 0.6694,
898
+ "step": 65500
899
+ },
900
+ {
901
+ "epoch": 2.5,
902
+ "learning_rate": 1.6650880848645578e-06,
903
+ "loss": 0.7003,
904
+ "step": 66000
905
+ },
906
+ {
907
+ "epoch": 2.52,
908
+ "learning_rate": 1.6019448127801984e-06,
909
+ "loss": 0.6837,
910
+ "step": 66500
911
+ },
912
+ {
913
+ "epoch": 2.54,
914
+ "learning_rate": 1.538801540695839e-06,
915
+ "loss": 0.7013,
916
+ "step": 67000
917
+ },
918
+ {
919
+ "epoch": 2.56,
920
+ "learning_rate": 1.4756582686114796e-06,
921
+ "loss": 0.6882,
922
+ "step": 67500
923
+ },
924
+ {
925
+ "epoch": 2.58,
926
+ "learning_rate": 1.41251499652712e-06,
927
+ "loss": 0.6961,
928
+ "step": 68000
929
+ },
930
+ {
931
+ "epoch": 2.6,
932
+ "learning_rate": 1.3493717244427607e-06,
933
+ "loss": 0.6827,
934
+ "step": 68500
935
+ },
936
+ {
937
+ "epoch": 2.61,
938
+ "learning_rate": 1.2862284523584012e-06,
939
+ "loss": 0.6844,
940
+ "step": 69000
941
+ },
942
+ {
943
+ "epoch": 2.63,
944
+ "learning_rate": 1.2230851802740419e-06,
945
+ "loss": 0.6992,
946
+ "step": 69500
947
+ },
948
+ {
949
+ "epoch": 2.65,
950
+ "learning_rate": 1.1599419081896825e-06,
951
+ "loss": 0.6857,
952
+ "step": 70000
953
+ },
954
+ {
955
+ "epoch": 2.65,
956
+ "eval_loss": 1.7555959224700928,
957
+ "eval_runtime": 4.9634,
958
+ "eval_samples_per_second": 100.737,
959
+ "eval_steps_per_second": 12.693,
960
+ "step": 70000
961
+ },
962
+ {
963
+ "epoch": 2.67,
964
+ "learning_rate": 1.096798636105323e-06,
965
+ "loss": 0.6833,
966
+ "step": 70500
967
+ },
968
+ {
969
+ "epoch": 2.69,
970
+ "learning_rate": 1.0336553640209637e-06,
971
+ "loss": 0.6754,
972
+ "step": 71000
973
+ },
974
+ {
975
+ "epoch": 2.71,
976
+ "learning_rate": 9.705120919366043e-07,
977
+ "loss": 0.675,
978
+ "step": 71500
979
+ },
980
+ {
981
+ "epoch": 2.73,
982
+ "learning_rate": 9.073688198522448e-07,
983
+ "loss": 0.6849,
984
+ "step": 72000
985
+ },
986
+ {
987
+ "epoch": 2.75,
988
+ "learning_rate": 8.442255477678854e-07,
989
+ "loss": 0.6903,
990
+ "step": 72500
991
+ },
992
+ {
993
+ "epoch": 2.77,
994
+ "learning_rate": 7.810822756835259e-07,
995
+ "loss": 0.6832,
996
+ "step": 73000
997
+ },
998
+ {
999
+ "epoch": 2.78,
1000
+ "learning_rate": 7.179390035991665e-07,
1001
+ "loss": 0.6823,
1002
+ "step": 73500
1003
+ },
1004
+ {
1005
+ "epoch": 2.8,
1006
+ "learning_rate": 6.547957315148072e-07,
1007
+ "loss": 0.6854,
1008
+ "step": 74000
1009
+ },
1010
+ {
1011
+ "epoch": 2.82,
1012
+ "learning_rate": 5.916524594304478e-07,
1013
+ "loss": 0.681,
1014
+ "step": 74500
1015
+ },
1016
+ {
1017
+ "epoch": 2.84,
1018
+ "learning_rate": 5.285091873460883e-07,
1019
+ "loss": 0.6806,
1020
+ "step": 75000
1021
+ },
1022
+ {
1023
+ "epoch": 2.84,
1024
+ "eval_loss": 1.7584295272827148,
1025
+ "eval_runtime": 4.313,
1026
+ "eval_samples_per_second": 115.929,
1027
+ "eval_steps_per_second": 14.607,
1028
+ "step": 75000
1029
+ }
1030
+ ],
1031
+ "max_steps": 79185,
1032
+ "num_train_epochs": 3,
1033
+ "total_flos": 1.939234887273808e+18,
1034
+ "trial_name": null,
1035
+ "trial_params": null
1036
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3cf757e80970ef6259c37f0cf6ad11af6969ec9ebab554e228573405a1bae239
3
+ size 3183