ce-lery committed
Commit c783850
1 Parent(s): 8322c34

feat: pretrained by recipe v0.1.0

README.md ADDED
---
base_model: None
tags:
- generated_from_trainer
model-index:
- name: checkpoints-mistral-300M-FA2
  results: []
---

# japanese-mistral-300m-base

## Overview

Welcome to my model card!

This model's main features are:

- Suppression of unknown-word generation, achieved by using byte fallback in the SentencePiece tokenizer and converting it to the Hugging Face Tokenizers format
- Pretrained on the Wikipedia and CC-100 datasets
- Uses the [Mistral 300M](config.json) architecture

Yukkuri shite ittene!

## How to use the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
import torch

MODEL_NAME = "ce-lery/japanese-mistral-300m-base"
torch.set_float32_matmul_precision("high")

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
).to(DEVICE)

# streamer = TextStreamer(tokenizer)

prompt = "大規模言語モデルとは、"

inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_new_tokens=256,
        do_sample=True,
        early_stopping=False,
        top_p=0.95,
        top_k=50,
        temperature=0.9,
        # streamer=streamer,
        no_repeat_ngram_size=2,
        num_beams=3,
    )

print(outputs.tolist()[0])
outputs_txt = tokenizer.decode(outputs[0])
print(outputs_txt)
```

## Recipe

If you want to reproduce this model, please refer to [this GitHub repository](https://github.com/ce-lery/japanese-mistral-300m-recipe).

It describes the recipe used to build this model, including:

- Preprocessing with SentencePiece
- Pretraining with FlashAttention-2, torch.compile, and DeepSpeed
- Fine-tuning with databricks-dolly-15k-ja

If you find any mistakes or errors, please open an issue. Pull requests are also very welcome!

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0006
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 64
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.95) and epsilon=0.0001
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1000
- num_epochs: 1
- mixed_precision_training: Native AMP

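The effective batch size in the list above follows directly from the per-device batch size and gradient accumulation. A quick arithmetic check (the world size of 1 here is an assumption, chosen only so that the product matches the reported total of 256):

```python
# Sanity check: effective batch = per-device batch * grad accumulation * world size
train_batch_size = 4              # per-device batch size, from the list above
gradient_accumulation_steps = 64  # from the list above
world_size = 1                    # assumption: picked so the product matches 256

total_train_batch_size = train_batch_size * gradient_accumulation_steps * world_size
print(total_train_batch_size)  # 256
```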
### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 4.2911        | 0.12  | 5000  | 4.2914          |
| 3.9709        | 0.24  | 10000 | 3.9900          |
| 3.8229        | 0.36  | 15000 | 3.8388          |
| 3.7197        | 0.47  | 20000 | 3.7454          |
| 3.652         | 0.59  | 25000 | 3.6739          |
| 3.597         | 0.71  | 30000 | 3.6177          |
| 3.5554        | 0.83  | 35000 | 3.5770          |
| 3.536         | 0.95  | 40000 | 3.5582          |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.1+cu121
- Datasets 2.14.5
- Tokenizers 0.14.1
all_results.json ADDED
{
  "epoch": 1.0,
  "eval_loss": 3.5582468509674072,
  "eval_runtime": 6274.8366,
  "eval_samples": 551057,
  "eval_samples_per_second": 87.82,
  "eval_steps_per_second": 21.955,
  "perplexity": 35.10160482608155,
  "train_loss": 3.89913355111991,
  "train_runtime": 393554.9634,
  "train_samples": 10794765,
  "train_samples_per_second": 27.429,
  "train_steps_per_second": 0.107
}
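The `perplexity` field above is simply the exponential of the evaluation cross-entropy loss. A quick check, with the values copied from the JSON above:

```python
import math

eval_loss = 3.5582468509674072  # "eval_loss" from all_results.json above
perplexity = math.exp(eval_loss)
print(perplexity)  # matches the "perplexity" field, ~35.10
```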
config.json ADDED
{
  "_name_or_path": "None",
  "architectures": [
    "MistralForCausalLM"
  ],
  "bos_token_id": 0,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2400,
  "max_position_embeddings": 4096,
  "model_type": "mistral",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 10000.0,
  "sliding_window": 1024,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.35.2",
  "use_cache": true,
  "vocab_size": 50257
}
eval_results.json ADDED
{
  "epoch": 1.0,
  "eval_loss": 3.5582468509674072,
  "eval_runtime": 6274.8366,
  "eval_samples": 551057,
  "eval_samples_per_second": 87.82,
  "eval_steps_per_second": 21.955,
  "perplexity": 35.10160482608155
}
generation_config.json ADDED
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 0,
  "transformers_version": "4.35.2"
}
logs/events.out.tfevents.1701268638.6c82343ebf86.774334.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:69b2334be418749fdef9ee5067eab27f2d2bc87af0f54c7c07b034c89b2eef03
size 73595
logs/events.out.tfevents.1701668470.6c82343ebf86.774334.1 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:46102452189de1ad0a6580e9a1069d92145f83074cb29f286637e304ac05bc73
size 364
logs/events.out.tfevents.1702138508.90c313ded1af.10650.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:14c7f3f9ae268ac956a8838195d16ba9f20df5c0b60d68fc79294bca4f530836
size 9990
logs/events.out.tfevents.1702169068.90c313ded1af.1917.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:d0b11071131d2855028e7354d33e2d839ee63b64555ced6a8dd5954eaf8a5dd3
size 6762
logs/events.out.tfevents.1702194187.90c313ded1af.463702.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:a44da5d117fc008e9800ea4e20c3ffdc008f1535e87c3f6cac5c3a79e1a9d761
size 4184
logs/events.out.tfevents.1702195605.90c313ded1af.487106.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:3dfaddfc43e7e474eb78e56117557638bd33bbb5648e494510efd8dfa101e1d5
size 4184
logs/events.out.tfevents.1702196577.90c313ded1af.501706.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:f9db84ede883d6200a3cf6dcb189eeaf94b03ff9ee07b14207ddd4d1d0aaf574
size 4186
logs/events.out.tfevents.1702198797.90c313ded1af.526008.0 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:054a78baa0e27bb561455b02c3c8dba11b989f1b5ce382502cbf8df04c52f48e
size 6197
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:669198b0c1741e7f451ee08ac754a73e821b2156372b5bc10d55731c5f60534f
size 1421709600
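Since the checkpoint is stored in float32 (see `torch_dtype` in config.json), the parameter count can be roughly estimated from the safetensors file size at 4 bytes per parameter. This is only an approximation, since the file also contains a small metadata header:

```python
# Rough parameter-count estimate from the float32 checkpoint size.
# Slightly overestimates, because safetensors files include a small header.
size_bytes = 1421709600  # size of model.safetensors, from the LFS pointer above
bytes_per_param = 4      # float32

approx_params = size_bytes // bytes_per_param
print(f"~{approx_params / 1e6:.0f}M parameters")  # ~355M, i.e. a ~300M-class model
```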
special_tokens_map.json ADDED
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
spiece.model ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:604cb0c2f073ba13f04739ced6f8310f4f00ab344feea6cb5c4012af3876c684
size 1249735
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "additional_special_tokens": [],
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "eos_token": "</s>",
  "extra_ids": 0,
  "legacy": true,
  "model_max_length": 50000,
  "pad_token": "[PAD]",
  "sp_model_kwargs": {},
  "tokenizer_class": "T5Tokenizer",
  "unk_token": "[UNK]"
}
train_results.json ADDED
{
  "epoch": 1.0,
  "train_loss": 3.89913355111991,
  "train_runtime": 393554.9634,
  "train_samples": 10794765,
  "train_samples_per_second": 27.429,
  "train_steps_per_second": 0.107
}
trainer_state.json ADDED
{
  "best_metric": 3.5582468509674072,
  "best_model_checkpoint": "checkpoints-mistral-300M-FA2/checkpoint-40000",
  "epoch": 0.9999985178004752,
  "eval_steps": 5000,
  "global_step": 42167,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {"epoch": 0.0, "learning_rate": 5.9999999999999995e-05, "loss": 9.0925, "step": 100},
    {"epoch": 0.0, "learning_rate": 0.00011999999999999999, "loss": 7.7547, "step": 200},
    {"epoch": 0.01, "learning_rate": 0.00017999999999999998, "loss": 7.3919, "step": 300},
    {"epoch": 0.01, "learning_rate": 0.00023999999999999998, "loss": 7.0885, "step": 400},
    {"epoch": 0.01, "learning_rate": 0.0003, "loss": 6.794, "step": 500},
    {"epoch": 0.01, "learning_rate": 0.00035999999999999997, "loss": 6.5749, "step": 600},
    {"epoch": 0.02, "learning_rate": 0.00041999999999999996, "loss": 6.4027, "step": 700},
    {"epoch": 0.02, "learning_rate": 0.00047999999999999996, "loss": 6.2476, "step": 800},
    {"epoch": 0.02, "learning_rate": 0.00054, "loss": 6.0979, "step": 900},
    {"epoch": 0.02, "learning_rate": 0.0006, "loss": 5.9485, "step": 1000},
    {"epoch": 0.03, "learning_rate": 0.0005999912644458949, "loss": 5.8031, "step": 1100},
    {"epoch": 0.03, "learning_rate": 0.0005999650582923124, "loss": 5.6781, "step": 1200},
    {"epoch": 0.03, "learning_rate": 0.0005999213830654211, "loss": 5.5612, "step": 1300},
    {"epoch": 0.03, "learning_rate": 0.0005998602413087361, "loss": 5.4602, "step": 1400},
    {"epoch": 0.04, "learning_rate": 0.000599781636582972, "loss": 5.3715, "step": 1500},
    {"epoch": 0.04, "learning_rate": 0.0005996855734658339, "loss": 5.2891, "step": 1600},
    {"epoch": 0.04, "learning_rate": 0.0005995720575517524, "loss": 5.2142, "step": 1700},
    {"epoch": 0.04, "learning_rate": 0.0005994410954515569, "loss": 5.1388, "step": 1800},
    {"epoch": 0.05, "learning_rate": 0.0005992926947920907, "loss": 5.0648, "step": 1900},
    {"epoch": 0.05, "learning_rate": 0.0005991268642157673, "loss": 4.9956, "step": 2000},
    {"epoch": 0.05, "learning_rate": 0.0005989436133800661, "loss": 4.937, "step": 2100},
    {"epoch": 0.05, "learning_rate": 0.0005987429529569716, "loss": 4.8876, "step": 2200},
    {"epoch": 0.05, "learning_rate": 0.0005985248946323499, "loss": 4.8387, "step": 2300},
    {"epoch": 0.06, "learning_rate": 0.0005982894511052698, "loss": 4.7943, "step": 2400},
    {"epoch": 0.06, "learning_rate": 0.0005980366360872623, "loss": 4.7574, "step": 2500},
    {"epoch": 0.06, "learning_rate": 0.0005977664643015227, "loss": 4.7216, "step": 2600},
    {"epoch": 0.06, "learning_rate": 0.0005974789514820526, "loss": 4.6875, "step": 2700},
    {"epoch": 0.07, "learning_rate": 0.0005971741143727439, "loss": 4.6595, "step": 2800},
    {"epoch": 0.07, "learning_rate": 0.0005968519707264038, "loss": 4.6346, "step": 2900},
    {"epoch": 0.07, "learning_rate": 0.0005965125393037204, "loss": 4.6029, "step": 3000},
    {"epoch": 0.07, "learning_rate": 0.0005961558398721711, "loss": 4.5849, "step": 3100},
    {"epoch": 0.08, "learning_rate": 0.0005957818932048701, "loss": 4.5592, "step": 3200},
    {"epoch": 0.08, "learning_rate": 0.00059539072107936, "loss": 4.537, "step": 3300},
    {"epoch": 0.08, "learning_rate": 0.0005949823462763423, "loss": 4.5125, "step": 3400},
    {"epoch": 0.08, "learning_rate": 0.0005945567925783518, "loss": 4.4937, "step": 3500},
    {"epoch": 0.09, "learning_rate": 0.0005941140847683708, "loss": 4.478, "step": 3600},
    {"epoch": 0.09, "learning_rate": 0.0005936542486283861, "loss": 4.4609, "step": 3700},
    {"epoch": 0.09, "learning_rate": 0.0005931773109378876, "loss": 4.4427, "step": 3800},
    {"epoch": 0.09, "learning_rate": 0.0005926832994723086, "loss": 4.429, "step": 3900},
    {"epoch": 0.09, "learning_rate": 0.0005921722430014085, "loss": 4.4091, "step": 4000},
    {"epoch": 0.1, "learning_rate": 0.0005916441712875966, "loss": 4.3971, "step": 4100},
    {"epoch": 0.1, "learning_rate": 0.0005910991150842002, "loss": 4.3842, "step": 4200},
    {"epoch": 0.1, "learning_rate": 0.000590537106133672, "loss": 4.3676, "step": 4300},
    {"epoch": 0.1, "learning_rate": 0.0005899581771657428, "loss": 4.3585, "step": 4400},
    {"epoch": 0.11, "learning_rate": 0.0005893623618955148, "loss": 4.3407, "step": 4500},
    {"epoch": 0.11, "learning_rate": 0.0005887496950214981, "loss": 4.3323, "step": 4600},
    {"epoch": 0.11, "learning_rate": 0.0005881202122235901, "loss": 4.3157, "step": 4700},
    {"epoch": 0.11, "learning_rate": 0.000587473950160998, "loss": 4.3058, "step": 4800},
    {"epoch": 0.12, "learning_rate": 0.0005868109464701029, "loss": 4.2971, "step": 4900},
    {"epoch": 0.12, "learning_rate": 0.0005861312397622692, "loss": 4.2911, "step": 5000},
    {"epoch": 0.12, "eval_loss": 4.291384220123291, "eval_runtime": 6254.7697, "eval_samples_per_second": 88.102, "eval_steps_per_second": 22.026, "step": 5000},
    {"epoch": 0.12, "learning_rate": 0.0005854348696215949, "loss": 4.28, "step": 5100},
    {"epoch": 0.12, "learning_rate": 0.000584721876602607, "loss": 4.2687, "step": 5200},
    {"epoch": 0.13, "learning_rate": 0.0005839923022278993, "loss": 4.255, "step": 5300},
    {"epoch": 0.13, "learning_rate": 0.0005832461889857147, "loss": 4.2493, "step": 5400},
    {"epoch": 0.13, "learning_rate": 0.0005824835803274706, "loss": 4.2397, "step": 5500},
    {"epoch": 0.13, "learning_rate": 0.0005817045206652282, "loss": 4.2307, "step": 5600},
    {"epoch": 0.14, "learning_rate": 0.0005809090553691065, "loss": 4.2223, "step": 5700},
    {"epoch": 0.14, "learning_rate": 0.0005800972307646396, "loss": 4.2181, "step": 5800},
    {"epoch": 0.14, "learning_rate": 0.0005792690941300793, "loss": 4.206, "step": 5900},
    {"epoch": 0.14, "learning_rate": 0.0005784246936936413, "loss": 4.1952, "step": 6000},
    {"epoch": 0.14, "learning_rate": 0.000577564078630697, "loss": 4.1927, "step": 6100},
    {"epoch": 0.15, "learning_rate": 0.0005766872990609095, "loss": 4.178, "step": 6200},
    {"epoch": 0.15, "learning_rate": 0.0005757944060453144, "loss": 4.1725, "step": 6300},
    {"epoch": 0.15, "learning_rate": 0.0005748854515833468, "loss": 4.1704, "step": 6400},
    {"epoch": 0.15, "learning_rate": 0.0005739604886098125, "loss": 4.1589, "step": 6500},
    {"epoch": 0.16, "learning_rate": 0.0005730195709918055, "loss": 4.1535, "step": 6600},
    {"epoch": 0.16, "learning_rate": 0.0005720627535255711, "loss": 4.1452, "step": 6700},
    {"epoch": 0.16, "learning_rate": 0.000571090091933314, "loss": 4.1424, "step": 6800},
    {"epoch": 0.16, "learning_rate": 0.0005701016428599541, "loss": 4.1345, "step": 6900},
    {"epoch": 0.17, "learning_rate": 0.0005690974638698271, "loss": 4.1261, "step": 7000},
    {"epoch": 0.17, "learning_rate": 0.0005680776134433322, "loss": 4.1234, "step": 7100},
    {"epoch": 0.17, "learning_rate": 0.0005670421509735268, "loss": 4.1154, "step": 7200},
    {"epoch": 0.17, "learning_rate": 0.000565991136762667, "loss": 4.1083, "step": 7300},
    {"epoch": 0.18, "learning_rate": 0.0005649246320186961, "loss": 4.1002, "step": 7400},
    {"epoch": 0.18, "learning_rate": 0.0005638426988516804, "loss": 4.0975, "step": 7500},
    {"epoch": 0.18, "learning_rate": 0.0005627454002701908, "loss": 4.0906, "step": 7600},
    {"epoch": 0.18, "learning_rate": 0.0005616328001776353, "loss": 4.0872, "step": 7700},
    {"epoch": 0.18, "learning_rate": 0.0005605049633685356, "loss": 4.0814, "step": 7800},
    {"epoch": 0.19, "learning_rate": 0.0005593619555247551, "loss": 4.0714, "step": 7900},
    {"epoch": 0.19, "learning_rate": 0.0005582038432116726, "loss": 4.0643, "step": 8000},
    {"epoch": 0.19, "learning_rate": 0.0005570306938743069, "loss": 4.0624, "step": 8100},
    {"epoch": 0.19, "learning_rate": 0.0005558425758333878, "loss": 4.054, "step": 8200},
    {"epoch": 0.2, "learning_rate": 0.0005546395582813782, "loss": 4.052, "step": 8300},
    {"epoch": 0.2, "learning_rate": 0.0005534217112784443, "loss": 4.046, "step": 8400},
    {"epoch": 0.2, "learning_rate": 0.0005521891057483752, "loss": 4.0427, "step": 8500},
    {"epoch": 0.2, "learning_rate": 0.000550941813474453, "loss": 4.0371, "step": 8600},
    {"epoch": 0.21, "learning_rate": 0.000549679907095272, "loss": 4.0304, "step": 8700},
    {"epoch": 0.21, "learning_rate": 0.0005484034601005085, "loss": 4.0262, "step": 8800},
    {"epoch": 0.21, "learning_rate": 0.0005471125468266411, "loss": 4.023, "step": 8900},
    {"epoch": 0.21, "learning_rate": 0.0005458072424526214, "loss": 4.0215, "step": 9000},
    {"epoch": 0.22, "learning_rate": 0.000544487622995496, "loss": 4.015, "step": 9100},
    {"epoch": 0.22, "learning_rate": 0.0005431537653059793, "loss": 4.0085, "step": 9200},
    {"epoch": 0.22, "learning_rate": 0.000541805747063978, "loss": 4.0006, "step": 9300},
    {"epoch": 0.22, "learning_rate": 0.0005404436467740676, "loss": 3.9976, "step": 9400},
    {"epoch": 0.23, "learning_rate": 0.0005390675437609197, "loss": 3.9953, "step": 9500},
    {"epoch": 0.23, "learning_rate": 0.0005376775181646833, "loss": 3.9894, "step": 9600},
    {"epoch": 0.23, "learning_rate": 0.0005362736509363169, "loss": 3.9862, "step": 9700},
    {"epoch": 0.23, "learning_rate": 0.0005348560238328749, "loss": 3.9821, "step": 9800},
    {"epoch": 0.23, "learning_rate": 0.0005334247194127456, "loss": 3.9795, "step": 9900},
    {"epoch": 0.24, "learning_rate": 0.0005319798210308438, "loss": 3.9709, "step": 10000},
    {"epoch": 0.24, "eval_loss": 3.989983320236206, "eval_runtime": 6257.6022, "eval_samples_per_second": 88.062, "eval_steps_per_second": 22.016, "step": 10000},
    {"epoch": 0.24, "learning_rate": 0.000530521412833756, "loss": 3.971, "step": 10100},
    {"epoch": 0.24, "learning_rate": 0.0005290495797548403, "loss": 3.9659, "step": 10200},
    {"epoch": 0.24, "learning_rate": 0.00052756440750928, "loss": 3.9599, "step": 10300},
    {"epoch": 0.25, "learning_rate": 0.0005260659825890919, "loss": 3.958, "step": 10400},
    {"epoch": 0.25, "learning_rate": 0.0005245543922580891, "loss": 3.9549, "step": 10500},
    {"epoch": 0.25, "learning_rate": 0.0005230297245467988, "loss": 3.9524, "step": 10600},
    {"epoch": 0.25, "learning_rate": 0.0005214920682473364, "loss": 3.9487, "step": 10700},
    {"epoch": 0.26, "learning_rate": 0.000519941512908234, "loss": 3.9405, "step": 10800},
    {"epoch": 0.26, "learning_rate": 0.0005183781488292252, "loss": 3.9388, "step": 10900},
    {"epoch": 0.26, "learning_rate": 0.0005168020670559866, "loss": 3.9395, "step": 11000},
    {"epoch": 0.26, "learning_rate": 0.0005152133593748358, "loss": 3.9324, "step": 11100},
    {"epoch": 0.27, "learning_rate": 0.0005136121183073853, "loss": 3.9289, "step": 11200},
    {"epoch": 0.27, "learning_rate": 0.0005119984371051549, "loss": 3.9234, "step": 11300},
    {"epoch": 0.27, "learning_rate": 0.0005103724097441411, "loss": 3.9227, "step": 11400},
    {"epoch": 0.27, "learning_rate": 0.0005087341309193438, "loss": 3.9204, "step": 11500},
    {"epoch": 0.28, "learning_rate": 0.0005070836960392517, "loss": 3.918, "step": 11600},
    {"epoch": 0.28, "learning_rate": 0.0005054212012202861, "loss": 3.9053, "step": 11700},
    {"epoch": 0.28, "learning_rate": 0.0005037467432812033, "loss": 3.9075, "step": 11800},
    {"epoch": 0.28, "learning_rate": 0.0005020604197374561, "loss": 3.9064, "step": 11900},
    {"epoch": 0.28, "learning_rate": 0.0005003623287955149, "loss": 3.9026, "step": 12000},
    {"epoch": 0.29, "learning_rate": 0.0004986697243743568, "loss": 3.8982, "step": 12100},
    {"epoch": 0.29, "learning_rate": 0.0004969485111851287, "loss": 3.8938, "step": 12200},
    {"epoch": 0.29, "learning_rate": 0.0004952158283000648, "loss": 3.8916, "step": 12300},
    {"epoch": 0.29, "learning_rate": 0.0004934717766254659, "loss": 3.8897, "step": 12400},
    {"epoch": 0.3, "learning_rate": 0.0004917164577297167, "loss": 3.8904, "step": 12500},
    {"epoch": 0.3, "learning_rate": 0.000489949973837372, "loss": 3.8837, "step": 12600},
    {"epoch": 0.3, "learning_rate": 0.0004881724278232027, "loss": 3.8825, "step": 12700},
    {"epoch": 0.3, "learning_rate": 0.0004863839232062045, "loss": 3.877, "step": 12800},
    {"epoch": 0.31, "learning_rate": 0.0004845845641435698, "loss": 3.8772, "step": 12900},
    {"epoch": 0.31, "learning_rate": 0.0004827744554246214, "loss": 3.8727, "step": 13000},
    {"epoch": 0.31, "learning_rate": 0.0004809537024647106, "loss": 3.8677, "step": 13100},
    {"epoch": 0.31, "learning_rate": 0.00047912241129907716, "loss": 3.8691, "step": 13200},
    {"epoch": 0.32, "learning_rate": 0.00047728068857667475, "loss": 3.8654, "step": 13300},
    {"epoch": 0.32, "learning_rate": 0.00047542864155396025, "loss": 3.8623, "step": 13400},
    {"epoch": 0.32, "learning_rate": 0.00047356637808864646, "loss": 3.8523, "step": 13500},
    {"epoch": 0.32, "learning_rate": 0.000471694006633422, "loss": 3.8573, "step": 13600},
    {"epoch": 0.32, "learning_rate": 0.00046981163622963445, "loss": 3.8565, "step": 13700},
    {"epoch": 0.33, "learning_rate": 0.0004679193765009406, "loss": 3.8482, "step": 13800},
    {"epoch": 0.33, "learning_rate": 0.00046601733764692197, "loss": 3.8434, "step": 13900},
    {"epoch": 0.33, "learning_rate": 0.0004641056304366674, "loss": 3.8503, "step": 14000},
    {"epoch": 0.33, "learning_rate": 0.000462184366202322, "loss": 3.8419, "step": 14100},
    {"epoch": 0.34, "learning_rate": 0.00046027301031098105, "loss": 3.8443, "step": 14200},
    {"epoch": 0.34, "learning_rate": 0.00045833306101326796, "loss": 3.8355, "step": 14300},
    {"epoch": 0.34, "learning_rate": 0.0004563838908687476,
888
+ "loss": 3.8367,
889
+ "step": 14400
890
+ },
891
+ {
892
+ "epoch": 0.34,
893
+ "learning_rate": 0.000454425613391295,
894
+ "loss": 3.8354,
895
+ "step": 14500
896
+ },
897
+ {
898
+ "epoch": 0.35,
899
+ "learning_rate": 0.0004524583426251691,
900
+ "loss": 3.8335,
901
+ "step": 14600
902
+ },
903
+ {
904
+ "epoch": 0.35,
905
+ "learning_rate": 0.0004504821931383715,
906
+ "loss": 3.8349,
907
+ "step": 14700
908
+ },
909
+ {
910
+ "epoch": 0.35,
911
+ "learning_rate": 0.00044849728001597385,
912
+ "loss": 3.8244,
913
+ "step": 14800
914
+ },
915
+ {
916
+ "epoch": 0.35,
917
+ "learning_rate": 0.0004465236968920431,
918
+ "loss": 3.821,
919
+ "step": 14900
920
+ },
921
+ {
922
+ "epoch": 0.36,
923
+ "learning_rate": 0.00044452168853148435,
924
+ "loss": 3.8229,
925
+ "step": 15000
926
+ },
927
+ {
928
+ "epoch": 0.36,
929
+ "eval_loss": 3.838818311691284,
930
+ "eval_runtime": 6259.3563,
931
+ "eval_samples_per_second": 88.037,
932
+ "eval_steps_per_second": 22.009,
933
+ "step": 15000
934
+ },
935
+ {
936
+ "epoch": 0.36,
937
+ "learning_rate": 0.0004425112636573954,
938
+ "loss": 3.817,
939
+ "step": 15100
940
+ },
941
+ {
942
+ "epoch": 0.36,
943
+ "learning_rate": 0.00044049253935094467,
944
+ "loss": 3.8165,
945
+ "step": 15200
946
+ },
947
+ {
948
+ "epoch": 0.36,
949
+ "learning_rate": 0.0004384656331766349,
950
+ "loss": 3.8144,
951
+ "step": 15300
952
+ },
953
+ {
954
+ "epoch": 0.37,
955
+ "learning_rate": 0.00043643066317545647,
956
+ "loss": 3.8139,
957
+ "step": 15400
958
+ },
959
+ {
960
+ "epoch": 0.37,
961
+ "learning_rate": 0.000434387747858013,
962
+ "loss": 3.8071,
963
+ "step": 15500
964
+ },
965
+ {
966
+ "epoch": 0.37,
967
+ "learning_rate": 0.0004323370061976197,
968
+ "loss": 3.8034,
969
+ "step": 15600
970
+ },
971
+ {
972
+ "epoch": 0.37,
973
+ "learning_rate": 0.0004302785576233748,
974
+ "loss": 3.8071,
975
+ "step": 15700
976
+ },
977
+ {
978
+ "epoch": 0.37,
979
+ "learning_rate": 0.0004282125220132043,
980
+ "loss": 3.8009,
981
+ "step": 15800
982
+ },
983
+ {
984
+ "epoch": 0.38,
985
+ "learning_rate": 0.0004261390196868805,
986
+ "loss": 3.7961,
987
+ "step": 15900
988
+ },
989
+ {
990
+ "epoch": 0.38,
991
+ "learning_rate": 0.00042405817139901526,
992
+ "loss": 3.7929,
993
+ "step": 16000
994
+ },
995
+ {
996
+ "epoch": 0.38,
997
+ "learning_rate": 0.00042197009833202696,
998
+ "loss": 3.8016,
999
+ "step": 16100
1000
+ },
1001
+ {
1002
+ "epoch": 0.38,
1003
+ "learning_rate": 0.00041987492208908427,
1004
+ "loss": 3.7909,
1005
+ "step": 16200
1006
+ },
1007
+ {
1008
+ "epoch": 0.39,
1009
+ "learning_rate": 0.0004177727646870232,
1010
+ "loss": 3.7895,
1011
+ "step": 16300
1012
+ },
1013
+ {
1014
+ "epoch": 0.39,
1015
+ "learning_rate": 0.00041566374854924194,
1016
+ "loss": 3.7867,
1017
+ "step": 16400
1018
+ },
1019
+ {
1020
+ "epoch": 0.39,
1021
+ "learning_rate": 0.00041354799649857116,
1022
+ "loss": 3.7862,
1023
+ "step": 16500
1024
+ },
1025
+ {
1026
+ "epoch": 0.39,
1027
+ "learning_rate": 0.00041142563175012073,
1028
+ "loss": 3.7839,
1029
+ "step": 16600
1030
+ },
1031
+ {
1032
+ "epoch": 0.4,
1033
+ "learning_rate": 0.0004092967779041047,
1034
+ "loss": 3.7807,
1035
+ "step": 16700
1036
+ },
1037
+ {
1038
+ "epoch": 0.4,
1039
+ "learning_rate": 0.0004071615589386428,
1040
+ "loss": 3.7772,
1041
+ "step": 16800
1042
+ },
1043
+ {
1044
+ "epoch": 0.4,
1045
+ "learning_rate": 0.00040502009920254025,
1046
+ "loss": 3.7765,
1047
+ "step": 16900
1048
+ },
1049
+ {
1050
+ "epoch": 0.4,
1051
+ "learning_rate": 0.00040287252340804637,
1052
+ "loss": 3.7742,
1053
+ "step": 17000
1054
+ },
1055
+ {
1056
+ "epoch": 0.41,
1057
+ "learning_rate": 0.0004007189566235915,
1058
+ "loss": 3.7766,
1059
+ "step": 17100
1060
+ },
1061
+ {
1062
+ "epoch": 0.41,
1063
+ "learning_rate": 0.0003985595242665033,
1064
+ "loss": 3.7685,
1065
+ "step": 17200
1066
+ },
1067
+ {
1068
+ "epoch": 0.41,
1069
+ "learning_rate": 0.00039639435209570307,
1070
+ "loss": 3.7715,
1071
+ "step": 17300
1072
+ },
1073
+ {
1074
+ "epoch": 0.41,
1075
+ "learning_rate": 0.0003942235662043819,
1076
+ "loss": 3.7718,
1077
+ "step": 17400
1078
+ },
1079
+ {
1080
+ "epoch": 0.42,
1081
+ "learning_rate": 0.000392047293012657,
1082
+ "loss": 3.7688,
1083
+ "step": 17500
1084
+ },
1085
+ {
1086
+ "epoch": 0.42,
1087
+ "learning_rate": 0.00038986565926021,
1088
+ "loss": 3.7631,
1089
+ "step": 17600
1090
+ },
1091
+ {
1092
+ "epoch": 0.42,
1093
+ "learning_rate": 0.0003876787919989051,
1094
+ "loss": 3.7589,
1095
+ "step": 17700
1096
+ },
1097
+ {
1098
+ "epoch": 0.42,
1099
+ "learning_rate": 0.0003854868185853913,
1100
+ "loss": 3.7614,
1101
+ "step": 17800
1102
+ },
1103
+ {
1104
+ "epoch": 0.42,
1105
+ "learning_rate": 0.0003832898666736839,
1106
+ "loss": 3.7549,
1107
+ "step": 17900
1108
+ },
1109
+ {
1110
+ "epoch": 0.43,
1111
+ "learning_rate": 0.0003810880642077316,
1112
+ "loss": 3.7571,
1113
+ "step": 18000
1114
+ },
1115
+ {
1116
+ "epoch": 0.43,
1117
+ "learning_rate": 0.00037888153941396496,
1118
+ "loss": 3.7534,
1119
+ "step": 18100
1120
+ },
1121
+ {
1122
+ "epoch": 0.43,
1123
+ "learning_rate": 0.0003766704207938287,
1124
+ "loss": 3.7517,
1125
+ "step": 18200
1126
+ },
1127
+ {
1128
+ "epoch": 0.43,
1129
+ "learning_rate": 0.0003744548371162984,
1130
+ "loss": 3.7567,
1131
+ "step": 18300
1132
+ },
1133
+ {
1134
+ "epoch": 0.44,
1135
+ "learning_rate": 0.0003722349174103814,
1136
+ "loss": 3.7486,
1137
+ "step": 18400
1138
+ },
1139
+ {
1140
+ "epoch": 0.44,
1141
+ "learning_rate": 0.00037001079095760225,
1142
+ "loss": 3.7516,
1143
+ "step": 18500
1144
+ },
1145
+ {
1146
+ "epoch": 0.44,
1147
+ "learning_rate": 0.0003677825872844742,
1148
+ "loss": 3.7437,
1149
+ "step": 18600
1150
+ },
1151
+ {
1152
+ "epoch": 0.44,
1153
+ "learning_rate": 0.0003655504361549554,
1154
+ "loss": 3.7457,
1155
+ "step": 18700
1156
+ },
1157
+ {
1158
+ "epoch": 0.45,
1159
+ "learning_rate": 0.00036331446756289226,
1160
+ "loss": 3.7464,
1161
+ "step": 18800
1162
+ },
1163
+ {
1164
+ "epoch": 0.45,
1165
+ "learning_rate": 0.00036109722610660756,
1166
+ "loss": 3.741,
1167
+ "step": 18900
1168
+ },
1169
+ {
1170
+ "epoch": 0.45,
1171
+ "learning_rate": 0.0003588540483745179,
1172
+ "loss": 3.7379,
1173
+ "step": 19000
1174
+ },
1175
+ {
1176
+ "epoch": 0.45,
1177
+ "learning_rate": 0.0003566074431576024,
1178
+ "loss": 3.738,
1179
+ "step": 19100
1180
+ },
1181
+ {
1182
+ "epoch": 0.46,
1183
+ "learning_rate": 0.00035435754129147054,
1184
+ "loss": 3.7309,
1185
+ "step": 19200
1186
+ },
1187
+ {
1188
+ "epoch": 0.46,
1189
+ "learning_rate": 0.00035210447380371886,
1190
+ "loss": 3.7355,
1191
+ "step": 19300
1192
+ },
1193
+ {
1194
+ "epoch": 0.46,
1195
+ "learning_rate": 0.0003498483719063004,
1196
+ "loss": 3.7344,
1197
+ "step": 19400
1198
+ },
1199
+ {
1200
+ "epoch": 0.46,
1201
+ "learning_rate": 0.000347589366987883,
1202
+ "loss": 3.735,
1203
+ "step": 19500
1204
+ },
1205
+ {
1206
+ "epoch": 0.46,
1207
+ "learning_rate": 0.000345327590606198,
1208
+ "loss": 3.7291,
1209
+ "step": 19600
1210
+ },
1211
+ {
1212
+ "epoch": 0.47,
1213
+ "learning_rate": 0.00034306317448037834,
1214
+ "loss": 3.7295,
1215
+ "step": 19700
1216
+ },
1217
+ {
1218
+ "epoch": 0.47,
1219
+ "learning_rate": 0.00034079625048328796,
1220
+ "loss": 3.7221,
1221
+ "step": 19800
1222
+ },
1223
+ {
1224
+ "epoch": 0.47,
1225
+ "learning_rate": 0.00033852695063384174,
1226
+ "loss": 3.7301,
1227
+ "step": 19900
1228
+ },
1229
+ {
1230
+ "epoch": 0.47,
1231
+ "learning_rate": 0.00033625540708931705,
1232
+ "loss": 3.7197,
1233
+ "step": 20000
1234
+ },
1235
+ {
1236
+ "epoch": 0.47,
1237
+ "eval_loss": 3.7453513145446777,
1238
+ "eval_runtime": 6261.7484,
1239
+ "eval_samples_per_second": 88.004,
1240
+ "eval_steps_per_second": 22.001,
1241
+ "step": 20000
1242
+ },
1243
+ {
1244
+ "epoch": 0.48,
1245
+ "learning_rate": 0.0003339817521376575,
1246
+ "loss": 3.7178,
1247
+ "step": 20100
1248
+ },
1249
+ {
1250
+ "epoch": 0.48,
1251
+ "learning_rate": 0.00033170611818976876,
1252
+ "loss": 3.7157,
1253
+ "step": 20200
1254
+ },
1255
+ {
1256
+ "epoch": 0.48,
1257
+ "learning_rate": 0.0003294286377718072,
1258
+ "loss": 3.7184,
1259
+ "step": 20300
1260
+ },
1261
+ {
1262
+ "epoch": 0.48,
1263
+ "learning_rate": 0.00032714944351746255,
1264
+ "loss": 3.7167,
1265
+ "step": 20400
1266
+ },
1267
+ {
1268
+ "epoch": 0.49,
1269
+ "learning_rate": 0.0003248914833042039,
1270
+ "loss": 3.7177,
1271
+ "step": 20500
1272
+ },
1273
+ {
1274
+ "epoch": 0.49,
1275
+ "learning_rate": 0.00032260927349466893,
1276
+ "loss": 3.712,
1277
+ "step": 20600
1278
+ },
1279
+ {
1280
+ "epoch": 0.49,
1281
+ "learning_rate": 0.0003203257469882546,
1282
+ "loss": 3.7095,
1283
+ "step": 20700
1284
+ },
1285
+ {
1286
+ "epoch": 0.49,
1287
+ "learning_rate": 0.0003180410367707568,
1288
+ "loss": 3.7036,
1289
+ "step": 20800
1290
+ },
1291
+ {
1292
+ "epoch": 0.5,
1293
+ "learning_rate": 0.0003157552758969068,
1294
+ "loss": 3.7059,
1295
+ "step": 20900
1296
+ },
1297
+ {
1298
+ "epoch": 0.5,
1299
+ "learning_rate": 0.0003134685974826232,
1300
+ "loss": 3.7097,
1301
+ "step": 21000
1302
+ },
1303
+ {
1304
+ "epoch": 0.5,
1305
+ "learning_rate": 0.00031118113469725937,
1306
+ "loss": 3.7021,
1307
+ "step": 21100
1308
+ },
1309
+ {
1310
+ "epoch": 0.5,
1311
+ "learning_rate": 0.00030889302075584824,
1312
+ "loss": 3.7026,
1313
+ "step": 21200
1314
+ },
1315
+ {
1316
+ "epoch": 0.51,
1317
+ "learning_rate": 0.0003066043889113439,
1318
+ "loss": 3.7003,
1319
+ "step": 21300
1320
+ },
1321
+ {
1322
+ "epoch": 0.51,
1323
+ "learning_rate": 0.00030431537244686186,
1324
+ "loss": 3.7008,
1325
+ "step": 21400
1326
+ },
1327
+ {
1328
+ "epoch": 0.51,
1329
+ "learning_rate": 0.00030202610466791653,
1330
+ "loss": 3.6968,
1331
+ "step": 21500
1332
+ },
1333
+ {
1334
+ "epoch": 0.51,
1335
+ "learning_rate": 0.00029973671889465826,
1336
+ "loss": 3.6949,
1337
+ "step": 21600
1338
+ },
1339
+ {
1340
+ "epoch": 0.51,
1341
+ "learning_rate": 0.00029744734845410883,
1342
+ "loss": 3.6992,
1343
+ "step": 21700
1344
+ },
1345
+ {
1346
+ "epoch": 0.52,
1347
+ "learning_rate": 0.00029515812667239735,
1348
+ "loss": 3.6916,
1349
+ "step": 21800
1350
+ },
1351
+ {
1352
+ "epoch": 0.52,
1353
+ "learning_rate": 0.00029286918686699537,
1354
+ "loss": 3.6919,
1355
+ "step": 21900
1356
+ },
1357
+ {
1358
+ "epoch": 0.52,
1359
+ "learning_rate": 0.0002905806623389529,
1360
+ "loss": 3.6909,
1361
+ "step": 22000
1362
+ },
1363
+ {
1364
+ "epoch": 0.52,
1365
+ "learning_rate": 0.00028829268636513573,
1366
+ "loss": 3.6979,
1367
+ "step": 22100
1368
+ },
1369
+ {
1370
+ "epoch": 0.53,
1371
+ "learning_rate": 0.00028600539219046303,
1372
+ "loss": 3.689,
1373
+ "step": 22200
1374
+ },
1375
+ {
1376
+ "epoch": 0.53,
1377
+ "learning_rate": 0.0002837189130201484,
1378
+ "loss": 3.684,
1379
+ "step": 22300
1380
+ },
1381
+ {
1382
+ "epoch": 0.53,
1383
+ "learning_rate": 0.0002814333820119417,
1384
+ "loss": 3.6825,
1385
+ "step": 22400
1386
+ },
1387
+ {
1388
+ "epoch": 0.53,
1389
+ "learning_rate": 0.00027914893226837486,
1390
+ "loss": 3.6896,
1391
+ "step": 22500
1392
+ },
1393
+ {
1394
+ "epoch": 0.54,
1395
+ "learning_rate": 0.00027686569682901013,
1396
+ "loss": 3.6824,
1397
+ "step": 22600
1398
+ },
1399
+ {
1400
+ "epoch": 0.54,
1401
+ "learning_rate": 0.0002746066204389395,
1402
+ "loss": 3.6777,
1403
+ "step": 22700
1404
+ },
1405
+ {
1406
+ "epoch": 0.54,
1407
+ "learning_rate": 0.00027232619697688704,
1408
+ "loss": 3.6824,
1409
+ "step": 22800
1410
+ },
1411
+ {
1412
+ "epoch": 0.54,
1413
+ "learning_rate": 0.0002700473851548586,
1414
+ "loss": 3.6806,
1415
+ "step": 22900
1416
+ },
1417
+ {
1418
+ "epoch": 0.55,
1419
+ "learning_rate": 0.0002677703176840807,
1420
+ "loss": 3.6795,
1421
+ "step": 23000
1422
+ },
1423
+ {
1424
+ "epoch": 0.55,
1425
+ "learning_rate": 0.0002654951271741938,
1426
+ "loss": 3.6753,
1427
+ "step": 23100
1428
+ },
1429
+ {
1430
+ "epoch": 0.55,
1431
+ "learning_rate": 0.0002632219461255299,
1432
+ "loss": 3.6703,
1433
+ "step": 23200
1434
+ },
1435
+ {
1436
+ "epoch": 0.55,
1437
+ "learning_rate": 0.00026095090692139603,
1438
+ "loss": 3.6678,
1439
+ "step": 23300
1440
+ },
1441
+ {
1442
+ "epoch": 0.55,
1443
+ "learning_rate": 0.0002586821418203645,
1444
+ "loss": 3.6701,
1445
+ "step": 23400
1446
+ },
1447
+ {
1448
+ "epoch": 0.56,
1449
+ "learning_rate": 0.00025641578294857047,
1450
+ "loss": 3.6712,
1451
+ "step": 23500
1452
+ },
1453
+ {
1454
+ "epoch": 0.56,
1455
+ "learning_rate": 0.0002541519622920176,
1456
+ "loss": 3.6709,
1457
+ "step": 23600
1458
+ },
1459
+ {
1460
+ "epoch": 0.56,
1461
+ "learning_rate": 0.0002518908116888915,
1462
+ "loss": 3.6688,
1463
+ "step": 23700
1464
+ },
1465
+ {
1466
+ "epoch": 0.56,
1467
+ "learning_rate": 0.00024963246282188163,
1468
+ "loss": 3.6668,
1469
+ "step": 23800
1470
+ },
1471
+ {
1472
+ "epoch": 0.57,
1473
+ "learning_rate": 0.0002473770472105129,
1474
+ "loss": 3.6671,
1475
+ "step": 23900
1476
+ },
1477
+ {
1478
+ "epoch": 0.57,
1479
+ "learning_rate": 0.00024512469620348586,
1480
+ "loss": 3.6619,
1481
+ "step": 24000
1482
+ },
1483
+ {
1484
+ "epoch": 0.57,
1485
+ "learning_rate": 0.00024287554097102775,
1486
+ "loss": 3.66,
1487
+ "step": 24100
1488
+ },
1489
+ {
1490
+ "epoch": 0.57,
1491
+ "learning_rate": 0.00024062971249725343,
1492
+ "loss": 3.663,
1493
+ "step": 24200
1494
+ },
1495
+ {
1496
+ "epoch": 0.58,
1497
+ "learning_rate": 0.00023838734157253735,
1498
+ "loss": 3.6586,
1499
+ "step": 24300
1500
+ },
1501
+ {
1502
+ "epoch": 0.58,
1503
+ "learning_rate": 0.00023614855878589612,
1504
+ "loss": 3.6627,
1505
+ "step": 24400
1506
+ },
1507
+ {
1508
+ "epoch": 0.58,
1509
+ "learning_rate": 0.00023391349451738433,
1510
+ "loss": 3.6548,
1511
+ "step": 24500
1512
+ },
1513
+ {
1514
+ "epoch": 0.58,
1515
+ "learning_rate": 0.00023168227893050097,
1516
+ "loss": 3.6541,
1517
+ "step": 24600
1518
+ },
1519
+ {
1520
+ "epoch": 0.59,
1521
+ "learning_rate": 0.00022945504196460908,
1522
+ "loss": 3.6516,
1523
+ "step": 24700
1524
+ },
1525
+ {
1526
+ "epoch": 0.59,
1527
+ "learning_rate": 0.00022723191332736894,
1528
+ "loss": 3.6545,
1529
+ "step": 24800
1530
+ },
1531
+ {
1532
+ "epoch": 0.59,
1533
+ "learning_rate": 0.00022501302248718378,
1534
+ "loss": 3.6536,
1535
+ "step": 24900
1536
+ },
1537
+ {
1538
+ "epoch": 0.59,
1539
+ "learning_rate": 0.0002227984986656603,
1540
+ "loss": 3.652,
1541
+ "step": 25000
1542
+ },
1543
+ {
1544
+ "epoch": 0.59,
1545
+ "eval_loss": 3.6738803386688232,
1546
+ "eval_runtime": 6261.7124,
1547
+ "eval_samples_per_second": 88.004,
1548
+ "eval_steps_per_second": 22.001,
1549
+ "step": 25000
1550
+ },
1551
+ {
1552
+ "epoch": 0.6,
1553
+ "learning_rate": 0.00022061054843048285,
1554
+ "loss": 3.6444,
1555
+ "step": 25100
1556
+ },
1557
+ {
1558
+ "epoch": 0.6,
1559
+ "learning_rate": 0.000218405098403175,
1560
+ "loss": 3.6463,
1561
+ "step": 25200
1562
+ },
1563
+ {
1564
+ "epoch": 0.6,
1565
+ "learning_rate": 0.00021620440022038445,
1566
+ "loss": 3.6485,
1567
+ "step": 25300
1568
+ },
1569
+ {
1570
+ "epoch": 0.6,
1571
+ "learning_rate": 0.00021400858204423146,
1572
+ "loss": 3.6457,
1573
+ "step": 25400
1574
+ },
1575
+ {
1576
+ "epoch": 0.6,
1577
+ "learning_rate": 0.00021181777175263927,
1578
+ "loss": 3.6429,
1579
+ "step": 25500
1580
+ },
1581
+ {
1582
+ "epoch": 0.61,
1583
+ "learning_rate": 0.00020963209693188685,
1584
+ "loss": 3.6426,
1585
+ "step": 25600
1586
+ },
1587
+ {
1588
+ "epoch": 0.61,
1589
+ "learning_rate": 0.00020745168486917856,
1590
+ "loss": 3.6436,
1591
+ "step": 25700
1592
+ },
1593
+ {
1594
+ "epoch": 0.61,
1595
+ "learning_rate": 0.00020527666254523122,
1596
+ "loss": 3.638,
1597
+ "step": 25800
1598
+ },
1599
+ {
1600
+ "epoch": 0.61,
1601
+ "learning_rate": 0.0002031071566268795,
1602
+ "loss": 3.6347,
1603
+ "step": 25900
1604
+ },
1605
+ {
1606
+ "epoch": 0.62,
1607
+ "learning_rate": 0.00020094329345969906,
1608
+ "loss": 3.6352,
1609
+ "step": 26000
1610
+ },
1611
+ {
1612
+ "epoch": 0.62,
1613
+ "learning_rate": 0.00019878519906064822,
1614
+ "loss": 3.6357,
1615
+ "step": 26100
1616
+ },
1617
+ {
1618
+ "epoch": 0.62,
1619
+ "learning_rate": 0.00019663299911072975,
1620
+ "loss": 3.6363,
1621
+ "step": 26200
1622
+ },
1623
+ {
1624
+ "epoch": 0.62,
1625
+ "learning_rate": 0.00019448681894767086,
1626
+ "loss": 3.6347,
1627
+ "step": 26300
1628
+ },
1629
+ {
1630
+ "epoch": 0.63,
1631
+ "learning_rate": 0.00019234678355862448,
1632
+ "loss": 3.6289,
1633
+ "step": 26400
1634
+ },
1635
+ {
1636
+ "epoch": 0.63,
1637
+ "learning_rate": 0.0001902130175728901,
1638
+ "loss": 3.6329,
1639
+ "step": 26500
1640
+ },
1641
+ {
1642
+ "epoch": 0.63,
1643
+ "learning_rate": 0.0001880856452546559,
1644
+ "loss": 3.6347,
1645
+ "step": 26600
1646
+ },
1647
+ {
1648
+ "epoch": 0.63,
1649
+ "learning_rate": 0.00018596479049576175,
1650
+ "loss": 3.6317,
1651
+ "step": 26700
1652
+ },
1653
+ {
1654
+ "epoch": 0.64,
1655
+ "learning_rate": 0.0001838505768084843,
1656
+ "loss": 3.6218,
1657
+ "step": 26800
1658
+ },
1659
+ {
1660
+ "epoch": 0.64,
1661
+ "learning_rate": 0.00018174312731834396,
1662
+ "loss": 3.6279,
1663
+ "step": 26900
1664
+ },
1665
+ {
1666
+ "epoch": 0.64,
1667
+ "learning_rate": 0.0001796425647569343,
1668
+ "loss": 3.6248,
1669
+ "step": 27000
1670
+ },
1671
+ {
1672
+ "epoch": 0.64,
1673
+ "learning_rate": 0.00017754901145477467,
1674
+ "loss": 3.6295,
1675
+ "step": 27100
1676
+ },
1677
+ {
1678
+ "epoch": 0.65,
1679
+ "learning_rate": 0.00017548341785672704,
1680
+ "loss": 3.6232,
1681
+ "step": 27200
1682
+ },
1683
+ {
1684
+ "epoch": 0.65,
1685
+ "learning_rate": 0.00017340417529776694,
1686
+ "loss": 3.6214,
1687
+ "step": 27300
1688
+ },
1689
+ {
1690
+ "epoch": 0.65,
1691
+ "learning_rate": 0.00017133230530331462,
1692
+ "loss": 3.6229,
1693
+ "step": 27400
1694
+ },
1695
+ {
1696
+ "epoch": 0.65,
1697
+ "learning_rate": 0.00016926792853291946,
1698
+ "loss": 3.6203,
1699
+ "step": 27500
1700
+ },
1701
+ {
1702
+ "epoch": 0.65,
1703
+ "learning_rate": 0.00016721116520974823,
1704
+ "loss": 3.617,
1705
+ "step": 27600
1706
+ },
1707
+ {
1708
+ "epoch": 0.66,
1709
+ "learning_rate": 0.0001651621351135826,
1710
+ "loss": 3.6154,
1711
+ "step": 27700
1712
+ },
1713
+ {
1714
+ "epoch": 0.66,
1715
+ "learning_rate": 0.00016312095757384451,
1716
+ "loss": 3.6209,
1717
+ "step": 27800
1718
+ },
1719
+ {
1720
+ "epoch": 0.66,
1721
+ "learning_rate": 0.00016108775146264626,
1722
+ "loss": 3.6179,
1723
+ "step": 27900
1724
+ },
1725
+ {
1726
+ "epoch": 0.66,
1727
+ "learning_rate": 0.00015906263518786752,
1728
+ "loss": 3.6132,
1729
+ "step": 28000
1730
+ },
1731
+ {
1732
+ "epoch": 0.67,
1733
+ "learning_rate": 0.00015704572668626048,
1734
+ "loss": 3.6137,
1735
+ "step": 28100
1736
+ },
1737
+ {
1738
+ "epoch": 0.67,
1739
+ "learning_rate": 0.00015503714341658065,
1740
+ "loss": 3.6088,
1741
+ "step": 28200
1742
+ },
1743
+ {
1744
+ "epoch": 0.67,
1745
+ "learning_rate": 0.0001530370023527469,
1746
+ "loss": 3.6135,
1747
+ "step": 28300
1748
+ },
1749
+ {
1750
+ "epoch": 0.67,
1751
+ "learning_rate": 0.00015104541997702905,
1752
+ "loss": 3.6092,
1753
+ "step": 28400
1754
+ },
1755
+ {
1756
+ "epoch": 0.68,
1757
+ "learning_rate": 0.0001490625122732643,
1758
+ "loss": 3.6125,
1759
+ "step": 28500
1760
+ },
1761
+ {
1762
+ "epoch": 0.68,
1763
+ "learning_rate": 0.00014708839472010312,
1764
+ "loss": 3.6125,
1765
+ "step": 28600
1766
+ },
1767
+ {
1768
+ "epoch": 0.68,
1769
+ "learning_rate": 0.00014512318228428328,
1770
+ "loss": 3.6076,
1771
+ "step": 28700
1772
+ },
1773
+ {
1774
+ "epoch": 0.68,
1775
+ "learning_rate": 0.00014316698941393538,
1776
+ "loss": 3.606,
1777
+ "step": 28800
1778
+ },
1779
+ {
1780
+ "epoch": 0.69,
1781
+ "learning_rate": 0.00014121993003191695,
1782
+ "loss": 3.6039,
1783
+ "step": 28900
1784
+ },
1785
+ {
1786
+ "epoch": 0.69,
1787
+ "learning_rate": 0.00013928211752917854,
1788
+ "loss": 3.6058,
1789
+ "step": 29000
1790
+ },
1791
+ {
1792
+ "epoch": 0.69,
1793
+ "learning_rate": 0.00013735366475816006,
1794
+ "loss": 3.6023,
1795
+ "step": 29100
1796
+ },
1797
+ {
1798
+ "epoch": 0.69,
1799
+ "learning_rate": 0.00013543468402621808,
1800
+ "loss": 3.5966,
1801
+ "step": 29200
1802
+ },
1803
+ {
1804
+ "epoch": 0.69,
1805
+ "learning_rate": 0.00013352528708908623,
1806
+ "loss": 3.6002,
1807
+ "step": 29300
1808
+ },
1809
+ {
1810
+ "epoch": 0.7,
1811
+ "learning_rate": 0.0001316255851443661,
1812
+ "loss": 3.603,
1813
+ "step": 29400
1814
+ },
1815
+ {
1816
+ "epoch": 0.7,
1817
+ "learning_rate": 0.00012975453888853402,
1818
+ "loss": 3.5971,
1819
+ "step": 29500
1820
+ },
1821
+ {
1822
+ "epoch": 0.7,
1823
+ "learning_rate": 0.00012787445855677994,
1824
+ "loss": 3.5955,
1825
+ "step": 29600
1826
+ },
1827
+ {
1828
+ "epoch": 0.7,
1829
+ "learning_rate": 0.00012600440230489343,
1830
+ "loss": 3.5974,
1831
+ "step": 29700
1832
+ },
1833
+ {
1834
+ "epoch": 0.71,
1835
+ "learning_rate": 0.0001241444790393915,
1836
+ "loss": 3.5965,
1837
+ "step": 29800
1838
+ },
1839
+ {
1840
+ "epoch": 0.71,
1841
+ "learning_rate": 0.00012229479707667653,
1842
+ "loss": 3.6012,
1843
+ "step": 29900
1844
+ },
1845
+ {
1846
+ "epoch": 0.71,
1847
+ "learning_rate": 0.00012045546413672746,
1848
+ "loss": 3.597,
1849
+ "step": 30000
1850
+ },
1851
+ {
1852
+ "epoch": 0.71,
1853
+ "eval_loss": 3.617741823196411,
1854
+ "eval_runtime": 6508.3328,
1855
+ "eval_samples_per_second": 84.669,
1856
+ "eval_steps_per_second": 21.167,
1857
+ "step": 30000
1858
+ },
1859
+ {
1860
+ "epoch": 0.71,
1861
+ "learning_rate": 0.00011862658733682693,
1862
+ "loss": 3.5872,
1863
+ "step": 30100
1864
+ },
1865
+ {
1866
+ "epoch": 0.72,
1867
+ "learning_rate": 0.00011680827318532343,
1868
+ "loss": 3.5905,
1869
+ "step": 30200
1870
+ },
1871
+ {
1872
+ "epoch": 0.72,
1873
+ "learning_rate": 0.00011500062757542787,
1874
+ "loss": 3.5966,
1875
+ "step": 30300
1876
+ },
1877
+ {
1878
+ "epoch": 0.72,
1879
+ "learning_rate": 0.00011320375577904705,
1880
+ "loss": 3.5901,
1881
+ "step": 30400
1882
+ },
1883
+ {
1884
+ "epoch": 0.72,
1885
+ "learning_rate": 0.00011141776244065287,
1886
+ "loss": 3.5916,
1887
+ "step": 30500
1888
+ },
1889
+ {
1890
+ "epoch": 0.73,
1891
+ "learning_rate": 0.00010964275157118847,
1892
+ "loss": 3.5895,
1893
+ "step": 30600
1894
+ },
1895
+ {
1896
+ "epoch": 0.73,
1897
+ "learning_rate": 0.00010787882654201032,
1898
+ "loss": 3.5866,
1899
+ "step": 30700
1900
+ },
1901
+ {
1902
+ "epoch": 0.73,
1903
+ "learning_rate": 0.00010612609007886857,
1904
+ "loss": 3.5895,
1905
+ "step": 30800
1906
+ },
1907
+ {
1908
+ "epoch": 0.73,
1909
+ "learning_rate": 0.00010438464425592469,
1910
+ "loss": 3.5874,
1911
+ "step": 30900
1912
+ },
1913
+ {
1914
+ "epoch": 0.74,
1915
+ "learning_rate": 0.00010265459048980658,
1916
+ "loss": 3.5868,
1917
+ "step": 31000
1918
+ },
1919
+ {
1920
+ "epoch": 0.74,
1921
+ "learning_rate": 0.000100936029533703,
1922
+ "loss": 3.5787,
1923
+ "step": 31100
1924
+ },
1925
+ {
1926
+ "epoch": 0.74,
1927
+ "learning_rate": 9.922906147149525e-05,
1928
+ "loss": 3.5839,
1929
+ "step": 31200
1930
+ },
1931
+ {
1932
+ "epoch": 0.74,
1933
+ "learning_rate": 9.753378571192895e-05,
1934
+ "loss": 3.5852,
1935
+ "step": 31300
1936
+ },
1937
+ {
1938
+ "epoch": 0.74,
1939
+ "learning_rate": 9.585030098282516e-05,
1940
+ "loss": 3.5745,
1941
+ "step": 31400
1942
+ },
1943
+ {
1944
+ "epoch": 0.75,
1945
+ "learning_rate": 9.417870532532991e-05,
1946
+ "loss": 3.5768,
1947
+ "step": 31500
1948
+ },
1949
+ {
1950
+ "epoch": 0.75,
1951
+ "learning_rate": 9.251909608820541e-05,
1952
+ "loss": 3.577,
1953
+ "step": 31600
1954
+ },
1955
+ {
1956
+ "epoch": 0.75,
1957
+ "learning_rate": 9.087156992216018e-05,
1958
+ "loss": 3.5845,
1959
+ "step": 31700
1960
+ },
1961
+ {
1962
+ "epoch": 0.75,
1963
+ "learning_rate": 8.925251564625636e-05,
1964
+ "loss": 3.5767,
1965
+ "step": 31800
1966
+ },
1967
+ {
1968
+ "epoch": 0.76,
1969
+ "learning_rate": 8.762931954253596e-05,
1970
+ "loss": 3.5754,
1971
+ "step": 31900
1972
+ },
1973
+ {
1974
+ "epoch": 0.76,
1975
+ "learning_rate": 8.60184912759454e-05,
1976
+ "loss": 3.5723,
1977
+ "step": 32000
1978
+ },
1979
+ {
1980
+ "epoch": 0.76,
1981
+ "learning_rate": 8.442012465633435e-05,
1982
+ "loss": 3.5735,
1983
+ "step": 32100
1984
+ },
1985
+ {
1986
+ "epoch": 0.76,
1987
+ "learning_rate": 8.283431276782354e-05,
1988
+ "loss": 3.5732,
1989
+ "step": 32200
1990
+ },
1991
+ {
1992
+ "epoch": 0.77,
1993
+ "learning_rate": 8.126114796338322e-05,
1994
+ "loss": 3.5705,
1995
+ "step": 32300
1996
+ },
1997
+ {
1998
+ "epoch": 0.77,
1999
+ "learning_rate": 7.971626276492257e-05,
2000
+ "loss": 3.5694,
2001
+ "step": 32400
2002
+ },
2003
+ {
2004
+ "epoch": 0.77,
2005
+ "learning_rate": 7.816853749295341e-05,
2006
+ "loss": 3.5698,
2007
+ "step": 32500
2008
+ },
2009
+ {
2010
+ "epoch": 0.77,
2011
+ "learning_rate": 7.663373102593709e-05,
2012
+ "loss": 3.5638,
2013
+ "step": 32600
2014
+ },
2015
+ {
2016
+ "epoch": 0.78,
2017
+ "learning_rate": 7.51119327464399e-05,
2018
+ "loss": 3.5674,
2019
+ "step": 32700
2020
+ },
2021
+ {
2022
+ "epoch": 0.78,
2023
+ "learning_rate": 7.36032312794699e-05,
2024
+ "loss": 3.5615,
2025
+ "step": 32800
2026
+ },
2027
+ {
2028
+ "epoch": 0.78,
2029
+ "learning_rate": 7.21077144873156e-05,
2030
+ "loss": 3.5749,
2031
+ "step": 32900
2032
+ },
2033
+ {
2034
+ "epoch": 0.78,
2035
+ "learning_rate": 7.062546946442954e-05,
2036
+ "loss": 3.5659,
2037
+ "step": 33000
2038
+ },
2039
+ {
2040
+ "epoch": 0.78,
2041
+ "learning_rate": 6.915658253235543e-05,
2042
+ "loss": 3.5661,
2043
+ "step": 33100
2044
+ },
2045
+ {
2046
+ "epoch": 0.79,
2047
+ "learning_rate": 6.770113923470201e-05,
2048
+ "loss": 3.5628,
2049
+ "step": 33200
2050
+ },
2051
+ {
2052
+ "epoch": 0.79,
2053
+ "learning_rate": 6.625922433216026e-05,
2054
+ "loss": 3.5597,
2055
+ "step": 33300
2056
+ },
2057
+ {
2058
+ "epoch": 0.79,
2059
+ "learning_rate": 6.483092179756783e-05,
2060
+ "loss": 3.5658,
2061
+ "step": 33400
2062
+ },
2063
+ {
2064
+ "epoch": 0.79,
2065
+ "learning_rate": 6.341631481101857e-05,
2066
+ "loss": 3.5596,
2067
+ "step": 33500
2068
+ },
2069
+ {
2070
+ "epoch": 0.8,
2071
+ "learning_rate": 6.20154857550183e-05,
2072
+ "loss": 3.5628,
2073
+ "step": 33600
2074
+ },
2075
+ {
2076
+ "epoch": 0.8,
2077
+ "learning_rate": 6.062851620968693e-05,
2078
+ "loss": 3.5562,
2079
+ "step": 33700
2080
+ },
2081
+ {
2082
+ "epoch": 0.8,
2083
+ "learning_rate": 5.925548694800801e-05,
2084
+ "loss": 3.5659,
2085
+ "step": 33800
2086
+ },
2087
+ {
2088
+ "epoch": 0.8,
2089
+ "learning_rate": 5.789647793112406e-05,
2090
+ "loss": 3.5578,
2091
+ "step": 33900
2092
+ },
2093
+ {
2094
+ "epoch": 0.81,
2095
+ "learning_rate": 5.6551568303680585e-05,
2096
+ "loss": 3.5617,
2097
+ "step": 34000
2098
+ },
2099
+ {
2100
+ "epoch": 0.81,
2101
+ "learning_rate": 5.5220836389216264e-05,
2102
+ "loss": 3.5618,
2103
+ "step": 34100
2104
+ },
2105
+ {
2106
+ "epoch": 0.81,
2107
+ "learning_rate": 5.390435968560195e-05,
2108
+ "loss": 3.5566,
2109
+ "step": 34200
2110
+ },
2111
+ {
2112
+ "epoch": 0.81,
2113
+ "learning_rate": 5.260221486052765e-05,
2114
+ "loss": 3.558,
2115
+ "step": 34300
2116
+ },
2117
+ {
2118
+ "epoch": 0.82,
2119
+ "learning_rate": 5.131447774703693e-05,
2120
+ "loss": 3.5553,
2121
+ "step": 34400
2122
+ },
2123
+ {
2124
+ "epoch": 0.82,
2125
+ "learning_rate": 5.004122333911149e-05,
2126
+ "loss": 3.5587,
2127
+ "step": 34500
2128
+ },
2129
+ {
2130
+ "epoch": 0.82,
2131
+ "learning_rate": 4.8782525787302994e-05,
2132
+ "loss": 3.5585,
2133
+ "step": 34600
2134
+ },
2135
+ {
2136
+ "epoch": 0.82,
2137
+ "learning_rate": 4.7538458394415367e-05,
2138
+ "loss": 3.5541,
2139
+ "step": 34700
2140
+ },
2141
+ {
2142
+ "epoch": 0.83,
2143
+ "learning_rate": 4.630909361123535e-05,
2144
+ "loss": 3.5486,
2145
+ "step": 34800
2146
+ },
2147
+ {
2148
+ "epoch": 0.83,
2149
+ "learning_rate": 4.509450303231335e-05,
2150
+ "loss": 3.5527,
2151
+ "step": 34900
2152
+ },
2153
+ {
2154
+ "epoch": 0.83,
2155
+ "learning_rate": 4.3894757391794366e-05,
+ "loss": 3.5554,
+ "step": 35000
+ },
+ {
+ "epoch": 0.83,
+ "eval_loss": 3.5770018100738525,
+ "eval_runtime": 6272.5699,
+ "eval_samples_per_second": 87.852,
+ "eval_steps_per_second": 21.963,
+ "step": 35000
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 4.27099265592979e-05,
+ "loss": 3.5507,
+ "step": 35100
+ },
+ {
+ "epoch": 0.83,
+ "learning_rate": 4.154007953584973e-05,
+ "loss": 3.5502,
+ "step": 35200
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 4.038528444986291e-05,
+ "loss": 3.5468,
+ "step": 35300
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 3.9245608553170395e-05,
+ "loss": 3.5483,
+ "step": 35400
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 3.812111821710867e-05,
+ "loss": 3.5482,
+ "step": 35500
+ },
+ {
+ "epoch": 0.84,
+ "learning_rate": 3.701187892865215e-05,
+ "loss": 3.5497,
+ "step": 35600
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 3.591795528659971e-05,
+ "loss": 3.5513,
+ "step": 35700
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 3.4839410997812365e-05,
+ "loss": 3.5471,
+ "step": 35800
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 3.377630887350332e-05,
+ "loss": 3.5544,
+ "step": 35900
+ },
+ {
+ "epoch": 0.85,
+ "learning_rate": 3.272871082558024e-05,
+ "loss": 3.5426,
+ "step": 36000
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 3.169667786303914e-05,
+ "loss": 3.5429,
+ "step": 36100
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 3.068027008841208e-05,
+ "loss": 3.5441,
+ "step": 36200
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 2.9679546694266342e-05,
+ "loss": 3.5479,
+ "step": 36300
+ },
+ {
+ "epoch": 0.86,
+ "learning_rate": 2.869456595975762e-05,
+ "loss": 3.5448,
+ "step": 36400
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 2.772538524723592e-05,
+ "loss": 3.5434,
+ "step": 36500
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 2.6772060998904855e-05,
+ "loss": 3.545,
+ "step": 36600
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 2.583464873353487e-05,
+ "loss": 3.5468,
+ "step": 36700
+ },
+ {
+ "epoch": 0.87,
+ "learning_rate": 2.4913203043229636e-05,
+ "loss": 3.5417,
+ "step": 36800
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 2.4007777590247125e-05,
+ "loss": 3.5426,
+ "step": 36900
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 2.311842510387417e-05,
+ "loss": 3.5383,
+ "step": 37000
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 2.2253849669299984e-05,
+ "loss": 3.5409,
+ "step": 37100
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 2.1396635552045304e-05,
+ "loss": 3.5476,
+ "step": 37200
+ },
+ {
+ "epoch": 0.88,
+ "learning_rate": 2.0555646466550592e-05,
+ "loss": 3.5411,
+ "step": 37300
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 1.973093138952013e-05,
+ "loss": 3.5394,
+ "step": 37400
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 1.8922538349908478e-05,
+ "loss": 3.5395,
+ "step": 37500
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 1.81305144261232e-05,
+ "loss": 3.5353,
+ "step": 37600
+ },
+ {
+ "epoch": 0.89,
+ "learning_rate": 1.7354905743283154e-05,
+ "loss": 3.5405,
+ "step": 37700
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.6595757470532535e-05,
+ "loss": 3.5375,
+ "step": 37800
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.585311381841e-05,
+ "loss": 3.5369,
+ "step": 37900
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.5127018036274286e-05,
+ "loss": 3.5393,
+ "step": 38000
+ },
+ {
+ "epoch": 0.9,
+ "learning_rate": 1.4417512409785326e-05,
+ "loss": 3.5358,
+ "step": 38100
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 1.3724638258441644e-05,
+ "loss": 3.5394,
+ "step": 38200
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 1.3048435933174273e-05,
+ "loss": 3.5371,
+ "step": 38300
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 1.2388944813996426e-05,
+ "loss": 3.5387,
+ "step": 38400
+ },
+ {
+ "epoch": 0.91,
+ "learning_rate": 1.1746203307710511e-05,
+ "loss": 3.5385,
+ "step": 38500
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 1.1120248845671176e-05,
+ "loss": 3.5403,
+ "step": 38600
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 1.0511117881605623e-05,
+ "loss": 3.5324,
+ "step": 38700
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 9.918845889490445e-06,
+ "loss": 3.5405,
+ "step": 38800
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 9.3434673614858e-06,
+ "loss": 3.5369,
+ "step": 38900
+ },
+ {
+ "epoch": 0.92,
+ "learning_rate": 8.785015805926864e-06,
+ "loss": 3.5344,
+ "step": 39000
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 8.243523745372149e-06,
+ "loss": 3.5345,
+ "step": 39100
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 7.71902271470949e-06,
+ "loss": 3.5374,
+ "step": 39200
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 7.211543259319907e-06,
+ "loss": 3.538,
+ "step": 39300
+ },
+ {
+ "epoch": 0.93,
+ "learning_rate": 6.725934718863668e-06,
+ "loss": 3.5348,
+ "step": 39400
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 6.252415148280509e-06,
+ "loss": 3.5296,
+ "step": 39500
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 5.796002563835378e-06,
+ "loss": 3.5329,
+ "step": 39600
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 5.356723545640385e-06,
+ "loss": 3.5323,
+ "step": 39700
+ },
+ {
+ "epoch": 0.94,
+ "learning_rate": 4.934603675999771e-06,
+ "loss": 3.5358,
+ "step": 39800
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 4.529667537919968e-06,
+ "loss": 3.5388,
+ "step": 39900
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 4.141938713677839e-06,
+ "loss": 3.536,
+ "step": 40000
+ },
+ {
+ "epoch": 0.95,
+ "eval_loss": 3.5582468509674072,
+ "eval_runtime": 6284.9941,
+ "eval_samples_per_second": 87.678,
+ "eval_steps_per_second": 21.92,
+ "step": 40000
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.7714397834476497e-06,
+ "loss": 3.5315,
+ "step": 40100
+ },
+ {
+ "epoch": 0.95,
+ "learning_rate": 3.418192323985647e-06,
+ "loss": 3.5348,
+ "step": 40200
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 3.082216907373836e-06,
+ "loss": 3.5332,
+ "step": 40300
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.7635330998217352e-06,
+ "loss": 3.5331,
+ "step": 40400
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.462159460526991e-06,
+ "loss": 3.5339,
+ "step": 40500
+ },
+ {
+ "epoch": 0.96,
+ "learning_rate": 2.1781135405944396e-06,
+ "loss": 3.5277,
+ "step": 40600
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.911411882014091e-06,
+ "loss": 3.5324,
+ "step": 40700
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.662070016697803e-06,
+ "loss": 3.5332,
+ "step": 40800
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.4301024655745675e-06,
+ "loss": 3.5379,
+ "step": 40900
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.2155227377449562e-06,
+ "loss": 3.53,
+ "step": 41000
+ },
+ {
+ "epoch": 0.97,
+ "learning_rate": 1.0183433296945486e-06,
+ "loss": 3.5326,
+ "step": 41100
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 8.38575724565882e-07,
+ "loss": 3.5309,
+ "step": 41200
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 6.762303914898848e-07,
+ "loss": 3.5324,
+ "step": 41300
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 5.326796054423432e-07,
+ "loss": 3.5324,
+ "step": 41400
+ },
+ {
+ "epoch": 0.98,
+ "learning_rate": 4.0503172472939884e-07,
+ "loss": 3.5328,
+ "step": 41500
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 2.9483136438293033e-07,
+ "loss": 3.5365,
+ "step": 41600
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 2.0208494214430937e-07,
+ "loss": 3.528,
+ "step": 41700
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 1.267978592894958e-07,
+ "loss": 3.5359,
+ "step": 41800
+ },
+ {
+ "epoch": 0.99,
+ "learning_rate": 6.897450031438933e-08,
+ "loss": 3.525,
+ "step": 41900
+ },
+ {
+ "epoch": 1.0,
+ "learning_rate": 2.861823267953367e-08,
+ "loss": 3.535,
+ "step": 42000
+ },
+ {
+ "epoch": 1.0,
+ "learning_rate": 5.731406613940226e-09,
+ "loss": 3.5303,
+ "step": 42100
+ },
+ {
+ "epoch": 1.0,
+ "step": 42167,
+ "total_flos": 2.0159394207481463e+19,
+ "train_loss": 3.89913355111991,
+ "train_runtime": 393554.9634,
+ "train_samples_per_second": 27.429,
+ "train_steps_per_second": 0.107
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 42167,
+ "num_train_epochs": 1,
+ "save_steps": 5000,
+ "total_flos": 2.0159394207481463e+19,
+ "trial_name": null,
+ "trial_params": null
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a227871196f5b75675f4d14daccc2870aa39b6dfb92d1e781e3b3f195f66b35
+ size 4600
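The `trainer_state.json` added in this commit stores the Trainer's `log_history`: one record per logging step (here every 100 steps) with `loss` and `learning_rate`, plus periodic `eval_loss` records. A minimal sketch of how such a log can be inspected, using a small excerpt copied from the diff above (the field names follow the file's own schema; the excerpt itself is illustrative, not the full history):

```python
# Inspect a Hugging Face Trainer state. The dict below is a small excerpt of
# the log_history from the trainer_state.json in this commit; in practice one
# would json.load() the file itself.
state = {
    "log_history": [
        {"epoch": 0.83, "learning_rate": 4.3894757391794366e-05, "loss": 3.5554, "step": 35000},
        {"epoch": 0.83, "eval_loss": 3.5770018100738525, "step": 35000},
        {"epoch": 1.0, "learning_rate": 5.731406613940226e-09, "loss": 3.5303, "step": 42100},
    ],
    "max_steps": 42167,
}

# Training-loss records carry "loss"; evaluation records carry "eval_loss".
# Splitting them this way is the usual first step when plotting loss curves.
train_logs = [e for e in state["log_history"] if "loss" in e]
eval_logs = [e for e in state["log_history"] if "eval_loss" in e]

print(len(train_logs), len(eval_logs))  # -> 2 1
print(train_logs[-1]["step"], train_logs[-1]["loss"])  # -> 42100 3.5303
```

The full file shows the training loss falling from about 3.56 at step 35,000 to about 3.53 at step 42,100, with the final eval loss at 3.558 (step 40,000), consistent with the cosine-style learning-rate decay toward zero over the single epoch.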