Commit efe9092 by Joeran Bosma · 1 parent: 8a647a1

Initial release

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,104 @@
1
- ---
2
- license: cc-by-nc-sa-4.0
3
- ---
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ ---
4
+
5
+ # DRAGON RoBERTa base mixed-domain
6
+
7
+ Pretrained model on Dutch clinical reports using a masked language modeling (MLM) objective. It was introduced in [this](#pending) paper. The model was initialized from the general-domain checkpoint [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base), taken from HuggingFace, and was subsequently pretrained further on domain-specific data (i.e., clinical reports). The tokenizer of [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base) was used.
8
+
9
+ ## Model description
10
+ RoBERTa is a transformers model that was pretrained on a large corpus of Dutch clinical reports in a self-supervised fashion. This means it was pretrained on the raw texts only, with no human labeling of any kind, using an automatic process to generate inputs and labels from those texts.
11
+
12
+ This way, the model learns an inner representation of the Dutch medical language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled reports, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs.
13
+
14
+ ## Model variations
15
+ Multiple architectures were pretrained for the DRAGON challenge.
16
+
17
+ | Model | #params | Language |
18
+ |------------------------|--------------------------------|-------|
19
+ | [`joeranbosma/dragon-bert-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-bert-base-mixed-domain) | 109M | Dutch → Dutch |
20
+ | [`joeranbosma/dragon-roberta-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-roberta-base-mixed-domain) | 278M | Multiple → Dutch |
21
+ | [`joeranbosma/dragon-roberta-large-mixed-domain`](https://huggingface.co/joeranbosma/dragon-roberta-large-mixed-domain) | 560M | Multiple → Dutch |
22
+ | [`joeranbosma/dragon-longformer-base-mixed-domain`](https://huggingface.co/joeranbosma/dragon-longformer-base-mixed-domain) | 149M | English → Dutch |
23
+ | [`joeranbosma/dragon-longformer-large-mixed-domain`](https://huggingface.co/joeranbosma/dragon-longformer-large-mixed-domain) | 435M | English → Dutch |
24
+ | [`joeranbosma/dragon-bert-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-bert-base-domain-specific) | 109M | Dutch |
25
+ | [`joeranbosma/dragon-roberta-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-roberta-base-domain-specific) | 278M | Dutch |
26
+ | [`joeranbosma/dragon-roberta-large-domain-specific`](https://huggingface.co/joeranbosma/dragon-roberta-large-domain-specific) | 560M | Dutch |
27
+ | [`joeranbosma/dragon-longformer-base-domain-specific`](https://huggingface.co/joeranbosma/dragon-longformer-base-domain-specific) | 149M | Dutch |
28
+ | [`joeranbosma/dragon-longformer-large-domain-specific`](https://huggingface.co/joeranbosma/dragon-longformer-large-domain-specific) | 435M | Dutch |
29
+
30
+
31
+ ## Intended uses & limitations
32
+ You can use the raw model for masked language modeling, but it's mostly intended to be fine-tuned on a downstream task.
33
+
34
+ Note that this model is primarily aimed at being fine-tuned on tasks that use the whole text (e.g., a clinical report) to make decisions, such as sequence classification, token classification or question answering. For tasks such as text generation, you should look at a model like GPT-2.
35
+
36
+ ## How to use
37
+ You can use this model directly with a pipeline for masked language modeling:
38
+
39
+ ```python
40
+ from transformers import pipeline
41
+ unmasker = pipeline("fill-mask", model="joeranbosma/dragon-roberta-base-mixed-domain")
42
+ unmasker("Dit onderzoek geen aanwijzingen voor significant carcinoom. PIRADS <mask>.")
43
+ ```
44
+
45
+ Here is how to use this model to get the features of a given text in PyTorch:
46
+
47
+ ```python
48
+ from transformers import AutoTokenizer, AutoModel
49
+ tokenizer = AutoTokenizer.from_pretrained("joeranbosma/dragon-roberta-base-mixed-domain")
50
+ model = AutoModel.from_pretrained("joeranbosma/dragon-roberta-base-mixed-domain")
51
+ text = "Replace me by any text you'd like."
52
+ encoded_input = tokenizer(text, return_tensors="pt")
53
+ output = model(**encoded_input)
54
+ ```
55
+
56
+ ## Limitations and bias
57
+ Even if the training data used for this model could be characterized as fairly neutral, this model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
58
+
59
+ ## Training data
60
+ For pretraining, 4,333,201 clinical reports (466,351 consecutive patients) were selected from Ziekenhuisgroep Twente from patients with a diagnostic or interventional visit between 13 July 2000 and 25 April 2023. 180,439 duplicate clinical reports (179,808 patients) were excluded, resulting in 4,152,762 included reports (463,692 patients). These reports were split into training (80%, 3,322,209 reports), validation (10%, 415,276 reports), and testing (10%, 415,277 reports). The testing reports were set aside for future analysis and are not used for pretraining.
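The report counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section; the variable names are illustrative and do not come from any DRAGON code.

```python
# Report counts quoted in the "Training data" section.
selected_reports = 4_333_201
duplicate_reports = 180_439
included_reports = selected_reports - duplicate_reports

# Split sizes as stated in the text.
splits = {"train": 3_322_209, "validation": 415_276, "test": 415_277}

# Share of the included reports in each split, rounded to whole percentages.
split_percent = {
    name: round(100 * count / included_reports) for name, count in splits.items()
}

print(included_reports)       # included reports after removing duplicates
print(sum(splits.values()))   # total over the three splits
print(split_percent)          # rounded percentage per split
```

The three split sizes add up to the number of included reports, and the rounded shares come out at 80/10/10, consistent with the text.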
61
+
62
+ ## Training procedure
63
+
64
+ ### Pretraining
65
+ The model was pretrained using masked language modeling (MLM): taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence.
66
+
67
+ The details of the masking procedure for each sentence are the following:
68
+ - 15% of the tokens are masked.
69
+ - In 80% of the cases, the masked tokens are replaced by the mask token (`<mask>` for this tokenizer).
70
+ - In 10% of the cases, the masked tokens are replaced by a random token different from the one they replace.
71
+ - In the 10% remaining cases, the masked tokens are left as is.
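The masking steps listed above can be sketched in plain Python. This is a minimal illustration of the 15% / 80-10-10 scheme as described here, not the code used for pretraining (that was `run_mlm.py`). The function name `mask_tokens`, the `-100` ignore label, and the mask token id `250001` in the usage comment are assumptions made for the example.

```python
import random

IGNORE_INDEX = -100  # assumed label for positions that do not contribute to the loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(),
                mlm_probability=0.15, rng=None):
    """Return (inputs, labels) after applying the masking scheme described above."""
    rng = rng or random.Random()
    excluded = set(special_ids) | {mask_id}
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, token in enumerate(token_ids):
        # Special tokens are never selected; other tokens are selected with p = 0.15.
        if token in excluded or rng.random() >= mlm_probability:
            continue
        labels[i] = token  # the model has to predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id  # 80%: replace by the mask token
        elif roll < 0.9:
            # 10%: replace by a random, different, non-special token
            replacement = rng.randrange(vocab_size)
            while replacement == token or replacement in excluded:
                replacement = rng.randrange(vocab_size)
            inputs[i] = replacement
        # remaining 10%: leave the token as is
    return inputs, labels


# Hypothetical usage: ids 0 and 2 stand for <s> and </s>, 250001 for <mask>.
example_ids = [0, 17, 42, 99, 1234, 2]
print(mask_tokens(example_ids, mask_id=250001, vocab_size=250002,
                  special_ids={0, 1, 2}, rng=random.Random(0)))
```

Positions that were not selected keep their original token and get the ignore label, so only the selected 15% contribute to the MLM loss.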
72
+
73
+ The HuggingFace implementation was used for pretraining: [`run_mlm.py`](https://github.com/huggingface/transformers/blob/7c6ec195adbfcd22cb6baeee64dd3c24a4b80c74/examples/pytorch/language-modeling/run_mlm.py).
74
+
75
+ ### Pretraining hyperparameters
76
+
77
+ The following hyperparameters were used during pretraining:
78
+ - `learning_rate`: 5e-05
79
+ - `train_batch_size`: 4
80
+ - `eval_batch_size`: 4
81
+ - `seed`: 42
82
+ - `gradient_accumulation_steps`: 4
83
+ - `total_train_batch_size`: 16
84
+ - `optimizer`: Adam with betas=(0.9,0.999) and epsilon=1e-08
85
+ - `lr_scheduler_type`: linear
86
+ - `num_epochs`: 3.0
87
+ - `max_seq_length`: 512
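The relation between these hyperparameters can be made explicit with a short calculation. The sketch below is an illustration under stated assumptions: a single training device, no learning-rate warmup, and optimizer updates per epoch counted as the number of batches integer-divided by the accumulation steps. The helper names are hypothetical; the sample count (1,126,219) is the `train_samples` value from `all_results.json` in this commit.

```python
import math


def total_train_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of samples that contribute to one optimizer update."""
    return per_device_batch_size * grad_accum_steps * num_devices


def optimizer_steps(num_samples, per_device_batch_size, grad_accum_steps,
                    num_epochs, num_devices=1):
    """Optimizer updates over the whole run (assumed counting rule, see lead-in)."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear decay to zero, with optional linear warmup (assumed to be 0 here)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


steps = optimizer_steps(1_126_219, per_device_batch_size=4,
                        grad_accum_steps=4, num_epochs=3)
print(total_train_batch_size(4, 4))   # effective batch size
print(steps)                          # optimizer updates over 3 epochs
print(linear_lr(500, steps, 5e-5))    # learning rate after 500 updates
```

Under these assumptions the step count agrees with `global_step` (211164) and the learning rate at step 500 agrees with the first `log_history` entry in `trainer_state.json`.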
88
+
89
+ ### Framework versions
90
+
91
+ - Transformers 4.29.0.dev0
92
+ - PyTorch 2.0.0+cu117
93
+ - Datasets 2.11.0
94
+ - Tokenizers 0.13.3
95
+
96
+ ## Evaluation results
97
+
98
+ Pending evaluation on the DRAGON benchmark.
99
+
100
+ ### BibTeX entry and citation info
101
+
102
+ ```bibtex
103
+ @article{PENDING}
104
+ ```
all_results.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "epoch": 3.0,
3
+ "eval_accuracy": 0.9162802645968451,
4
+ "eval_loss": 0.33752548694610596,
5
+ "eval_runtime": 4485.3837,
6
+ "eval_samples": 140986,
7
+ "eval_samples_per_second": 31.432,
8
+ "eval_steps_per_second": 7.858,
9
+ "perplexity": 1.4014753272629237,
10
+ "train_loss": 0.4472081179925129,
11
+ "train_runtime": 493604.9481,
12
+ "train_samples": 1126219,
13
+ "train_samples_per_second": 6.845,
14
+ "train_steps_per_second": 0.428
15
+ }
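The `perplexity` field in this file is the exponential of `eval_loss` (the mean cross-entropy over masked tokens, in nats). A minimal sketch of that relation follows; the function name `perplexity_from_loss` is illustrative, and the overflow guard is a defensive choice for the example.

```python
import math


def perplexity_from_loss(eval_loss):
    """Perplexity as exp(mean cross-entropy loss); returns inf if exp overflows."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


# eval_loss value reported in all_results.json above.
print(perplexity_from_loss(0.33752548694610596))
```

A perplexity close to 1 means the model assigns high probability to the correct token at most masked positions, which fits the reported `eval_accuracy` of about 0.916.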
config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "_name_or_path": "xlm-roberta-base",
3
+ "architectures": [
4
+ "XLMRobertaForMaskedLM"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "bos_token_id": 0,
8
+ "classifier_dropout": null,
9
+ "eos_token_id": 2,
10
+ "hidden_act": "gelu",
11
+ "hidden_dropout_prob": 0.1,
12
+ "hidden_size": 768,
13
+ "initializer_range": 0.02,
14
+ "intermediate_size": 3072,
15
+ "layer_norm_eps": 1e-05,
16
+ "max_position_embeddings": 514,
17
+ "model_type": "xlm-roberta",
18
+ "num_attention_heads": 12,
19
+ "num_hidden_layers": 12,
20
+ "output_past": true,
21
+ "pad_token_id": 1,
22
+ "position_embedding_type": "absolute",
23
+ "torch_dtype": "float32",
24
+ "transformers_version": "4.29.0.dev0",
25
+ "type_vocab_size": 1,
26
+ "use_cache": true,
27
+ "vocab_size": 250002
28
+ }
eval_results.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "epoch": 3.0,
3
+ "eval_accuracy": 0.9162802645968451,
4
+ "eval_loss": 0.33752548694610596,
5
+ "eval_runtime": 4485.3837,
6
+ "eval_samples": 140986,
7
+ "eval_samples_per_second": 31.432,
8
+ "eval_steps_per_second": 7.858,
9
+ "perplexity": 1.4014753272629237
10
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:720933b287b58d466766bb378088b7ff49b363987e560983bda6c3f803d00a2b
3
+ size 1113254457
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
3
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "bos_token": "<s>",
3
+ "cls_token": "<s>",
4
+ "eos_token": "</s>",
5
+ "mask_token": {
6
+ "content": "<mask>",
7
+ "lstrip": true,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "pad_token": "<pad>",
13
+ "sep_token": "</s>",
14
+ "unk_token": "<unk>"
15
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:62c24cdc13d4c9952d63718d6c9fa4c287974249e16b7ade6d5a85e7bbb75626
3
+ size 17082660
tokenizer_config.json ADDED
@@ -0,0 +1,19 @@
1
+ {
2
+ "bos_token": "<s>",
3
+ "clean_up_tokenization_spaces": true,
4
+ "cls_token": "<s>",
5
+ "eos_token": "</s>",
6
+ "mask_token": {
7
+ "__type": "AddedToken",
8
+ "content": "<mask>",
9
+ "lstrip": true,
10
+ "normalized": true,
11
+ "rstrip": false,
12
+ "single_word": false
13
+ },
14
+ "model_max_length": 512,
15
+ "pad_token": "<pad>",
16
+ "sep_token": "</s>",
17
+ "tokenizer_class": "XLMRobertaTokenizer",
18
+ "unk_token": "<unk>"
19
+ }
train_results.json ADDED
@@ -0,0 +1,8 @@
1
+ {
2
+ "epoch": 3.0,
3
+ "train_loss": 0.4472081179925129,
4
+ "train_runtime": 493604.9481,
5
+ "train_samples": 1126219,
6
+ "train_samples_per_second": 6.845,
7
+ "train_steps_per_second": 0.428
8
+ }
trainer_state.json ADDED
@@ -0,0 +1,2935 @@
1
+ {
2
+ "best_metric": 0.3365662097930908,
3
+ "best_model_checkpoint": "/output/xlm-roberta-base-finetuned-full/checkpoint-205000",
4
+ "epoch": 2.9999680346646302,
5
+ "global_step": 211164,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.01,
12
+ "learning_rate": 4.9881608607527804e-05,
13
+ "loss": 1.5253,
14
+ "step": 500
15
+ },
16
+ {
17
+ "epoch": 0.01,
18
+ "learning_rate": 4.97632172150556e-05,
19
+ "loss": 1.1464,
20
+ "step": 1000
21
+ },
22
+ {
23
+ "epoch": 0.02,
24
+ "learning_rate": 4.964482582258339e-05,
25
+ "loss": 1.0217,
26
+ "step": 1500
27
+ },
28
+ {
29
+ "epoch": 0.03,
30
+ "learning_rate": 4.9526434430111195e-05,
31
+ "loss": 0.9542,
32
+ "step": 2000
33
+ },
34
+ {
35
+ "epoch": 0.04,
36
+ "learning_rate": 4.9408043037638996e-05,
37
+ "loss": 0.9065,
38
+ "step": 2500
39
+ },
40
+ {
41
+ "epoch": 0.04,
42
+ "learning_rate": 4.928965164516679e-05,
43
+ "loss": 0.8708,
44
+ "step": 3000
45
+ },
46
+ {
47
+ "epoch": 0.05,
48
+ "learning_rate": 4.917126025269459e-05,
49
+ "loss": 0.8342,
50
+ "step": 3500
51
+ },
52
+ {
53
+ "epoch": 0.06,
54
+ "learning_rate": 4.905286886022239e-05,
55
+ "loss": 0.806,
56
+ "step": 4000
57
+ },
58
+ {
59
+ "epoch": 0.06,
60
+ "learning_rate": 4.893447746775019e-05,
61
+ "loss": 0.79,
62
+ "step": 4500
63
+ },
64
+ {
65
+ "epoch": 0.07,
66
+ "learning_rate": 4.881608607527798e-05,
67
+ "loss": 0.7697,
68
+ "step": 5000
69
+ },
70
+ {
71
+ "epoch": 0.07,
72
+ "eval_accuracy": 0.8496130075487093,
73
+ "eval_loss": 0.6976155042648315,
74
+ "eval_runtime": 4472.5845,
75
+ "eval_samples_per_second": 31.522,
76
+ "eval_steps_per_second": 7.881,
77
+ "step": 5000
78
+ },
79
+ {
80
+ "epoch": 0.08,
81
+ "learning_rate": 4.8697694682805785e-05,
82
+ "loss": 0.7583,
83
+ "step": 5500
84
+ },
85
+ {
86
+ "epoch": 0.09,
87
+ "learning_rate": 4.8579303290333586e-05,
88
+ "loss": 0.7434,
89
+ "step": 6000
90
+ },
91
+ {
92
+ "epoch": 0.09,
93
+ "learning_rate": 4.846091189786138e-05,
94
+ "loss": 0.7366,
95
+ "step": 6500
96
+ },
97
+ {
98
+ "epoch": 0.1,
99
+ "learning_rate": 4.8342520505389175e-05,
100
+ "loss": 0.7189,
101
+ "step": 7000
102
+ },
103
+ {
104
+ "epoch": 0.11,
105
+ "learning_rate": 4.822412911291698e-05,
106
+ "loss": 0.7056,
107
+ "step": 7500
108
+ },
109
+ {
110
+ "epoch": 0.11,
111
+ "learning_rate": 4.810573772044478e-05,
112
+ "loss": 0.6993,
113
+ "step": 8000
114
+ },
115
+ {
116
+ "epoch": 0.12,
117
+ "learning_rate": 4.798734632797257e-05,
118
+ "loss": 0.682,
119
+ "step": 8500
120
+ },
121
+ {
122
+ "epoch": 0.13,
123
+ "learning_rate": 4.786895493550037e-05,
124
+ "loss": 0.6792,
125
+ "step": 9000
126
+ },
127
+ {
128
+ "epoch": 0.13,
129
+ "learning_rate": 4.775056354302817e-05,
130
+ "loss": 0.6652,
131
+ "step": 9500
132
+ },
133
+ {
134
+ "epoch": 0.14,
135
+ "learning_rate": 4.763217215055597e-05,
136
+ "loss": 0.6638,
137
+ "step": 10000
138
+ },
139
+ {
140
+ "epoch": 0.14,
141
+ "eval_accuracy": 0.866376552224583,
142
+ "eval_loss": 0.6067455410957336,
143
+ "eval_runtime": 4473.7903,
144
+ "eval_samples_per_second": 31.514,
145
+ "eval_steps_per_second": 7.879,
146
+ "step": 10000
147
+ },
148
+ {
149
+ "epoch": 0.15,
150
+ "learning_rate": 4.7513780758083765e-05,
151
+ "loss": 0.6513,
152
+ "step": 10500
153
+ },
154
+ {
155
+ "epoch": 0.16,
156
+ "learning_rate": 4.739538936561157e-05,
157
+ "loss": 0.6492,
158
+ "step": 11000
159
+ },
160
+ {
161
+ "epoch": 0.16,
162
+ "learning_rate": 4.727699797313936e-05,
163
+ "loss": 0.6443,
164
+ "step": 11500
165
+ },
166
+ {
167
+ "epoch": 0.17,
168
+ "learning_rate": 4.715860658066716e-05,
169
+ "loss": 0.6369,
170
+ "step": 12000
171
+ },
172
+ {
173
+ "epoch": 0.18,
174
+ "learning_rate": 4.704021518819496e-05,
175
+ "loss": 0.6319,
176
+ "step": 12500
177
+ },
178
+ {
179
+ "epoch": 0.18,
180
+ "learning_rate": 4.692182379572276e-05,
181
+ "loss": 0.6247,
182
+ "step": 13000
183
+ },
184
+ {
185
+ "epoch": 0.19,
186
+ "learning_rate": 4.680343240325056e-05,
187
+ "loss": 0.6208,
188
+ "step": 13500
189
+ },
190
+ {
191
+ "epoch": 0.2,
192
+ "learning_rate": 4.6685041010778355e-05,
193
+ "loss": 0.619,
194
+ "step": 14000
195
+ },
196
+ {
197
+ "epoch": 0.21,
198
+ "learning_rate": 4.656664961830615e-05,
199
+ "loss": 0.6193,
200
+ "step": 14500
201
+ },
202
+ {
203
+ "epoch": 0.21,
204
+ "learning_rate": 4.644825822583395e-05,
205
+ "loss": 0.6101,
206
+ "step": 15000
207
+ },
208
+ {
209
+ "epoch": 0.21,
210
+ "eval_accuracy": 0.8746108749481479,
211
+ "eval_loss": 0.5597154498100281,
212
+ "eval_runtime": 4477.6442,
213
+ "eval_samples_per_second": 31.487,
214
+ "eval_steps_per_second": 7.872,
215
+ "step": 15000
216
+ },
217
+ {
218
+ "epoch": 0.22,
219
+ "learning_rate": 4.632986683336175e-05,
220
+ "loss": 0.6035,
221
+ "step": 15500
222
+ },
223
+ {
224
+ "epoch": 0.23,
225
+ "learning_rate": 4.621147544088955e-05,
226
+ "loss": 0.6028,
227
+ "step": 16000
228
+ },
229
+ {
230
+ "epoch": 0.23,
231
+ "learning_rate": 4.609308404841734e-05,
232
+ "loss": 0.5945,
233
+ "step": 16500
234
+ },
235
+ {
236
+ "epoch": 0.24,
237
+ "learning_rate": 4.5974692655945144e-05,
238
+ "loss": 0.5905,
239
+ "step": 17000
240
+ },
241
+ {
242
+ "epoch": 0.25,
243
+ "learning_rate": 4.5856301263472945e-05,
244
+ "loss": 0.5869,
245
+ "step": 17500
246
+ },
247
+ {
248
+ "epoch": 0.26,
249
+ "learning_rate": 4.573790987100074e-05,
250
+ "loss": 0.5849,
251
+ "step": 18000
252
+ },
253
+ {
254
+ "epoch": 0.26,
255
+ "learning_rate": 4.561951847852854e-05,
256
+ "loss": 0.579,
257
+ "step": 18500
258
+ },
259
+ {
260
+ "epoch": 0.27,
261
+ "learning_rate": 4.5501127086056336e-05,
262
+ "loss": 0.5755,
263
+ "step": 19000
264
+ },
265
+ {
266
+ "epoch": 0.28,
267
+ "learning_rate": 4.538273569358414e-05,
268
+ "loss": 0.5809,
269
+ "step": 19500
270
+ },
271
+ {
272
+ "epoch": 0.28,
273
+ "learning_rate": 4.526434430111193e-05,
274
+ "loss": 0.5702,
275
+ "step": 20000
276
+ },
277
+ {
278
+ "epoch": 0.28,
279
+ "eval_accuracy": 0.8805759285133897,
280
+ "eval_loss": 0.5277929306030273,
281
+ "eval_runtime": 4476.5281,
282
+ "eval_samples_per_second": 31.494,
283
+ "eval_steps_per_second": 7.874,
284
+ "step": 20000
285
+ },
286
+ {
287
+ "epoch": 0.29,
288
+ "learning_rate": 4.5145952908639734e-05,
289
+ "loss": 0.5763,
290
+ "step": 20500
291
+ },
292
+ {
293
+ "epoch": 0.3,
294
+ "learning_rate": 4.5027561516167535e-05,
295
+ "loss": 0.564,
296
+ "step": 21000
297
+ },
298
+ {
299
+ "epoch": 0.31,
300
+ "learning_rate": 4.490917012369533e-05,
301
+ "loss": 0.566,
302
+ "step": 21500
303
+ },
304
+ {
305
+ "epoch": 0.31,
306
+ "learning_rate": 4.4790778731223124e-05,
307
+ "loss": 0.5614,
308
+ "step": 22000
309
+ },
310
+ {
311
+ "epoch": 0.32,
312
+ "learning_rate": 4.4672387338750926e-05,
313
+ "loss": 0.5602,
314
+ "step": 22500
315
+ },
316
+ {
317
+ "epoch": 0.33,
318
+ "learning_rate": 4.455399594627873e-05,
319
+ "loss": 0.5566,
320
+ "step": 23000
321
+ },
322
+ {
323
+ "epoch": 0.33,
324
+ "learning_rate": 4.443560455380652e-05,
325
+ "loss": 0.5521,
326
+ "step": 23500
327
+ },
328
+ {
329
+ "epoch": 0.34,
330
+ "learning_rate": 4.431721316133432e-05,
331
+ "loss": 0.5505,
332
+ "step": 24000
333
+ },
334
+ {
335
+ "epoch": 0.35,
336
+ "learning_rate": 4.419882176886212e-05,
337
+ "loss": 0.5505,
338
+ "step": 24500
339
+ },
340
+ {
341
+ "epoch": 0.36,
342
+ "learning_rate": 4.408043037638992e-05,
343
+ "loss": 0.5433,
344
+ "step": 25000
345
+ },
346
+ {
347
+ "epoch": 0.36,
348
+ "eval_accuracy": 0.8846786107773023,
349
+ "eval_loss": 0.5012275576591492,
350
+ "eval_runtime": 4482.4883,
351
+ "eval_samples_per_second": 31.453,
352
+ "eval_steps_per_second": 7.863,
353
+ "step": 25000
354
+ },
355
+ {
356
+ "epoch": 0.36,
357
+ "learning_rate": 4.3962038983917714e-05,
358
+ "loss": 0.5462,
359
+ "step": 25500
360
+ },
361
+ {
362
+ "epoch": 0.37,
363
+ "learning_rate": 4.3843647591445516e-05,
364
+ "loss": 0.5402,
365
+ "step": 26000
366
+ },
367
+ {
368
+ "epoch": 0.38,
369
+ "learning_rate": 4.372525619897331e-05,
370
+ "loss": 0.543,
371
+ "step": 26500
372
+ },
373
+ {
374
+ "epoch": 0.38,
375
+ "learning_rate": 4.3606864806501105e-05,
376
+ "loss": 0.5411,
377
+ "step": 27000
378
+ },
379
+ {
380
+ "epoch": 0.39,
381
+ "learning_rate": 4.348847341402891e-05,
382
+ "loss": 0.5359,
383
+ "step": 27500
384
+ },
385
+ {
386
+ "epoch": 0.4,
387
+ "learning_rate": 4.337008202155671e-05,
388
+ "loss": 0.5375,
389
+ "step": 28000
390
+ },
391
+ {
392
+ "epoch": 0.4,
393
+ "learning_rate": 4.325169062908451e-05,
394
+ "loss": 0.5293,
395
+ "step": 28500
396
+ },
397
+ {
398
+ "epoch": 0.41,
399
+ "learning_rate": 4.3133299236612304e-05,
400
+ "loss": 0.5255,
401
+ "step": 29000
402
+ },
403
+ {
404
+ "epoch": 0.42,
405
+ "learning_rate": 4.30149078441401e-05,
406
+ "loss": 0.5281,
407
+ "step": 29500
408
+ },
409
+ {
410
+ "epoch": 0.43,
411
+ "learning_rate": 4.28965164516679e-05,
412
+ "loss": 0.5308,
413
+ "step": 30000
414
+ },
415
+ {
416
+ "epoch": 0.43,
417
+ "eval_accuracy": 0.8875591084578017,
418
+ "eval_loss": 0.48577040433883667,
419
+ "eval_runtime": 4481.7631,
420
+ "eval_samples_per_second": 31.458,
421
+ "eval_steps_per_second": 7.865,
422
+ "step": 30000
423
+ },
424
+ {
425
+ "epoch": 0.43,
426
+ "learning_rate": 4.27781250591957e-05,
427
+ "loss": 0.5304,
428
+ "step": 30500
429
+ },
430
+ {
431
+ "epoch": 0.44,
432
+ "learning_rate": 4.2659733666723497e-05,
433
+ "loss": 0.5272,
434
+ "step": 31000
435
+ },
436
+ {
437
+ "epoch": 0.45,
438
+ "learning_rate": 4.254134227425129e-05,
439
+ "loss": 0.523,
440
+ "step": 31500
441
+ },
442
+ {
443
+ "epoch": 0.45,
444
+ "learning_rate": 4.242295088177909e-05,
445
+ "loss": 0.5197,
446
+ "step": 32000
447
+ },
448
+ {
449
+ "epoch": 0.46,
450
+ "learning_rate": 4.2304559489306894e-05,
451
+ "loss": 0.5212,
452
+ "step": 32500
453
+ },
454
+ {
455
+ "epoch": 0.47,
456
+ "learning_rate": 4.218616809683469e-05,
457
+ "loss": 0.5203,
458
+ "step": 33000
459
+ },
460
+ {
461
+ "epoch": 0.48,
462
+ "learning_rate": 4.206777670436249e-05,
463
+ "loss": 0.5132,
464
+ "step": 33500
465
+ },
466
+ {
467
+ "epoch": 0.48,
468
+ "learning_rate": 4.1949385311890285e-05,
469
+ "loss": 0.5154,
470
+ "step": 34000
471
+ },
472
+ {
473
+ "epoch": 0.49,
474
+ "learning_rate": 4.183099391941808e-05,
475
+ "loss": 0.5193,
476
+ "step": 34500
477
+ },
478
+ {
479
+ "epoch": 0.5,
480
+ "learning_rate": 4.171260252694588e-05,
481
+ "loss": 0.5118,
482
+ "step": 35000
483
+ },
484
+ {
485
+ "epoch": 0.5,
486
+ "eval_accuracy": 0.890265619775884,
487
+ "eval_loss": 0.4702432155609131,
488
+ "eval_runtime": 4481.2911,
489
+ "eval_samples_per_second": 31.461,
490
+ "eval_steps_per_second": 7.865,
491
+ "step": 35000
492
+ },
493
+ {
494
+ "epoch": 0.5,
495
+ "learning_rate": 4.159421113447368e-05,
496
+ "loss": 0.5079,
497
+ "step": 35500
498
+ },
499
+ {
500
+ "epoch": 0.51,
501
+ "learning_rate": 4.1475819742001484e-05,
502
+ "loss": 0.5104,
503
+ "step": 36000
504
+ },
505
+ {
506
+ "epoch": 0.52,
507
+ "learning_rate": 4.135742834952928e-05,
508
+ "loss": 0.5047,
509
+ "step": 36500
510
+ },
511
+ {
512
+ "epoch": 0.53,
513
+ "learning_rate": 4.1239036957057073e-05,
514
+ "loss": 0.5102,
515
+ "step": 37000
516
+ },
517
+ {
518
+ "epoch": 0.53,
519
+ "learning_rate": 4.1120645564584875e-05,
520
+ "loss": 0.5054,
521
+ "step": 37500
522
+ },
523
+ {
524
+ "epoch": 0.54,
525
+ "learning_rate": 4.1002254172112676e-05,
526
+ "loss": 0.5052,
527
+ "step": 38000
528
+ },
529
+ {
530
+ "epoch": 0.55,
531
+ "learning_rate": 4.088386277964047e-05,
532
+ "loss": 0.502,
533
+ "step": 38500
534
+ },
535
+ {
536
+ "epoch": 0.55,
537
+ "learning_rate": 4.076547138716827e-05,
538
+ "loss": 0.5037,
539
+ "step": 39000
540
+ },
541
+ {
542
+ "epoch": 0.56,
543
+ "learning_rate": 4.064707999469607e-05,
544
+ "loss": 0.4998,
545
+ "step": 39500
546
+ },
547
+ {
548
+ "epoch": 0.57,
549
+ "learning_rate": 4.052868860222386e-05,
550
+ "loss": 0.4971,
551
+ "step": 40000
552
+ },
553
+ {
554
+ "epoch": 0.57,
555
+ "eval_accuracy": 0.8922914063763604,
556
+ "eval_loss": 0.4587230086326599,
557
+ "eval_runtime": 4481.3983,
558
+ "eval_samples_per_second": 31.46,
559
+ "eval_steps_per_second": 7.865,
560
+ "step": 40000
561
+ },
562
+ {
563
+ "epoch": 0.58,
564
+ "learning_rate": 4.041029720975166e-05,
565
+ "loss": 0.5026,
566
+ "step": 40500
567
+ },
568
+ {
569
+ "epoch": 0.58,
570
+ "learning_rate": 4.0291905817279465e-05,
571
+ "loss": 0.496,
572
+ "step": 41000
573
+ },
574
+ {
575
+ "epoch": 0.59,
576
+ "learning_rate": 4.017351442480726e-05,
577
+ "loss": 0.4965,
578
+ "step": 41500
579
+ },
580
+ {
581
+ "epoch": 0.6,
582
+ "learning_rate": 4.0055123032335054e-05,
583
+ "loss": 0.496,
584
+ "step": 42000
585
+ },
586
+ {
587
+ "epoch": 0.6,
588
+ "learning_rate": 3.9936731639862856e-05,
589
+ "loss": 0.4873,
590
+ "step": 42500
591
+ },
592
+ {
593
+ "epoch": 0.61,
594
+ "learning_rate": 3.981834024739066e-05,
595
+ "loss": 0.4977,
596
+ "step": 43000
597
+ },
598
+ {
599
+ "epoch": 0.62,
600
+ "learning_rate": 3.969994885491846e-05,
601
+ "loss": 0.4896,
602
+ "step": 43500
603
+ },
604
+ {
605
+ "epoch": 0.63,
606
+ "learning_rate": 3.958155746244625e-05,
607
+ "loss": 0.4858,
608
+ "step": 44000
609
+ },
610
+ {
611
+ "epoch": 0.63,
612
+ "learning_rate": 3.946316606997405e-05,
613
+ "loss": 0.4877,
614
+ "step": 44500
615
+ },
616
+ {
617
+ "epoch": 0.64,
618
+ "learning_rate": 3.934477467750185e-05,
619
+ "loss": 0.4857,
620
+ "step": 45000
621
+ },
622
+ {
623
+ "epoch": 0.64,
624
+ "eval_accuracy": 0.894234712380511,
625
+ "eval_loss": 0.4484991431236267,
626
+ "eval_runtime": 4480.0256,
627
+ "eval_samples_per_second": 31.47,
628
+ "eval_steps_per_second": 7.868,
629
+ "step": 45000
630
+ },
631
+ {
632
+ "epoch": 0.65,
633
+ "learning_rate": 3.922638328502965e-05,
634
+ "loss": 0.4888,
635
+ "step": 45500
636
+ },
637
+ {
638
+ "epoch": 0.65,
639
+ "learning_rate": 3.9107991892557446e-05,
640
+ "loss": 0.4827,
641
+ "step": 46000
642
+ },
643
+ {
644
+ "epoch": 0.66,
645
+ "learning_rate": 3.898960050008525e-05,
646
+ "loss": 0.4799,
647
+ "step": 46500
648
+ },
649
+ {
650
+ "epoch": 0.67,
651
+ "learning_rate": 3.887120910761304e-05,
652
+ "loss": 0.4815,
653
+ "step": 47000
654
+ },
655
+ {
656
+ "epoch": 0.67,
657
+ "learning_rate": 3.8752817715140836e-05,
658
+ "loss": 0.4847,
659
+ "step": 47500
660
+ },
661
+ {
662
+ "epoch": 0.68,
663
+ "learning_rate": 3.863442632266864e-05,
664
+ "loss": 0.4845,
665
+ "step": 48000
666
+ },
667
+ {
668
+ "epoch": 0.69,
669
+ "learning_rate": 3.851603493019644e-05,
670
+ "loss": 0.4832,
671
+ "step": 48500
672
+ },
673
+ {
674
+ "epoch": 0.7,
675
+ "learning_rate": 3.8397643537724234e-05,
676
+ "loss": 0.4783,
677
+ "step": 49000
678
+ },
679
+ {
680
+ "epoch": 0.7,
681
+ "learning_rate": 3.827925214525203e-05,
682
+ "loss": 0.4785,
683
+ "step": 49500
684
+ },
685
+ {
686
+ "epoch": 0.71,
687
+ "learning_rate": 3.816086075277983e-05,
688
+ "loss": 0.4812,
689
+ "step": 50000
690
+ },
691
+ {
692
+ "epoch": 0.71,
693
+ "eval_accuracy": 0.8959396424676428,
694
+ "eval_loss": 0.43902069330215454,
695
+ "eval_runtime": 4484.5246,
696
+ "eval_samples_per_second": 31.438,
697
+ "eval_steps_per_second": 7.86,
698
+ "step": 50000
699
+ },
700
+ {
701
+ "epoch": 0.72,
702
+ "learning_rate": 3.804246936030763e-05,
703
+ "loss": 0.4753,
704
+ "step": 50500
705
+ },
706
+ {
707
+ "epoch": 0.72,
708
+ "learning_rate": 3.792407796783543e-05,
709
+ "loss": 0.48,
710
+ "step": 51000
711
+ },
712
+ {
713
+ "epoch": 0.73,
714
+ "learning_rate": 3.780568657536323e-05,
715
+ "loss": 0.4777,
716
+ "step": 51500
717
+ },
718
+ {
719
+ "epoch": 0.74,
720
+ "learning_rate": 3.768729518289102e-05,
721
+ "loss": 0.4748,
722
+ "step": 52000
723
+ },
724
+ {
725
+ "epoch": 0.75,
726
+ "learning_rate": 3.7568903790418824e-05,
727
+ "loss": 0.4728,
728
+ "step": 52500
729
+ },
730
+ {
731
+ "epoch": 0.75,
732
+ "learning_rate": 3.745051239794662e-05,
733
+ "loss": 0.4677,
734
+ "step": 53000
735
+ },
736
+ {
737
+ "epoch": 0.76,
738
+ "learning_rate": 3.733212100547442e-05,
739
+ "loss": 0.4731,
740
+ "step": 53500
741
+ },
742
+ {
743
+ "epoch": 0.77,
744
+ "learning_rate": 3.721372961300222e-05,
745
+ "loss": 0.464,
746
+ "step": 54000
747
+ },
748
+ {
749
+ "epoch": 0.77,
750
+ "learning_rate": 3.7095338220530016e-05,
751
+ "loss": 0.4708,
752
+ "step": 54500
753
+ },
754
+ {
755
+ "epoch": 0.78,
756
+ "learning_rate": 3.697694682805781e-05,
757
+ "loss": 0.4716,
758
+ "step": 55000
759
+ },
760
+ {
761
+ "epoch": 0.78,
762
+ "eval_accuracy": 0.8970586319380052,
763
+ "eval_loss": 0.43407291173934937,
764
+ "eval_runtime": 4489.7395,
765
+ "eval_samples_per_second": 31.402,
766
+ "eval_steps_per_second": 7.851,
767
+ "step": 55000
768
+ },
769
+ {
770
+ "epoch": 0.79,
771
+ "learning_rate": 3.685855543558561e-05,
772
+ "loss": 0.469,
773
+ "step": 55500
774
+ },
775
+ {
776
+ "epoch": 0.8,
777
+ "learning_rate": 3.6740164043113414e-05,
778
+ "loss": 0.4684,
779
+ "step": 56000
780
+ },
781
+ {
782
+ "epoch": 0.8,
783
+ "learning_rate": 3.662177265064121e-05,
784
+ "loss": 0.4682,
785
+ "step": 56500
786
+ },
787
+ {
788
+ "epoch": 0.81,
789
+ "learning_rate": 3.6503381258169e-05,
790
+ "loss": 0.466,
791
+ "step": 57000
792
+ },
793
+ {
794
+ "epoch": 0.82,
795
+ "learning_rate": 3.6384989865696805e-05,
796
+ "loss": 0.4629,
797
+ "step": 57500
798
+ },
799
+ {
800
+ "epoch": 0.82,
801
+ "learning_rate": 3.6266598473224606e-05,
802
+ "loss": 0.4641,
803
+ "step": 58000
804
+ },
805
+ {
806
+ "epoch": 0.83,
807
+ "learning_rate": 3.614820708075241e-05,
808
+ "loss": 0.4602,
809
+ "step": 58500
810
+ },
811
+ {
812
+ "epoch": 0.84,
813
+ "learning_rate": 3.60298156882802e-05,
814
+ "loss": 0.463,
815
+ "step": 59000
816
+ },
817
+ {
818
+ "epoch": 0.85,
819
+ "learning_rate": 3.5911424295808e-05,
820
+ "loss": 0.4586,
821
+ "step": 59500
822
+ },
823
+ {
824
+ "epoch": 0.85,
825
+ "learning_rate": 3.57930329033358e-05,
826
+ "loss": 0.4601,
827
+ "step": 60000
828
+ },
829
+ {
830
+ "epoch": 0.85,
831
+ "eval_accuracy": 0.8986926934131002,
832
+ "eval_loss": 0.42562565207481384,
833
+ "eval_runtime": 4489.1177,
834
+ "eval_samples_per_second": 31.406,
835
+ "eval_steps_per_second": 7.852,
836
+ "step": 60000
837
+ },
838
+ {
839
+ "epoch": 0.86,
840
+ "learning_rate": 3.567464151086359e-05,
841
+ "loss": 0.4612,
842
+ "step": 60500
843
+ },
844
+ {
845
+ "epoch": 0.87,
846
+ "learning_rate": 3.5556250118391395e-05,
847
+ "loss": 0.4609,
848
+ "step": 61000
849
+ },
850
+ {
851
+ "epoch": 0.87,
852
+ "learning_rate": 3.5437858725919196e-05,
853
+ "loss": 0.4552,
854
+ "step": 61500
855
+ },
856
+ {
857
+ "epoch": 0.88,
858
+ "learning_rate": 3.531946733344699e-05,
859
+ "loss": 0.4624,
860
+ "step": 62000
861
+ },
862
+ {
863
+ "epoch": 0.89,
864
+ "learning_rate": 3.5201075940974785e-05,
865
+ "loss": 0.4572,
866
+ "step": 62500
867
+ },
868
+ {
869
+ "epoch": 0.9,
870
+ "learning_rate": 3.508268454850259e-05,
871
+ "loss": 0.4577,
872
+ "step": 63000
873
+ },
874
+ {
875
+ "epoch": 0.9,
876
+ "learning_rate": 3.496429315603039e-05,
877
+ "loss": 0.4527,
878
+ "step": 63500
879
+ },
880
+ {
881
+ "epoch": 0.91,
882
+ "learning_rate": 3.484590176355819e-05,
883
+ "loss": 0.4542,
884
+ "step": 64000
885
+ },
886
+ {
887
+ "epoch": 0.92,
888
+ "learning_rate": 3.472751037108598e-05,
889
+ "loss": 0.4549,
890
+ "step": 64500
891
+ },
892
+ {
893
+ "epoch": 0.92,
894
+ "learning_rate": 3.460911897861378e-05,
895
+ "loss": 0.4548,
896
+ "step": 65000
897
+ },
898
+ {
899
+ "epoch": 0.92,
900
+ "eval_accuracy": 0.9000358274024964,
901
+ "eval_loss": 0.418160617351532,
902
+ "eval_runtime": 4478.7532,
903
+ "eval_samples_per_second": 31.479,
904
+ "eval_steps_per_second": 7.87,
905
+ "step": 65000
906
+ },
907
+ {
908
+ "epoch": 0.93,
909
+ "learning_rate": 3.449072758614158e-05,
910
+ "loss": 0.4543,
911
+ "step": 65500
912
+ },
913
+ {
914
+ "epoch": 0.94,
915
+ "learning_rate": 3.4372336193669375e-05,
916
+ "loss": 0.4563,
917
+ "step": 66000
918
+ },
919
+ {
920
+ "epoch": 0.94,
921
+ "learning_rate": 3.425394480119718e-05,
922
+ "loss": 0.4534,
923
+ "step": 66500
924
+ },
925
+ {
926
+ "epoch": 0.95,
927
+ "learning_rate": 3.413555340872497e-05,
928
+ "loss": 0.4532,
929
+ "step": 67000
930
+ },
931
+ {
932
+ "epoch": 0.96,
933
+ "learning_rate": 3.401716201625277e-05,
934
+ "loss": 0.4512,
935
+ "step": 67500
936
+ },
937
+ {
938
+ "epoch": 0.97,
939
+ "learning_rate": 3.389877062378057e-05,
940
+ "loss": 0.4477,
941
+ "step": 68000
942
+ },
943
+ {
944
+ "epoch": 0.97,
945
+ "learning_rate": 3.378037923130837e-05,
946
+ "loss": 0.4487,
947
+ "step": 68500
948
+ },
949
+ {
950
+ "epoch": 0.98,
951
+ "learning_rate": 3.366198783883617e-05,
952
+ "loss": 0.4503,
953
+ "step": 69000
954
+ },
955
+ {
956
+ "epoch": 0.99,
957
+ "learning_rate": 3.3543596446363965e-05,
958
+ "loss": 0.445,
959
+ "step": 69500
960
+ },
961
+ {
962
+ "epoch": 0.99,
963
+ "learning_rate": 3.342520505389176e-05,
964
+ "loss": 0.4497,
965
+ "step": 70000
966
+ },
967
+ {
968
+ "epoch": 0.99,
969
+ "eval_accuracy": 0.9010131119019583,
970
+ "eval_loss": 0.4137929081916809,
971
+ "eval_runtime": 4478.1902,
972
+ "eval_samples_per_second": 31.483,
973
+ "eval_steps_per_second": 7.871,
974
+ "step": 70000
975
+ },
976
+ {
977
+ "epoch": 1.0,
978
+ "learning_rate": 3.330681366141956e-05,
979
+ "loss": 0.448,
980
+ "step": 70500
981
+ },
982
+ {
983
+ "epoch": 1.01,
984
+ "learning_rate": 3.318842226894736e-05,
985
+ "loss": 0.4412,
986
+ "step": 71000
987
+ },
988
+ {
989
+ "epoch": 1.02,
990
+ "learning_rate": 3.3070030876475164e-05,
991
+ "loss": 0.4441,
992
+ "step": 71500
993
+ },
994
+ {
995
+ "epoch": 1.02,
996
+ "learning_rate": 3.295163948400295e-05,
997
+ "loss": 0.4442,
998
+ "step": 72000
999
+ },
1000
+ {
1001
+ "epoch": 1.03,
1002
+ "learning_rate": 3.2833248091530754e-05,
1003
+ "loss": 0.4415,
1004
+ "step": 72500
1005
+ },
1006
+ {
1007
+ "epoch": 1.04,
1008
+ "learning_rate": 3.2714856699058555e-05,
1009
+ "loss": 0.4457,
1010
+ "step": 73000
1011
+ },
1012
+ {
1013
+ "epoch": 1.04,
1014
+ "learning_rate": 3.259646530658635e-05,
1015
+ "loss": 0.4438,
1016
+ "step": 73500
1017
+ },
1018
+ {
1019
+ "epoch": 1.05,
1020
+ "learning_rate": 3.247807391411415e-05,
1021
+ "loss": 0.4453,
1022
+ "step": 74000
1023
+ },
1024
+ {
1025
+ "epoch": 1.06,
1026
+ "learning_rate": 3.2359682521641946e-05,
1027
+ "loss": 0.4407,
1028
+ "step": 74500
1029
+ },
1030
+ {
1031
+ "epoch": 1.07,
1032
+ "learning_rate": 3.224129112916975e-05,
1033
+ "loss": 0.4376,
1034
+ "step": 75000
1035
+ },
1036
+ {
1037
+ "epoch": 1.07,
1038
+ "eval_accuracy": 0.9019829116607336,
1039
+ "eval_loss": 0.408815860748291,
1040
+ "eval_runtime": 4480.6277,
1041
+ "eval_samples_per_second": 31.466,
1042
+ "eval_steps_per_second": 7.867,
1043
+ "step": 75000
1044
+ },
1045
+ {
1046
+ "epoch": 1.07,
1047
+ "learning_rate": 3.212289973669754e-05,
1048
+ "loss": 0.4371,
1049
+ "step": 75500
1050
+ },
1051
+ {
1052
+ "epoch": 1.08,
1053
+ "learning_rate": 3.2004508344225344e-05,
1054
+ "loss": 0.4397,
1055
+ "step": 76000
1056
+ },
1057
+ {
1058
+ "epoch": 1.09,
1059
+ "learning_rate": 3.1886116951753145e-05,
1060
+ "loss": 0.4312,
1061
+ "step": 76500
1062
+ },
1063
+ {
1064
+ "epoch": 1.09,
1065
+ "learning_rate": 3.176772555928094e-05,
1066
+ "loss": 0.435,
1067
+ "step": 77000
1068
+ },
1069
+ {
1070
+ "epoch": 1.1,
1071
+ "learning_rate": 3.1649334166808734e-05,
1072
+ "loss": 0.4398,
1073
+ "step": 77500
1074
+ },
1075
+ {
1076
+ "epoch": 1.11,
1077
+ "learning_rate": 3.1530942774336536e-05,
1078
+ "loss": 0.4393,
1079
+ "step": 78000
1080
+ },
1081
+ {
1082
+ "epoch": 1.12,
1083
+ "learning_rate": 3.141255138186434e-05,
1084
+ "loss": 0.4371,
1085
+ "step": 78500
1086
+ },
1087
+ {
1088
+ "epoch": 1.12,
1089
+ "learning_rate": 3.129415998939213e-05,
1090
+ "loss": 0.4355,
1091
+ "step": 79000
1092
+ },
1093
+ {
1094
+ "epoch": 1.13,
1095
+ "learning_rate": 3.117576859691993e-05,
1096
+ "loss": 0.4328,
1097
+ "step": 79500
1098
+ },
1099
+ {
1100
+ "epoch": 1.14,
1101
+ "learning_rate": 3.105737720444773e-05,
1102
+ "loss": 0.4349,
1103
+ "step": 80000
1104
+ },
1105
+ {
1106
+ "epoch": 1.14,
1107
+ "eval_accuracy": 0.9031310444806294,
1108
+ "eval_loss": 0.40232041478157043,
1109
+ "eval_runtime": 4482.6913,
1110
+ "eval_samples_per_second": 31.451,
1111
+ "eval_steps_per_second": 7.863,
1112
+ "step": 80000
1113
+ },
1114
+ {
1115
+ "epoch": 1.14,
1116
+ "learning_rate": 3.093898581197553e-05,
1117
+ "loss": 0.4324,
1118
+ "step": 80500
1119
+ },
1120
+ {
1121
+ "epoch": 1.15,
1122
+ "learning_rate": 3.0820594419503324e-05,
1123
+ "loss": 0.4326,
1124
+ "step": 81000
1125
+ },
1126
+ {
1127
+ "epoch": 1.16,
1128
+ "learning_rate": 3.0702203027031126e-05,
1129
+ "loss": 0.4308,
1130
+ "step": 81500
1131
+ },
1132
+ {
1133
+ "epoch": 1.16,
1134
+ "learning_rate": 3.058381163455892e-05,
1135
+ "loss": 0.4317,
1136
+ "step": 82000
1137
+ },
1138
+ {
1139
+ "epoch": 1.17,
1140
+ "learning_rate": 3.0465420242086722e-05,
1141
+ "loss": 0.4322,
1142
+ "step": 82500
1143
+ },
1144
+ {
1145
+ "epoch": 1.18,
1146
+ "learning_rate": 3.0347028849614517e-05,
1147
+ "loss": 0.43,
1148
+ "step": 83000
1149
+ },
1150
+ {
1151
+ "epoch": 1.19,
1152
+ "learning_rate": 3.0228637457142318e-05,
1153
+ "loss": 0.4321,
1154
+ "step": 83500
1155
+ },
1156
+ {
1157
+ "epoch": 1.19,
1158
+ "learning_rate": 3.0110246064670116e-05,
1159
+ "loss": 0.4309,
1160
+ "step": 84000
1161
+ },
1162
+ {
1163
+ "epoch": 1.2,
1164
+ "learning_rate": 2.9991854672197918e-05,
1165
+ "loss": 0.4274,
1166
+ "step": 84500
1167
+ },
1168
+ {
1169
+ "epoch": 1.21,
1170
+ "learning_rate": 2.9873463279725712e-05,
1171
+ "loss": 0.4262,
1172
+ "step": 85000
1173
+ },
1174
+ {
1175
+ "epoch": 1.21,
1176
+ "eval_accuracy": 0.9039536284020224,
1177
+ "eval_loss": 0.39793965220451355,
1178
+ "eval_runtime": 4483.3415,
1179
+ "eval_samples_per_second": 31.447,
1180
+ "eval_steps_per_second": 7.862,
1181
+ "step": 85000
1182
+ },
1183
+ {
1184
+ "epoch": 1.21,
1185
+ "learning_rate": 2.975507188725351e-05,
1186
+ "loss": 0.4282,
1187
+ "step": 85500
1188
+ },
1189
+ {
1190
+ "epoch": 1.22,
1191
+ "learning_rate": 2.9636680494781312e-05,
1192
+ "loss": 0.4283,
1193
+ "step": 86000
1194
+ },
1195
+ {
1196
+ "epoch": 1.23,
1197
+ "learning_rate": 2.9518289102309103e-05,
1198
+ "loss": 0.4275,
1199
+ "step": 86500
1200
+ },
1201
+ {
1202
+ "epoch": 1.24,
1203
+ "learning_rate": 2.9399897709836905e-05,
1204
+ "loss": 0.4251,
1205
+ "step": 87000
1206
+ },
1207
+ {
1208
+ "epoch": 1.24,
1209
+ "learning_rate": 2.9281506317364703e-05,
1210
+ "loss": 0.4272,
1211
+ "step": 87500
1212
+ },
1213
+ {
1214
+ "epoch": 1.25,
1215
+ "learning_rate": 2.9163114924892504e-05,
1216
+ "loss": 0.428,
1217
+ "step": 88000
1218
+ },
1219
+ {
1220
+ "epoch": 1.26,
1221
+ "learning_rate": 2.90447235324203e-05,
1222
+ "loss": 0.4281,
1223
+ "step": 88500
1224
+ },
1225
+ {
1226
+ "epoch": 1.26,
1227
+ "learning_rate": 2.8926332139948097e-05,
1228
+ "loss": 0.4242,
1229
+ "step": 89000
1230
+ },
1231
+ {
1232
+ "epoch": 1.27,
1233
+ "learning_rate": 2.8807940747475898e-05,
1234
+ "loss": 0.4233,
1235
+ "step": 89500
1236
+ },
1237
+ {
1238
+ "epoch": 1.28,
1239
+ "learning_rate": 2.8689549355003696e-05,
1240
+ "loss": 0.4187,
1241
+ "step": 90000
1242
+ },
1243
+ {
1244
+ "epoch": 1.28,
1245
+ "eval_accuracy": 0.905044954941931,
1246
+ "eval_loss": 0.3930475413799286,
1247
+ "eval_runtime": 4483.2316,
1248
+ "eval_samples_per_second": 31.447,
1249
+ "eval_steps_per_second": 7.862,
1250
+ "step": 90000
1251
+ },
1252
+ {
1253
+ "epoch": 1.29,
1254
+ "learning_rate": 2.857115796253149e-05,
1255
+ "loss": 0.4248,
1256
+ "step": 90500
1257
+ },
1258
+ {
1259
+ "epoch": 1.29,
1260
+ "learning_rate": 2.8452766570059293e-05,
1261
+ "loss": 0.4256,
1262
+ "step": 91000
1263
+ },
1264
+ {
1265
+ "epoch": 1.3,
1266
+ "learning_rate": 2.833437517758709e-05,
1267
+ "loss": 0.4254,
1268
+ "step": 91500
1269
+ },
1270
+ {
1271
+ "epoch": 1.31,
1272
+ "learning_rate": 2.8215983785114885e-05,
1273
+ "loss": 0.4213,
1274
+ "step": 92000
1275
+ },
1276
+ {
1277
+ "epoch": 1.31,
1278
+ "learning_rate": 2.8097592392642687e-05,
1279
+ "loss": 0.4219,
1280
+ "step": 92500
1281
+ },
1282
+ {
1283
+ "epoch": 1.32,
1284
+ "learning_rate": 2.7979201000170485e-05,
1285
+ "loss": 0.4231,
1286
+ "step": 93000
1287
+ },
1288
+ {
1289
+ "epoch": 1.33,
1290
+ "learning_rate": 2.7860809607698286e-05,
1291
+ "loss": 0.4206,
1292
+ "step": 93500
1293
+ },
1294
+ {
1295
+ "epoch": 1.34,
1296
+ "learning_rate": 2.7742418215226078e-05,
1297
+ "loss": 0.4203,
1298
+ "step": 94000
1299
+ },
1300
+ {
1301
+ "epoch": 1.34,
1302
+ "learning_rate": 2.762402682275388e-05,
1303
+ "loss": 0.4174,
1304
+ "step": 94500
1305
+ },
1306
+ {
1307
+ "epoch": 1.35,
1308
+ "learning_rate": 2.7505635430281677e-05,
1309
+ "loss": 0.4183,
1310
+ "step": 95000
1311
+ },
1312
+ {
1313
+ "epoch": 1.35,
1314
+ "eval_accuracy": 0.9058014412128497,
1315
+ "eval_loss": 0.3888256549835205,
1316
+ "eval_runtime": 4483.1222,
1317
+ "eval_samples_per_second": 31.448,
1318
+ "eval_steps_per_second": 7.862,
1319
+ "step": 95000
1320
+ },
1321
+ {
1322
+ "epoch": 1.36,
1323
+ "learning_rate": 2.738724403780948e-05,
1324
+ "loss": 0.4226,
1325
+ "step": 95500
1326
+ },
1327
+ {
1328
+ "epoch": 1.36,
1329
+ "learning_rate": 2.7268852645337273e-05,
1330
+ "loss": 0.4213,
1331
+ "step": 96000
1332
+ },
1333
+ {
1334
+ "epoch": 1.37,
1335
+ "learning_rate": 2.715046125286507e-05,
1336
+ "loss": 0.4191,
1337
+ "step": 96500
1338
+ },
1339
+ {
1340
+ "epoch": 1.38,
1341
+ "learning_rate": 2.7032069860392873e-05,
1342
+ "loss": 0.4156,
1343
+ "step": 97000
1344
+ },
1345
+ {
1346
+ "epoch": 1.39,
1347
+ "learning_rate": 2.691367846792067e-05,
1348
+ "loss": 0.4179,
1349
+ "step": 97500
1350
+ },
1351
+ {
1352
+ "epoch": 1.39,
1353
+ "learning_rate": 2.6795287075448466e-05,
1354
+ "loss": 0.4158,
1355
+ "step": 98000
1356
+ },
1357
+ {
1358
+ "epoch": 1.4,
1359
+ "learning_rate": 2.6676895682976267e-05,
1360
+ "loss": 0.421,
1361
+ "step": 98500
1362
+ },
1363
+ {
1364
+ "epoch": 1.41,
1365
+ "learning_rate": 2.6558504290504065e-05,
1366
+ "loss": 0.417,
1367
+ "step": 99000
1368
+ },
1369
+ {
1370
+ "epoch": 1.41,
1371
+ "learning_rate": 2.644011289803186e-05,
1372
+ "loss": 0.415,
1373
+ "step": 99500
1374
+ },
1375
+ {
1376
+ "epoch": 1.42,
1377
+ "learning_rate": 2.632172150555966e-05,
1378
+ "loss": 0.4126,
1379
+ "step": 100000
1380
+ },
1381
+ {
1382
+ "epoch": 1.42,
1383
+ "eval_accuracy": 0.9066390531479637,
1384
+ "eval_loss": 0.3853781521320343,
1385
+ "eval_runtime": 4483.6399,
1386
+ "eval_samples_per_second": 31.445,
1387
+ "eval_steps_per_second": 7.861,
1388
+ "step": 100000
1389
+ },
1390
+ {
1391
+ "epoch": 1.43,
1392
+ "learning_rate": 2.620333011308746e-05,
1393
+ "loss": 0.4131,
1394
+ "step": 100500
1395
+ },
1396
+ {
1397
+ "epoch": 1.43,
1398
+ "learning_rate": 2.608493872061526e-05,
1399
+ "loss": 0.4168,
1400
+ "step": 101000
1401
+ },
1402
+ {
1403
+ "epoch": 1.44,
1404
+ "learning_rate": 2.5966547328143055e-05,
1405
+ "loss": 0.4114,
1406
+ "step": 101500
1407
+ },
1408
+ {
1409
+ "epoch": 1.45,
1410
+ "learning_rate": 2.5848155935670854e-05,
1411
+ "loss": 0.4164,
1412
+ "step": 102000
1413
+ },
1414
+ {
1415
+ "epoch": 1.46,
1416
+ "learning_rate": 2.5729764543198655e-05,
1417
+ "loss": 0.4157,
1418
+ "step": 102500
1419
+ },
1420
+ {
1421
+ "epoch": 1.46,
1422
+ "learning_rate": 2.5611373150726453e-05,
1423
+ "loss": 0.4155,
1424
+ "step": 103000
1425
+ },
1426
+ {
1427
+ "epoch": 1.47,
1428
+ "learning_rate": 2.5492981758254248e-05,
1429
+ "loss": 0.4123,
1430
+ "step": 103500
1431
+ },
1432
+ {
1433
+ "epoch": 1.48,
1434
+ "learning_rate": 2.5374590365782046e-05,
1435
+ "loss": 0.4124,
1436
+ "step": 104000
1437
+ },
1438
+ {
1439
+ "epoch": 1.48,
1440
+ "learning_rate": 2.5256198973309847e-05,
1441
+ "loss": 0.4148,
1442
+ "step": 104500
1443
+ },
1444
+ {
1445
+ "epoch": 1.49,
1446
+ "learning_rate": 2.5137807580837645e-05,
1447
+ "loss": 0.411,
1448
+ "step": 105000
1449
+ },
1450
+ {
1451
+ "epoch": 1.49,
1452
+ "eval_accuracy": 0.907043654550547,
1453
+ "eval_loss": 0.38217270374298096,
1454
+ "eval_runtime": 4482.6373,
1455
+ "eval_samples_per_second": 31.452,
1456
+ "eval_steps_per_second": 7.863,
1457
+ "step": 105000
1458
+ },
1459
+ {
1460
+ "epoch": 1.5,
1461
+ "learning_rate": 2.501941618836544e-05,
1462
+ "loss": 0.408,
1463
+ "step": 105500
1464
+ },
1465
+ {
1466
+ "epoch": 1.51,
1467
+ "learning_rate": 2.490102479589324e-05,
1468
+ "loss": 0.4123,
1469
+ "step": 106000
1470
+ },
1471
+ {
1472
+ "epoch": 1.51,
1473
+ "learning_rate": 2.4782633403421036e-05,
1474
+ "loss": 0.4125,
1475
+ "step": 106500
1476
+ },
1477
+ {
1478
+ "epoch": 1.52,
1479
+ "learning_rate": 2.4664242010948838e-05,
1480
+ "loss": 0.4117,
1481
+ "step": 107000
1482
+ },
1483
+ {
1484
+ "epoch": 1.53,
1485
+ "learning_rate": 2.4545850618476636e-05,
1486
+ "loss": 0.4083,
1487
+ "step": 107500
1488
+ },
1489
+ {
1490
+ "epoch": 1.53,
1491
+ "learning_rate": 2.4427459226004434e-05,
1492
+ "loss": 0.4117,
1493
+ "step": 108000
1494
+ },
1495
+ {
1496
+ "epoch": 1.54,
1497
+ "learning_rate": 2.4309067833532232e-05,
1498
+ "loss": 0.4079,
1499
+ "step": 108500
1500
+ },
1501
+ {
1502
+ "epoch": 1.55,
1503
+ "learning_rate": 2.419067644106003e-05,
1504
+ "loss": 0.4092,
1505
+ "step": 109000
1506
+ },
1507
+ {
1508
+ "epoch": 1.56,
1509
+ "learning_rate": 2.4072285048587828e-05,
1510
+ "loss": 0.4097,
1511
+ "step": 109500
1512
+ },
1513
+ {
1514
+ "epoch": 1.56,
1515
+ "learning_rate": 2.395389365611563e-05,
1516
+ "loss": 0.409,
1517
+ "step": 110000
1518
+ },
1519
+ {
1520
+ "epoch": 1.56,
1521
+ "eval_accuracy": 0.907909121265083,
1522
+ "eval_loss": 0.3783527910709381,
1523
+ "eval_runtime": 4481.8573,
1524
+ "eval_samples_per_second": 31.457,
1525
+ "eval_steps_per_second": 7.864,
1526
+ "step": 110000
1527
+ },
1528
+ {
1529
+ "epoch": 1.57,
1530
+ "learning_rate": 2.3835502263643424e-05,
1531
+ "loss": 0.403,
1532
+ "step": 110500
1533
+ },
1534
+ {
1535
+ "epoch": 1.58,
1536
+ "learning_rate": 2.3717110871171226e-05,
1537
+ "loss": 0.4109,
1538
+ "step": 111000
1539
+ },
1540
+ {
1541
+ "epoch": 1.58,
1542
+ "learning_rate": 2.359871947869902e-05,
1543
+ "loss": 0.4072,
1544
+ "step": 111500
1545
+ },
1546
+ {
1547
+ "epoch": 1.59,
1548
+ "learning_rate": 2.348032808622682e-05,
1549
+ "loss": 0.4063,
1550
+ "step": 112000
1551
+ },
1552
+ {
1553
+ "epoch": 1.6,
1554
+ "learning_rate": 2.336193669375462e-05,
1555
+ "loss": 0.4087,
1556
+ "step": 112500
1557
+ },
1558
+ {
1559
+ "epoch": 1.61,
1560
+ "learning_rate": 2.3243545301282415e-05,
1561
+ "loss": 0.4034,
1562
+ "step": 113000
1563
+ },
1564
+ {
1565
+ "epoch": 1.61,
1566
+ "learning_rate": 2.3125153908810216e-05,
1567
+ "loss": 0.4041,
1568
+ "step": 113500
1569
+ },
1570
+ {
1571
+ "epoch": 1.62,
1572
+ "learning_rate": 2.300676251633801e-05,
1573
+ "loss": 0.4044,
1574
+ "step": 114000
1575
+ },
1576
+ {
1577
+ "epoch": 1.63,
1578
+ "learning_rate": 2.2888371123865812e-05,
1579
+ "loss": 0.4072,
1580
+ "step": 114500
1581
+ },
1582
+ {
1583
+ "epoch": 1.63,
1584
+ "learning_rate": 2.276997973139361e-05,
1585
+ "loss": 0.4041,
1586
+ "step": 115000
1587
+ },
1588
+ {
1589
+ "epoch": 1.63,
1590
+ "eval_accuracy": 0.9086945617090499,
1591
+ "eval_loss": 0.37441667914390564,
1592
+ "eval_runtime": 4484.0758,
1593
+ "eval_samples_per_second": 31.441,
1594
+ "eval_steps_per_second": 7.86,
1595
+ "step": 115000
1596
+ },
1597
+ {
1598
+ "epoch": 1.64,
1599
+ "learning_rate": 2.265158833892141e-05,
1600
+ "loss": 0.4044,
1601
+ "step": 115500
1602
+ },
1603
+ {
1604
+ "epoch": 1.65,
1605
+ "learning_rate": 2.2533196946449206e-05,
1606
+ "loss": 0.4038,
1607
+ "step": 116000
1608
+ },
1609
+ {
1610
+ "epoch": 1.66,
1611
+ "learning_rate": 2.2414805553977004e-05,
1612
+ "loss": 0.4003,
1613
+ "step": 116500
1614
+ },
1615
+ {
1616
+ "epoch": 1.66,
1617
+ "learning_rate": 2.2296414161504803e-05,
1618
+ "loss": 0.4024,
1619
+ "step": 117000
1620
+ },
1621
+ {
1622
+ "epoch": 1.67,
1623
+ "learning_rate": 2.2178022769032604e-05,
1624
+ "loss": 0.4032,
1625
+ "step": 117500
1626
+ },
1627
+ {
1628
+ "epoch": 1.68,
1629
+ "learning_rate": 2.20596313765604e-05,
1630
+ "loss": 0.3995,
1631
+ "step": 118000
1632
+ },
1633
+ {
1634
+ "epoch": 1.68,
1635
+ "learning_rate": 2.1941239984088197e-05,
1636
+ "loss": 0.4052,
1637
+ "step": 118500
1638
+ },
1639
+ {
1640
+ "epoch": 1.69,
1641
+ "learning_rate": 2.1822848591615995e-05,
1642
+ "loss": 0.4007,
1643
+ "step": 119000
1644
+ },
1645
+ {
1646
+ "epoch": 1.7,
1647
+ "learning_rate": 2.1704457199143793e-05,
1648
+ "loss": 0.4003,
1649
+ "step": 119500
1650
+ },
1651
+ {
1652
+ "epoch": 1.7,
1653
+ "learning_rate": 2.1586065806671594e-05,
1654
+ "loss": 0.4025,
1655
+ "step": 120000
1656
+ },
1657
+ {
1658
+ "epoch": 1.7,
1659
+ "eval_accuracy": 0.9094200337257503,
1660
+ "eval_loss": 0.37063175439834595,
1661
+ "eval_runtime": 4483.4685,
1662
+ "eval_samples_per_second": 31.446,
1663
+ "eval_steps_per_second": 7.862,
1664
+ "step": 120000
1665
+ },
1666
+ {
1667
+ "epoch": 1.71,
1668
+ "learning_rate": 2.146767441419939e-05,
1669
+ "loss": 0.4016,
1670
+ "step": 120500
1671
+ },
1672
+ {
1673
+ "epoch": 1.72,
1674
+ "learning_rate": 2.134928302172719e-05,
1675
+ "loss": 0.4001,
1676
+ "step": 121000
1677
+ },
1678
+ {
1679
+ "epoch": 1.73,
1680
+ "learning_rate": 2.123089162925499e-05,
1681
+ "loss": 0.4006,
1682
+ "step": 121500
1683
+ },
1684
+ {
1685
+ "epoch": 1.73,
1686
+ "learning_rate": 2.1112500236782787e-05,
1687
+ "loss": 0.3957,
1688
+ "step": 122000
1689
+ },
1690
+ {
1691
+ "epoch": 1.74,
1692
+ "learning_rate": 2.0994108844310585e-05,
1693
+ "loss": 0.4004,
1694
+ "step": 122500
1695
+ },
1696
+ {
1697
+ "epoch": 1.75,
1698
+ "learning_rate": 2.0875717451838383e-05,
1699
+ "loss": 0.3994,
1700
+ "step": 123000
1701
+ },
1702
+ {
1703
+ "epoch": 1.75,
1704
+ "learning_rate": 2.075732605936618e-05,
1705
+ "loss": 0.3951,
1706
+ "step": 123500
1707
+ },
1708
+ {
1709
+ "epoch": 1.76,
1710
+ "learning_rate": 2.063893466689398e-05,
1711
+ "loss": 0.3963,
1712
+ "step": 124000
1713
+ },
1714
+ {
1715
+ "epoch": 1.77,
1716
+ "learning_rate": 2.0520543274421777e-05,
1717
+ "loss": 0.3955,
1718
+ "step": 124500
1719
+ },
1720
+ {
1721
+ "epoch": 1.78,
1722
+ "learning_rate": 2.0402151881949575e-05,
1723
+ "loss": 0.398,
1724
+ "step": 125000
1725
+ },
1726
+ {
1727
+ "epoch": 1.78,
1728
+ "eval_accuracy": 0.9097696447890194,
1729
+ "eval_loss": 0.3691096901893616,
1730
+ "eval_runtime": 4482.3141,
1731
+ "eval_samples_per_second": 31.454,
1732
+ "eval_steps_per_second": 7.864,
1733
+ "step": 125000
1734
+ },
1735
+ {
1736
+ "epoch": 1.78,
1737
+ "learning_rate": 2.0283760489477373e-05,
1738
+ "loss": 0.402,
1739
+ "step": 125500
1740
+ },
1741
+ {
1742
+ "epoch": 1.79,
1743
+ "learning_rate": 2.016536909700517e-05,
1744
+ "loss": 0.393,
1745
+ "step": 126000
1746
+ },
1747
+ {
1748
+ "epoch": 1.8,
1749
+ "learning_rate": 2.004697770453297e-05,
1750
+ "loss": 0.3927,
1751
+ "step": 126500
1752
+ },
1753
+ {
1754
+ "epoch": 1.8,
1755
+ "learning_rate": 1.9928586312060767e-05,
1756
+ "loss": 0.3931,
1757
+ "step": 127000
1758
+ },
1759
+ {
1760
+ "epoch": 1.81,
1761
+ "learning_rate": 1.981019491958857e-05,
1762
+ "loss": 0.3922,
1763
+ "step": 127500
1764
+ },
1765
+ {
1766
+ "epoch": 1.82,
1767
+ "learning_rate": 1.9691803527116364e-05,
1768
+ "loss": 0.3984,
1769
+ "step": 128000
1770
+ },
1771
+ {
1772
+ "epoch": 1.83,
1773
+ "learning_rate": 1.9573412134644165e-05,
1774
+ "loss": 0.3958,
1775
+ "step": 128500
1776
+ },
1777
+ {
1778
+ "epoch": 1.83,
1779
+ "learning_rate": 1.9455020742171963e-05,
1780
+ "loss": 0.3958,
1781
+ "step": 129000
1782
+ },
1783
+ {
1784
+ "epoch": 1.84,
1785
+ "learning_rate": 1.933662934969976e-05,
1786
+ "loss": 0.3971,
1787
+ "step": 129500
1788
+ },
1789
+ {
1790
+ "epoch": 1.85,
1791
+ "learning_rate": 1.921823795722756e-05,
1792
+ "loss": 0.3894,
1793
+ "step": 130000
1794
+ },
1795
+ {
1796
+ "epoch": 1.85,
1797
+ "eval_accuracy": 0.9105847302510419,
1798
+ "eval_loss": 0.3647039532661438,
1799
+ "eval_runtime": 4483.1278,
1800
+ "eval_samples_per_second": 31.448,
1801
+ "eval_steps_per_second": 7.862,
1802
+ "step": 130000
1803
+ },
1804
+ {
1805
+ "epoch": 1.85,
1806
+ "learning_rate": 1.9099846564755357e-05,
1807
+ "loss": 0.3912,
1808
+ "step": 130500
1809
+ },
1810
+ {
1811
+ "epoch": 1.86,
1812
+ "learning_rate": 1.8981455172283155e-05,
1813
+ "loss": 0.3895,
1814
+ "step": 131000
1815
+ },
1816
+ {
1817
+ "epoch": 1.87,
1818
+ "learning_rate": 1.8863063779810953e-05,
1819
+ "loss": 0.3906,
1820
+ "step": 131500
1821
+ },
1822
+ {
1823
+ "epoch": 1.88,
1824
+ "learning_rate": 1.874467238733875e-05,
1825
+ "loss": 0.3893,
1826
+ "step": 132000
1827
+ },
1828
+ {
1829
+ "epoch": 1.88,
1830
+ "learning_rate": 1.862628099486655e-05,
1831
+ "loss": 0.3941,
1832
+ "step": 132500
1833
+ },
1834
+ {
1835
+ "epoch": 1.89,
1836
+ "learning_rate": 1.8507889602394348e-05,
1837
+ "loss": 0.394,
1838
+ "step": 133000
1839
+ },
1840
+ {
1841
+ "epoch": 1.9,
1842
+ "learning_rate": 1.8389498209922146e-05,
1843
+ "loss": 0.3893,
1844
+ "step": 133500
1845
+ },
1846
+ {
1847
+ "epoch": 1.9,
1848
+ "learning_rate": 1.8271106817449947e-05,
1849
+ "loss": 0.3911,
1850
+ "step": 134000
1851
+ },
1852
+ {
1853
+ "epoch": 1.91,
1854
+ "learning_rate": 1.8152715424977742e-05,
1855
+ "loss": 0.3899,
1856
+ "step": 134500
1857
+ },
1858
+ {
1859
+ "epoch": 1.92,
1860
+ "learning_rate": 1.8034324032505543e-05,
1861
+ "loss": 0.3898,
1862
+ "step": 135000
1863
+ },
1864
+ {
1865
+ "epoch": 1.92,
1866
+ "eval_accuracy": 0.9108666750444402,
1867
+ "eval_loss": 0.36286085844039917,
1868
+ "eval_runtime": 4484.3794,
1869
+ "eval_samples_per_second": 31.439,
1870
+ "eval_steps_per_second": 7.86,
1871
+ "step": 135000
1872
+ },
1873
+ {
1874
+ "epoch": 1.93,
1875
+ "learning_rate": 1.7915932640033338e-05,
1876
+ "loss": 0.389,
1877
+ "step": 135500
1878
+ },
1879
+ {
1880
+ "epoch": 1.93,
1881
+ "learning_rate": 1.779754124756114e-05,
1882
+ "loss": 0.3878,
1883
+ "step": 136000
1884
+ },
1885
+ {
1886
+ "epoch": 1.94,
1887
+ "learning_rate": 1.7679149855088938e-05,
1888
+ "loss": 0.3926,
1889
+ "step": 136500
1890
+ },
1891
+ {
1892
+ "epoch": 1.95,
1893
+ "learning_rate": 1.7560758462616736e-05,
1894
+ "loss": 0.3916,
1895
+ "step": 137000
1896
+ },
1897
+ {
1898
+ "epoch": 1.95,
1899
+ "learning_rate": 1.7442367070144534e-05,
1900
+ "loss": 0.3903,
1901
+ "step": 137500
1902
+ },
1903
+ {
1904
+ "epoch": 1.96,
1905
+ "learning_rate": 1.732397567767233e-05,
1906
+ "loss": 0.3932,
1907
+ "step": 138000
1908
+ },
1909
+ {
1910
+ "epoch": 1.97,
1911
+ "learning_rate": 1.720558428520013e-05,
1912
+ "loss": 0.3894,
1913
+ "step": 138500
1914
+ },
1915
+ {
1916
+ "epoch": 1.97,
1917
+ "learning_rate": 1.7087192892727928e-05,
1918
+ "loss": 0.3882,
1919
+ "step": 139000
1920
+ },
1921
+ {
1922
+ "epoch": 1.98,
1923
+ "learning_rate": 1.6968801500255726e-05,
1924
+ "loss": 0.3874,
1925
+ "step": 139500
1926
+ },
1927
+ {
1928
+ "epoch": 1.99,
1929
+ "learning_rate": 1.6850410107783524e-05,
1930
+ "loss": 0.3857,
1931
+ "step": 140000
1932
+ },
1933
+ {
1934
+ "epoch": 1.99,
1935
+ "eval_accuracy": 0.9115911874403287,
1936
+ "eval_loss": 0.36069589853286743,
1937
+ "eval_runtime": 4496.781,
1938
+ "eval_samples_per_second": 31.353,
1939
+ "eval_steps_per_second": 7.838,
1940
+ "step": 140000
1941
+ },
1942
+ {
1943
+ "epoch": 2.0,
1944
+ "learning_rate": 1.6732018715311322e-05,
1945
+ "loss": 0.3895,
1946
+ "step": 140500
1947
+ },
1948
+ {
1949
+ "epoch": 2.0,
1950
+ "learning_rate": 1.661362732283912e-05,
1951
+ "loss": 0.3903,
1952
+ "step": 141000
1953
+ },
1954
+ {
1955
+ "epoch": 2.01,
1956
+ "learning_rate": 1.6495235930366922e-05,
1957
+ "loss": 0.3858,
1958
+ "step": 141500
1959
+ },
1960
+ {
1961
+ "epoch": 2.02,
1962
+ "learning_rate": 1.6376844537894716e-05,
1963
+ "loss": 0.3838,
1964
+ "step": 142000
1965
+ },
1966
+ {
1967
+ "epoch": 2.02,
1968
+ "learning_rate": 1.6258453145422518e-05,
1969
+ "loss": 0.3811,
1970
+ "step": 142500
1971
+ },
1972
+ {
1973
+ "epoch": 2.03,
1974
+ "learning_rate": 1.6140061752950313e-05,
1975
+ "loss": 0.3892,
1976
+ "step": 143000
1977
+ },
1978
+ {
1979
+ "epoch": 2.04,
1980
+ "learning_rate": 1.6021670360478114e-05,
1981
+ "loss": 0.3857,
1982
+ "step": 143500
1983
+ },
1984
+ {
1985
+ "epoch": 2.05,
1986
+ "learning_rate": 1.5903278968005912e-05,
1987
+ "loss": 0.3812,
1988
+ "step": 144000
1989
+ },
1990
+ {
1991
+ "epoch": 2.05,
1992
+ "learning_rate": 1.5784887575533707e-05,
1993
+ "loss": 0.3862,
1994
+ "step": 144500
1995
+ },
1996
+ {
1997
+ "epoch": 2.06,
1998
+ "learning_rate": 1.5666496183061508e-05,
1999
+ "loss": 0.3809,
2000
+ "step": 145000
2001
+ },
2002
+ {
2003
+ "epoch": 2.06,
2004
+ "eval_accuracy": 0.9121499748273744,
2005
+ "eval_loss": 0.3579576909542084,
2006
+ "eval_runtime": 4496.0613,
2007
+ "eval_samples_per_second": 31.358,
2008
+ "eval_steps_per_second": 7.84,
2009
+ "step": 145000
2010
+ },
2011
+ {
2012
+ "epoch": 2.07,
2013
+ "learning_rate": 1.5548104790589303e-05,
2014
+ "loss": 0.3831,
2015
+ "step": 145500
2016
+ },
2017
+ {
2018
+ "epoch": 2.07,
2019
+ "learning_rate": 1.5429713398117104e-05,
2020
+ "loss": 0.3831,
2021
+ "step": 146000
2022
+ },
2023
+ {
2024
+ "epoch": 2.08,
2025
+ "learning_rate": 1.5311322005644902e-05,
2026
+ "loss": 0.3822,
2027
+ "step": 146500
2028
+ },
2029
+ {
2030
+ "epoch": 2.09,
2031
+ "learning_rate": 1.51929306131727e-05,
2032
+ "loss": 0.3817,
2033
+ "step": 147000
2034
+ },
2035
+ {
2036
+ "epoch": 2.1,
2037
+ "learning_rate": 1.5074539220700499e-05,
2038
+ "loss": 0.3854,
2039
+ "step": 147500
2040
+ },
2041
+ {
2042
+ "epoch": 2.1,
2043
+ "learning_rate": 1.4956147828228298e-05,
2044
+ "loss": 0.3851,
2045
+ "step": 148000
2046
+ },
2047
+ {
2048
+ "epoch": 2.11,
2049
+ "learning_rate": 1.4837756435756095e-05,
2050
+ "loss": 0.3838,
2051
+ "step": 148500
2052
+ },
2053
+ {
2054
+ "epoch": 2.12,
2055
+ "learning_rate": 1.4719365043283895e-05,
2056
+ "loss": 0.3812,
2057
+ "step": 149000
2058
+ },
2059
+ {
2060
+ "epoch": 2.12,
2061
+ "learning_rate": 1.4600973650811693e-05,
2062
+ "loss": 0.3825,
2063
+ "step": 149500
2064
+ },
2065
+ {
2066
+ "epoch": 2.13,
2067
+ "learning_rate": 1.4482582258339492e-05,
2068
+ "loss": 0.3819,
2069
+ "step": 150000
2070
+ },
2071
+ {
2072
+ "epoch": 2.13,
2073
+ "eval_accuracy": 0.91259126809248,
2074
+ "eval_loss": 0.3549927771091461,
2075
+ "eval_runtime": 4493.1359,
2076
+ "eval_samples_per_second": 31.378,
2077
+ "eval_steps_per_second": 7.845,
2078
+ "step": 150000
2079
+ },
2080
+ {
2081
+ "epoch": 2.14,
2082
+ "learning_rate": 1.4364190865867289e-05,
2083
+ "loss": 0.3793,
2084
+ "step": 150500
2085
+ },
2086
+ {
2087
+ "epoch": 2.15,
2088
+ "learning_rate": 1.4245799473395085e-05,
2089
+ "loss": 0.3832,
2090
+ "step": 151000
2091
+ },
2092
+ {
2093
+ "epoch": 2.15,
2094
+ "learning_rate": 1.4127408080922885e-05,
2095
+ "loss": 0.3806,
2096
+ "step": 151500
2097
+ },
2098
+ {
2099
+ "epoch": 2.16,
2100
+ "learning_rate": 1.4009016688450683e-05,
2101
+ "loss": 0.3784,
2102
+ "step": 152000
2103
+ },
2104
+ {
2105
+ "epoch": 2.17,
2106
+ "learning_rate": 1.3890625295978483e-05,
2107
+ "loss": 0.382,
2108
+ "step": 152500
2109
+ },
2110
+ {
2111
+ "epoch": 2.17,
2112
+ "learning_rate": 1.3772233903506279e-05,
2113
+ "loss": 0.3788,
2114
+ "step": 153000
2115
+ },
2116
+ {
2117
+ "epoch": 2.18,
2118
+ "learning_rate": 1.3653842511034079e-05,
2119
+ "loss": 0.3837,
2120
+ "step": 153500
2121
+ },
2122
+ {
2123
+ "epoch": 2.19,
2124
+ "learning_rate": 1.3535451118561875e-05,
2125
+ "loss": 0.3843,
2126
+ "step": 154000
2127
+ },
2128
+ {
2129
+ "epoch": 2.19,
2130
+ "learning_rate": 1.3417059726089675e-05,
2131
+ "loss": 0.3813,
2132
+ "step": 154500
2133
+ },
2134
+ {
2135
+ "epoch": 2.2,
2136
+ "learning_rate": 1.3298668333617473e-05,
2137
+ "loss": 0.3795,
2138
+ "step": 155000
2139
+ },
2140
+ {
2141
+ "epoch": 2.2,
2142
+ "eval_accuracy": 0.913045165928687,
2143
+ "eval_loss": 0.3533931076526642,
2144
+ "eval_runtime": 4479.8171,
2145
+ "eval_samples_per_second": 31.471,
2146
+ "eval_steps_per_second": 7.868,
2147
+ "step": 155000
2148
+ },
2149
+ {
2150
+ "epoch": 2.21,
2151
+ "learning_rate": 1.3180276941145273e-05,
2152
+ "loss": 0.3806,
2153
+ "step": 155500
2154
+ },
2155
+ {
2156
+ "epoch": 2.22,
2157
+ "learning_rate": 1.306188554867307e-05,
2158
+ "loss": 0.3763,
2159
+ "step": 156000
2160
+ },
2161
+ {
2162
+ "epoch": 2.22,
2163
+ "learning_rate": 1.2943494156200869e-05,
2164
+ "loss": 0.371,
2165
+ "step": 156500
2166
+ },
2167
+ {
2168
+ "epoch": 2.23,
2169
+ "learning_rate": 1.2825102763728667e-05,
2170
+ "loss": 0.3809,
2171
+ "step": 157000
2172
+ },
2173
+ {
2174
+ "epoch": 2.24,
2175
+ "learning_rate": 1.2706711371256467e-05,
2176
+ "loss": 0.3775,
2177
+ "step": 157500
2178
+ },
2179
+ {
2180
+ "epoch": 2.24,
2181
+ "learning_rate": 1.2588319978784263e-05,
2182
+ "loss": 0.3813,
2183
+ "step": 158000
2184
+ },
2185
+ {
2186
+ "epoch": 2.25,
2187
+ "learning_rate": 1.2469928586312061e-05,
2188
+ "loss": 0.3808,
2189
+ "step": 158500
2190
+ },
2191
+ {
2192
+ "epoch": 2.26,
2193
+ "learning_rate": 1.235153719383986e-05,
2194
+ "loss": 0.3771,
2195
+ "step": 159000
2196
+ },
2197
+ {
2198
+ "epoch": 2.27,
2199
+ "learning_rate": 1.2233145801367659e-05,
2200
+ "loss": 0.3734,
2201
+ "step": 159500
2202
+ },
2203
+ {
2204
+ "epoch": 2.27,
2205
+ "learning_rate": 1.2114754408895457e-05,
2206
+ "loss": 0.3772,
2207
+ "step": 160000
2208
+ },
2209
+ {
2210
+ "epoch": 2.27,
2211
+ "eval_accuracy": 0.9134737242143287,
2212
+ "eval_loss": 0.3512367606163025,
2213
+ "eval_runtime": 4479.3521,
2214
+ "eval_samples_per_second": 31.475,
2215
+ "eval_steps_per_second": 7.869,
2216
+ "step": 160000
2217
+ },
2218
+ {
2219
+ "epoch": 2.28,
2220
+ "learning_rate": 1.1996363016423255e-05,
2221
+ "loss": 0.3765,
2222
+ "step": 160500
2223
+ },
2224
+ {
2225
+ "epoch": 2.29,
2226
+ "learning_rate": 1.1877971623951052e-05,
2227
+ "loss": 0.3771,
2228
+ "step": 161000
2229
+ },
2230
+ {
2231
+ "epoch": 2.29,
2232
+ "learning_rate": 1.175958023147885e-05,
2233
+ "loss": 0.3757,
2234
+ "step": 161500
2235
+ },
2236
+ {
2237
+ "epoch": 2.3,
2238
+ "learning_rate": 1.164118883900665e-05,
2239
+ "loss": 0.3742,
2240
+ "step": 162000
2241
+ },
2242
+ {
2243
+ "epoch": 2.31,
2244
+ "learning_rate": 1.1522797446534448e-05,
2245
+ "loss": 0.3738,
2246
+ "step": 162500
2247
+ },
2248
+ {
2249
+ "epoch": 2.32,
2250
+ "learning_rate": 1.1404406054062246e-05,
2251
+ "loss": 0.3743,
2252
+ "step": 163000
2253
+ },
2254
+ {
2255
+ "epoch": 2.32,
2256
+ "learning_rate": 1.1286014661590044e-05,
2257
+ "loss": 0.3738,
2258
+ "step": 163500
2259
+ },
2260
+ {
2261
+ "epoch": 2.33,
2262
+ "learning_rate": 1.1167623269117842e-05,
2263
+ "loss": 0.3712,
2264
+ "step": 164000
2265
+ },
2266
+ {
2267
+ "epoch": 2.34,
2268
+ "learning_rate": 1.1049231876645642e-05,
2269
+ "loss": 0.373,
2270
+ "step": 164500
2271
+ },
2272
+ {
2273
+ "epoch": 2.34,
2274
+ "learning_rate": 1.093084048417344e-05,
2275
+ "loss": 0.3723,
2276
+ "step": 165000
2277
+ },
2278
+ {
2279
+ "epoch": 2.34,
2280
+ "eval_accuracy": 0.9138787356297591,
2281
+ "eval_loss": 0.3488907516002655,
2282
+ "eval_runtime": 4481.5458,
2283
+ "eval_samples_per_second": 31.459,
2284
+ "eval_steps_per_second": 7.865,
2285
+ "step": 165000
2286
+ },
2287
+ {
2288
+ "epoch": 2.35,
2289
+ "learning_rate": 1.0812449091701238e-05,
2290
+ "loss": 0.373,
2291
+ "step": 165500
2292
+ },
2293
+ {
2294
+ "epoch": 2.36,
2295
+ "learning_rate": 1.0694057699229036e-05,
2296
+ "loss": 0.3715,
2297
+ "step": 166000
2298
+ },
2299
+ {
2300
+ "epoch": 2.37,
2301
+ "learning_rate": 1.0575666306756834e-05,
2302
+ "loss": 0.3753,
2303
+ "step": 166500
2304
+ },
2305
+ {
2306
+ "epoch": 2.37,
2307
+ "learning_rate": 1.0457274914284634e-05,
2308
+ "loss": 0.3755,
2309
+ "step": 167000
2310
+ },
2311
+ {
2312
+ "epoch": 2.38,
2313
+ "learning_rate": 1.033888352181243e-05,
2314
+ "loss": 0.3736,
2315
+ "step": 167500
2316
+ },
2317
+ {
2318
+ "epoch": 2.39,
2319
+ "learning_rate": 1.0220492129340228e-05,
2320
+ "loss": 0.3738,
2321
+ "step": 168000
2322
+ },
2323
+ {
2324
+ "epoch": 2.39,
2325
+ "learning_rate": 1.0102100736868026e-05,
2326
+ "loss": 0.3718,
2327
+ "step": 168500
2328
+ },
2329
+ {
2330
+ "epoch": 2.4,
2331
+ "learning_rate": 9.983709344395826e-06,
2332
+ "loss": 0.3701,
2333
+ "step": 169000
2334
+ },
2335
+ {
2336
+ "epoch": 2.41,
2337
+ "learning_rate": 9.865317951923624e-06,
2338
+ "loss": 0.3747,
2339
+ "step": 169500
2340
+ },
2341
+ {
2342
+ "epoch": 2.42,
2343
+ "learning_rate": 9.746926559451422e-06,
2344
+ "loss": 0.3741,
2345
+ "step": 170000
2346
+ },
2347
+ {
2348
+ "epoch": 2.42,
2349
+ "eval_accuracy": 0.9141958812185558,
2350
+ "eval_loss": 0.34678730368614197,
2351
+ "eval_runtime": 4483.3675,
2352
+ "eval_samples_per_second": 31.446,
2353
+ "eval_steps_per_second": 7.862,
2354
+ "step": 170000
2355
+ },
2356
+ {
2357
+ "epoch": 2.42,
2358
+ "learning_rate": 9.62853516697922e-06,
2359
+ "loss": 0.3731,
2360
+ "step": 170500
2361
+ },
2362
+ {
2363
+ "epoch": 2.43,
2364
+ "learning_rate": 9.510143774507018e-06,
2365
+ "loss": 0.3709,
2366
+ "step": 171000
2367
+ },
2368
+ {
2369
+ "epoch": 2.44,
2370
+ "learning_rate": 9.391752382034818e-06,
2371
+ "loss": 0.3674,
2372
+ "step": 171500
2373
+ },
2374
+ {
2375
+ "epoch": 2.44,
2376
+ "learning_rate": 9.273360989562616e-06,
2377
+ "loss": 0.3688,
2378
+ "step": 172000
2379
+ },
2380
+ {
2381
+ "epoch": 2.45,
2382
+ "learning_rate": 9.154969597090414e-06,
2383
+ "loss": 0.3708,
2384
+ "step": 172500
2385
+ },
2386
+ {
2387
+ "epoch": 2.46,
2388
+ "learning_rate": 9.036578204618212e-06,
2389
+ "loss": 0.3705,
2390
+ "step": 173000
2391
+ },
2392
+ {
2393
+ "epoch": 2.46,
2394
+ "learning_rate": 8.91818681214601e-06,
2395
+ "loss": 0.3688,
2396
+ "step": 173500
2397
+ },
2398
+ {
2399
+ "epoch": 2.47,
2400
+ "learning_rate": 8.799795419673808e-06,
2401
+ "loss": 0.3688,
2402
+ "step": 174000
2403
+ },
2404
+ {
2405
+ "epoch": 2.48,
2406
+ "learning_rate": 8.681404027201606e-06,
2407
+ "loss": 0.3731,
2408
+ "step": 174500
2409
+ },
2410
+ {
2411
+ "epoch": 2.49,
2412
+ "learning_rate": 8.563012634729405e-06,
2413
+ "loss": 0.3678,
2414
+ "step": 175000
2415
+ },
2416
+ {
2417
+ "epoch": 2.49,
2418
+ "eval_accuracy": 0.9147443492541378,
2419
+ "eval_loss": 0.3443525731563568,
2420
+ "eval_runtime": 4487.0096,
2421
+ "eval_samples_per_second": 31.421,
2422
+ "eval_steps_per_second": 7.855,
2423
+ "step": 175000
2424
+ },
2425
+ {
2426
+ "epoch": 2.49,
2427
+ "learning_rate": 8.444621242257203e-06,
2428
+ "loss": 0.3712,
2429
+ "step": 175500
2430
+ },
2431
+ {
2432
+ "epoch": 2.5,
2433
+ "learning_rate": 8.326229849785e-06,
2434
+ "loss": 0.3707,
2435
+ "step": 176000
2436
+ },
2437
+ {
2438
+ "epoch": 2.51,
2439
+ "learning_rate": 8.2078384573128e-06,
2440
+ "loss": 0.3693,
2441
+ "step": 176500
2442
+ },
2443
+ {
2444
+ "epoch": 2.51,
2445
+ "learning_rate": 8.089447064840599e-06,
2446
+ "loss": 0.371,
2447
+ "step": 177000
2448
+ },
2449
+ {
2450
+ "epoch": 2.52,
2451
+ "learning_rate": 7.971055672368397e-06,
2452
+ "loss": 0.371,
2453
+ "step": 177500
2454
+ },
2455
+ {
2456
+ "epoch": 2.53,
2457
+ "learning_rate": 7.852664279896195e-06,
2458
+ "loss": 0.3724,
2459
+ "step": 178000
2460
+ },
2461
+ {
2462
+ "epoch": 2.54,
2463
+ "learning_rate": 7.734272887423993e-06,
2464
+ "loss": 0.3707,
2465
+ "step": 178500
2466
+ },
2467
+ {
2468
+ "epoch": 2.54,
2469
+ "learning_rate": 7.615881494951792e-06,
2470
+ "loss": 0.3673,
2471
+ "step": 179000
2472
+ },
2473
+ {
2474
+ "epoch": 2.55,
2475
+ "learning_rate": 7.4974901024795906e-06,
2476
+ "loss": 0.3684,
2477
+ "step": 179500
2478
+ },
2479
+ {
2480
+ "epoch": 2.56,
2481
+ "learning_rate": 7.379098710007389e-06,
2482
+ "loss": 0.3698,
2483
+ "step": 180000
2484
+ },
2485
+ {
2486
+ "epoch": 2.56,
2487
+ "eval_accuracy": 0.9149897136925411,
2488
+ "eval_loss": 0.3437068462371826,
2489
+ "eval_runtime": 4486.3551,
2490
+ "eval_samples_per_second": 31.426,
2491
+ "eval_steps_per_second": 7.856,
2492
+ "step": 180000
2493
+ },
2494
+ {
2495
+ "epoch": 2.56,
2496
+ "learning_rate": 7.260707317535187e-06,
2497
+ "loss": 0.3644,
2498
+ "step": 180500
2499
+ },
2500
+ {
2501
+ "epoch": 2.57,
2502
+ "learning_rate": 7.142315925062984e-06,
2503
+ "loss": 0.3696,
2504
+ "step": 181000
2505
+ },
2506
+ {
2507
+ "epoch": 2.58,
2508
+ "learning_rate": 7.023924532590782e-06,
2509
+ "loss": 0.3682,
2510
+ "step": 181500
2511
+ },
2512
+ {
2513
+ "epoch": 2.59,
2514
+ "learning_rate": 6.905533140118581e-06,
2515
+ "loss": 0.3703,
2516
+ "step": 182000
2517
+ },
2518
+ {
2519
+ "epoch": 2.59,
2520
+ "learning_rate": 6.787141747646379e-06,
2521
+ "loss": 0.3686,
2522
+ "step": 182500
2523
+ },
2524
+ {
2525
+ "epoch": 2.6,
2526
+ "learning_rate": 6.668750355174178e-06,
2527
+ "loss": 0.3638,
2528
+ "step": 183000
2529
+ },
2530
+ {
2531
+ "epoch": 2.61,
2532
+ "learning_rate": 6.550358962701976e-06,
2533
+ "loss": 0.3652,
2534
+ "step": 183500
2535
+ },
2536
+ {
2537
+ "epoch": 2.61,
2538
+ "learning_rate": 6.431967570229774e-06,
2539
+ "loss": 0.367,
2540
+ "step": 184000
2541
+ },
2542
+ {
2543
+ "epoch": 2.62,
2544
+ "learning_rate": 6.313576177757573e-06,
2545
+ "loss": 0.3665,
2546
+ "step": 184500
2547
+ },
2548
+ {
2549
+ "epoch": 2.63,
2550
+ "learning_rate": 6.195184785285371e-06,
2551
+ "loss": 0.3685,
2552
+ "step": 185000
2553
+ },
2554
+ {
2555
+ "epoch": 2.63,
2556
+ "eval_accuracy": 0.9152310790161654,
2557
+ "eval_loss": 0.34259435534477234,
2558
+ "eval_runtime": 4485.6065,
2559
+ "eval_samples_per_second": 31.431,
2560
+ "eval_steps_per_second": 7.858,
2561
+ "step": 185000
2562
+ },
2563
+ {
2564
+ "epoch": 2.64,
2565
+ "learning_rate": 6.076793392813169e-06,
2566
+ "loss": 0.3635,
2567
+ "step": 185500
2568
+ },
2569
+ {
2570
+ "epoch": 2.64,
2571
+ "learning_rate": 5.958402000340967e-06,
2572
+ "loss": 0.3645,
2573
+ "step": 186000
2574
+ },
2575
+ {
2576
+ "epoch": 2.65,
2577
+ "learning_rate": 5.840010607868765e-06,
2578
+ "loss": 0.3697,
2579
+ "step": 186500
2580
+ },
2581
+ {
2582
+ "epoch": 2.66,
2583
+ "learning_rate": 5.721619215396564e-06,
2584
+ "loss": 0.3693,
2585
+ "step": 187000
2586
+ },
2587
+ {
2588
+ "epoch": 2.66,
2589
+ "learning_rate": 5.603227822924362e-06,
2590
+ "loss": 0.3638,
2591
+ "step": 187500
2592
+ },
2593
+ {
2594
+ "epoch": 2.67,
2595
+ "learning_rate": 5.484836430452161e-06,
2596
+ "loss": 0.3621,
2597
+ "step": 188000
2598
+ },
2599
+ {
2600
+ "epoch": 2.68,
2601
+ "learning_rate": 5.366445037979959e-06,
2602
+ "loss": 0.3602,
2603
+ "step": 188500
2604
+ },
2605
+ {
2606
+ "epoch": 2.69,
2607
+ "learning_rate": 5.248053645507757e-06,
2608
+ "loss": 0.369,
2609
+ "step": 189000
2610
+ },
2611
+ {
2612
+ "epoch": 2.69,
2613
+ "learning_rate": 5.1296622530355554e-06,
2614
+ "loss": 0.3678,
2615
+ "step": 189500
2616
+ },
2617
+ {
2618
+ "epoch": 2.7,
2619
+ "learning_rate": 5.0112708605633535e-06,
2620
+ "loss": 0.3663,
2621
+ "step": 190000
2622
+ },
2623
+ {
2624
+ "epoch": 2.7,
2625
+ "eval_accuracy": 0.9157103027094395,
2626
+ "eval_loss": 0.3402460515499115,
2627
+ "eval_runtime": 4483.4518,
2628
+ "eval_samples_per_second": 31.446,
2629
+ "eval_steps_per_second": 7.862,
2630
+ "step": 190000
2631
+ },
2632
+ {
2633
+ "epoch": 2.71,
2634
+ "learning_rate": 4.8928794680911524e-06,
2635
+ "loss": 0.366,
2636
+ "step": 190500
2637
+ },
2638
+ {
2639
+ "epoch": 2.71,
2640
+ "learning_rate": 4.7744880756189505e-06,
2641
+ "loss": 0.3676,
2642
+ "step": 191000
2643
+ },
2644
+ {
2645
+ "epoch": 2.72,
2646
+ "learning_rate": 4.656096683146749e-06,
2647
+ "loss": 0.3653,
2648
+ "step": 191500
2649
+ },
2650
+ {
2651
+ "epoch": 2.73,
2652
+ "learning_rate": 4.5377052906745475e-06,
2653
+ "loss": 0.3649,
2654
+ "step": 192000
2655
+ },
2656
+ {
2657
+ "epoch": 2.73,
2658
+ "learning_rate": 4.419313898202345e-06,
2659
+ "loss": 0.3626,
2660
+ "step": 192500
2661
+ },
2662
+ {
2663
+ "epoch": 2.74,
2664
+ "learning_rate": 4.300922505730144e-06,
2665
+ "loss": 0.3603,
2666
+ "step": 193000
2667
+ },
2668
+ {
2669
+ "epoch": 2.75,
2670
+ "learning_rate": 4.182531113257942e-06,
2671
+ "loss": 0.3597,
2672
+ "step": 193500
2673
+ },
2674
+ {
2675
+ "epoch": 2.76,
2676
+ "learning_rate": 4.064139720785741e-06,
2677
+ "loss": 0.3666,
2678
+ "step": 194000
2679
+ },
2680
+ {
2681
+ "epoch": 2.76,
2682
+ "learning_rate": 3.945748328313539e-06,
2683
+ "loss": 0.3626,
2684
+ "step": 194500
2685
+ },
2686
+ {
2687
+ "epoch": 2.77,
2688
+ "learning_rate": 3.827356935841337e-06,
2689
+ "loss": 0.3636,
2690
+ "step": 195000
2691
+ },
2692
+ {
2693
+ "epoch": 2.77,
2694
+ "eval_accuracy": 0.9157984246825175,
2695
+ "eval_loss": 0.33979707956314087,
2696
+ "eval_runtime": 4482.6546,
2697
+ "eval_samples_per_second": 31.451,
2698
+ "eval_steps_per_second": 7.863,
2699
+ "step": 195000
2700
+ },
2701
+ {
2702
+ "epoch": 2.78,
2703
+ "learning_rate": 3.708965543369135e-06,
2704
+ "loss": 0.3641,
2705
+ "step": 195500
2706
+ },
2707
+ {
2708
+ "epoch": 2.78,
2709
+ "learning_rate": 3.5905741508969334e-06,
2710
+ "loss": 0.3637,
2711
+ "step": 196000
2712
+ },
2713
+ {
2714
+ "epoch": 2.79,
2715
+ "learning_rate": 3.4721827584247314e-06,
2716
+ "loss": 0.3646,
2717
+ "step": 196500
2718
+ },
2719
+ {
2720
+ "epoch": 2.8,
2721
+ "learning_rate": 3.35379136595253e-06,
2722
+ "loss": 0.3603,
2723
+ "step": 197000
2724
+ },
2725
+ {
2726
+ "epoch": 2.81,
2727
+ "learning_rate": 3.2353999734803284e-06,
2728
+ "loss": 0.3663,
2729
+ "step": 197500
2730
+ },
2731
+ {
2732
+ "epoch": 2.81,
2733
+ "learning_rate": 3.1170085810081265e-06,
2734
+ "loss": 0.3667,
2735
+ "step": 198000
2736
+ },
2737
+ {
2738
+ "epoch": 2.82,
2739
+ "learning_rate": 2.998617188535925e-06,
2740
+ "loss": 0.3638,
2741
+ "step": 198500
2742
+ },
2743
+ {
2744
+ "epoch": 2.83,
2745
+ "learning_rate": 2.880225796063723e-06,
2746
+ "loss": 0.3597,
2747
+ "step": 199000
2748
+ },
2749
+ {
2750
+ "epoch": 2.83,
2751
+ "learning_rate": 2.761834403591521e-06,
2752
+ "loss": 0.3639,
2753
+ "step": 199500
2754
+ },
2755
+ {
2756
+ "epoch": 2.84,
2757
+ "learning_rate": 2.6434430111193197e-06,
2758
+ "loss": 0.3618,
2759
+ "step": 200000
2760
+ },
2761
+ {
2762
+ "epoch": 2.84,
2763
+ "eval_accuracy": 0.9161026342896044,
2764
+ "eval_loss": 0.33811721205711365,
2765
+ "eval_runtime": 4483.081,
2766
+ "eval_samples_per_second": 31.448,
2767
+ "eval_steps_per_second": 7.862,
2768
+ "step": 200000
2769
+ },
2770
+ {
2771
+ "epoch": 2.85,
2772
+ "learning_rate": 2.525051618647118e-06,
2773
+ "loss": 0.3648,
2774
+ "step": 200500
2775
+ },
2776
+ {
2777
+ "epoch": 2.86,
2778
+ "learning_rate": 2.4066602261749166e-06,
2779
+ "loss": 0.3614,
2780
+ "step": 201000
2781
+ },
2782
+ {
2783
+ "epoch": 2.86,
2784
+ "learning_rate": 2.2882688337027147e-06,
2785
+ "loss": 0.3607,
2786
+ "step": 201500
2787
+ },
2788
+ {
2789
+ "epoch": 2.87,
2790
+ "learning_rate": 2.169877441230513e-06,
2791
+ "loss": 0.3638,
2792
+ "step": 202000
2793
+ },
2794
+ {
2795
+ "epoch": 2.88,
2796
+ "learning_rate": 2.0514860487583113e-06,
2797
+ "loss": 0.3634,
2798
+ "step": 202500
2799
+ },
2800
+ {
2801
+ "epoch": 2.88,
2802
+ "learning_rate": 1.9330946562861094e-06,
2803
+ "loss": 0.3628,
2804
+ "step": 203000
2805
+ },
2806
+ {
2807
+ "epoch": 2.89,
2808
+ "learning_rate": 1.8147032638139079e-06,
2809
+ "loss": 0.358,
2810
+ "step": 203500
2811
+ },
2812
+ {
2813
+ "epoch": 2.9,
2814
+ "learning_rate": 1.6963118713417062e-06,
2815
+ "loss": 0.3586,
2816
+ "step": 204000
2817
+ },
2818
+ {
2819
+ "epoch": 2.91,
2820
+ "learning_rate": 1.5779204788695042e-06,
2821
+ "loss": 0.364,
2822
+ "step": 204500
2823
+ },
2824
+ {
2825
+ "epoch": 2.91,
2826
+ "learning_rate": 1.4595290863973025e-06,
2827
+ "loss": 0.3613,
2828
+ "step": 205000
2829
+ },
2830
+ {
2831
+ "epoch": 2.91,
2832
+ "eval_accuracy": 0.9164505308425883,
2833
+ "eval_loss": 0.3365662097930908,
2834
+ "eval_runtime": 4483.3011,
2835
+ "eval_samples_per_second": 31.447,
2836
+ "eval_steps_per_second": 7.862,
2837
+ "step": 205000
2838
+ },
2839
+ {
2840
+ "epoch": 2.92,
2841
+ "learning_rate": 1.341137693925101e-06,
2842
+ "loss": 0.3641,
2843
+ "step": 205500
2844
+ },
2845
+ {
2846
+ "epoch": 2.93,
2847
+ "learning_rate": 1.2227463014528993e-06,
2848
+ "loss": 0.363,
2849
+ "step": 206000
2850
+ },
2851
+ {
2852
+ "epoch": 2.93,
2853
+ "learning_rate": 1.1043549089806974e-06,
2854
+ "loss": 0.3576,
2855
+ "step": 206500
2856
+ },
2857
+ {
2858
+ "epoch": 2.94,
2859
+ "learning_rate": 9.859635165084959e-07,
2860
+ "loss": 0.3627,
2861
+ "step": 207000
2862
+ },
2863
+ {
2864
+ "epoch": 2.95,
2865
+ "learning_rate": 8.67572124036294e-07,
2866
+ "loss": 0.3604,
2867
+ "step": 207500
2868
+ },
2869
+ {
2870
+ "epoch": 2.96,
2871
+ "learning_rate": 7.491807315640924e-07,
2872
+ "loss": 0.3614,
2873
+ "step": 208000
2874
+ },
2875
+ {
2876
+ "epoch": 2.96,
2877
+ "learning_rate": 6.307893390918907e-07,
2878
+ "loss": 0.3584,
2879
+ "step": 208500
2880
+ },
2881
+ {
2882
+ "epoch": 2.97,
2883
+ "learning_rate": 5.12397946619689e-07,
2884
+ "loss": 0.3594,
2885
+ "step": 209000
2886
+ },
2887
+ {
2888
+ "epoch": 2.98,
2889
+ "learning_rate": 3.9400655414748724e-07,
2890
+ "loss": 0.3589,
2891
+ "step": 209500
2892
+ },
2893
+ {
2894
+ "epoch": 2.98,
2895
+ "learning_rate": 2.756151616752856e-07,
2896
+ "loss": 0.3618,
2897
+ "step": 210000
2898
+ },
2899
+ {
2900
+ "epoch": 2.98,
2901
+ "eval_accuracy": 0.9162037471165208,
2902
+ "eval_loss": 0.3379279673099518,
2903
+ "eval_runtime": 4487.2562,
2904
+ "eval_samples_per_second": 31.419,
2905
+ "eval_steps_per_second": 7.855,
2906
+ "step": 210000
2907
+ },
2908
+ {
2909
+ "epoch": 2.99,
2910
+ "learning_rate": 1.5722376920308387e-07,
2911
+ "loss": 0.3621,
2912
+ "step": 210500
2913
+ },
2914
+ {
2915
+ "epoch": 3.0,
2916
+ "learning_rate": 3.883237673088216e-08,
2917
+ "loss": 0.3606,
2918
+ "step": 211000
2919
+ },
2920
+ {
2921
+ "epoch": 3.0,
2922
+ "step": 211164,
2923
+ "total_flos": 8.915475817785139e+17,
2924
+ "train_loss": 0.4472081179925129,
2925
+ "train_runtime": 493604.9481,
2926
+ "train_samples_per_second": 6.845,
2927
+ "train_steps_per_second": 0.428
2928
+ }
2929
+ ],
2930
+ "max_steps": 211164,
2931
+ "num_train_epochs": 3,
2932
+ "total_flos": 8.915475817785139e+17,
2933
+ "trial_name": null,
2934
+ "trial_params": null
2935
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b43987fb3f0ebcf5e0d1191bfdf8a14e8a4a38f3b4782a2bbf593c2d1b2eeab2
3
+ size 3899