ZaandaTeika committed on
Commit aee3614 · verified · 1 Parent(s): bf71d6d

Convert model to bfloat16 and fix total_parameters metadata
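bfloat16 keeps float32's 8 exponent bits but only 7 explicit mantissa bits, so each parameter is stored in 2 bytes. The index metadata in this commit accordingly records `total_size` as exactly twice `total_parameters`. The sketch below only illustrates what the conversion does to individual values. It uses the common round-to-nearest-even bit trick on float32 bit patterns and ignores NaN inputs. It is not the procedure used in this commit, which is not shown.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Round a float32 value to bfloat16 (round-to-nearest-even); return the 16-bit pattern.

    Assumes a finite input: NaN handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to an even mantissa.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to a Python float (this step is exact)."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

for value in [1.0, 0.1, 3.14159265, 1e-6]:
    b = float32_to_bfloat16_bits(value)
    print(f"{value!r:>12} -> 0x{b:04x} -> {bfloat16_bits_to_float(b)!r}")
```

For example, 1.0 survives the round trip exactly, while 0.1 becomes 0.10009765625.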

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
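The new rule routes `tokenizer.json` through Git LFS. One rough way to see which files in this commit such rules catch is to match file names against the patterns. The sketch below uses Python's `fnmatch.fnmatchcase`, which only approximates gitattributes semantics. It assumes that slash-free patterns match the file's basename and ignores `**` handling. The `*.safetensors` pattern is an assumption: lines 1-32 of the file are not shown, but the shards below are stored as LFS pointers, so some rule must match them.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns visible in the hunk above, plus the assumed "*.safetensors" rule.
LFS_PATTERNS = ["*.safetensors", "*.zip", "*.zst", "*tfevents*", "tokenizer.json"]

def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching: slash-free patterns match the basename."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)

for f in ["tokenizer.json", "config.json", "model-00001-of-00002.safetensors", "merges.txt"]:
    print(f"{f}: {'LFS' if is_lfs_tracked(f) else 'regular git'}")
```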
README.md ADDED
@@ -0,0 +1,213 @@
+ ---
+ base_model: Qwen/Qwen2.5-Math-1.5B-Instruct
+ library_name: transformers
+ model_name: Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM
+ tags:
+ - generated_from_trainer
+ - prm
+ - trl
+ - math
+ - process-reward-model
+ - qwen2.5
+ - sharp
+ ---
+
+ # Model Card for Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM
+
+ ## Introduction
+
+ **Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM** is a Process Reward Model (PRM) fine-tuned from [Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct). It evaluates the correctness of intermediate reasoning steps in mathematical solutions, which makes multi-step mathematical reasoning easier to check and interpret.
+
+ The model was trained on the **PRM800K** dataset with process supervision, in which each step of a reasoning chain receives its own correctness label rather than only the final answer.
+
+ This model is part of the SHARP-PRM series of Process Reward Models.
+
+ ## Model Information
+
+ ### Base Model
+ - **Base Model**: [Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)
+ - **Architecture**: Qwen2ForTokenClassification
+ - **Parameters**: 1.5B
+
+ ### Training Details
+ - **Training Dataset**: PRM800K (800K step-level human correctness labels on model-generated solutions to MATH problems)
+ - **Training Method**: Process Reward Model (PRM) as introduced in [Uesato et al., 2022](https://huggingface.co/papers/2211.14275)
+ - **Training Framework**: [TRL (Transformer Reinforcement Learning)](https://github.com/huggingface/trl) v0.24.0
+ - **Task Type**: Token classification (one binary label per reasoning step: error/correct)
+
+ ## PRM Evaluation
+
+ This model evaluates mathematical reasoning processes through:
+ 1. **Step-level Evaluation**: Classifying each step in a reasoning chain as either "correct" or "error"
+ 2. **Process Feedback**: Providing feedback on the reasoning process, not just the final answer
+ 3. **Error Detection**: Identifying where mistakes occur in multi-step mathematical solutions
+
+ ### Evaluation Metrics
+ The model is evaluated on the [ProcessBench](https://huggingface.co/datasets/Qwen/ProcessBench) benchmark, where the task is to identify the earliest erroneous step in a solution, or to conclude that every step is correct.
+
+ Key metrics include:
+ - **Error Accuracy**: Accuracy on erroneous solutions, i.e. locating the earliest incorrect step
+ - **Correct Accuracy**: Accuracy on fully correct solutions, i.e. accepting every step
+ - **F1 Score**: Harmonic mean of error accuracy and correct accuracy
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Basic Usage
+
+ #### Using the Model for Step Classification
+
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer
+ import torch
+
+ model_name = "path/to/Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForTokenClassification.from_pretrained(model_name)
+ model.eval()
+
+ # Example: evaluate a mathematical reasoning chain
+ # Problem with steps (one correct, one incorrect)
+ problem = "Solve: 2x + 5 = 13"
+ steps = [
+     "Subtract 5 from both sides: 2x = 8",  # Correct step
+     "Divide by 2: x = 5",                  # Incorrect step (should be x = 4)
+ ]
+
+ # Step separator. TRL's PRMTrainer defaults to "\n"; use whichever separator
+ # this model was actually trained with.
+ step_separator = "\n"
+
+ # Build the input the way TRL's PRMTrainer does: problem tokens, then each step's
+ # tokens followed by the separator. Each step's label is trained on the step's
+ # last token (the separator), so record those positions.
+ input_ids = tokenizer(problem, add_special_tokens=False)["input_ids"]
+ separator_ids = tokenizer(step_separator, add_special_tokens=False)["input_ids"]
+ step_end_positions = []
+ for step in steps:
+     input_ids += tokenizer(step, add_special_tokens=False)["input_ids"] + separator_ids
+     step_end_positions.append(len(input_ids) - 1)
+
+ # The base model supports at most 4096 positions (max_position_embeddings)
+ if len(input_ids) > model.config.max_position_embeddings:
+     raise ValueError("Input is longer than the model's context window")
+
+ # Get model predictions
+ with torch.no_grad():
+     logits = model(input_ids=torch.tensor([input_ids])).logits  # [1, seq_len, num_labels]
+ probabilities = torch.softmax(logits.float(), dim=-1)  # per-token label probabilities
+
+ # Read one prediction per step, at the step's final token
+ correct_id = model.config.label2id["correct"]
+ for i, (step, pos) in enumerate(zip(steps, step_end_positions)):
+     label = model.config.id2label[int(probabilities[0, pos].argmax())]
+     print(f"\nStep {i + 1}: {step}")
+     print(f"  Prediction: {label}")
+     print(f"  P(correct): {probabilities[0, pos, correct_id].item():.2%}")
+
+ # Output format (labels and probabilities depend on the model):
+ # Step 1: Subtract 5 from both sides: 2x = 8
+ #   Prediction: <correct|error>
+ #   P(correct): <probability>
+ ```
+
+ **Output Interpretation:**
+
+ - **Logits**: Raw, unnormalized scores from the model (before softmax). A higher logit means the model favors that label.
+ - **Probabilities**: Softmax-normalized scores between 0 and 1. At each token, the two label probabilities sum to 1.
+ - **Predictions**: Label indices (0 = "error", 1 = "correct") for each token. TRL's PRMTrainer supervises only the final token of each step, so predictions at other positions should not be read as step labels.
+
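+ #### Using with Pipeline
+
+ ```python
+ import torch
+ from transformers import pipeline
+
+ classifier = pipeline(
+     "token-classification",
+     model="path/to/Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM",
+     tokenizer="path/to/Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM",
+     device=0 if torch.cuda.is_available() else -1,
+ )
+
+ # Reuses `problem`, `steps`, and `step_separator` from the example above.
+ # The pipeline returns one prediction per token; only the entries at each step's
+ # final (separator) token correspond to step labels. Tokenizing the joined string
+ # can merge tokens across step boundaries, so positions may differ slightly from
+ # the step-by-step tokenization above.
+ result = classifier(problem + "".join(step + step_separator for step in steps))
+ ```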
+
+ ### Integration with Mathematical Reasoning
+
+ This PRM can be used to:
+ 1. **Filter incorrect reasoning paths** in tree-of-thought or chain-of-thought generation
+ 2. **Provide feedback** during step-by-step problem solving
+ 3. **Evaluate solution quality** before final answer generation
+ 4. **Improve training** by identifying problematic reasoning patterns
+
+ ## Training Procedure
+
+ ### Training Configuration
+
+ - **Learning Rate**: 2e-5
+ - **Batch Size**: Per-device batch size with gradient accumulation (exact values not specified)
+ - **Epochs**: Multiple epochs with early stopping
+ - **Optimizer**: AdamW with a cosine learning rate schedule
+ - **Warmup Ratio**: 3%
+ - **Gradient Clipping**: 5.0 (maximum gradient norm)
+ - **Precision**: bfloat16
+ - **Gradient Checkpointing**: Enabled for memory efficiency
+
+ ### Training Framework Versions
+
+ - **TRL**: 0.24.0
+ - **Transformers**: 4.56.2
+ - **PyTorch**: 2.9.1
+ - **Datasets**: 4.4.1
+ - **Tokenizers**: 0.22.1
+
+ ### Training Data
+
+ The model was trained on the **PRM800K** dataset, which contains:
+ - Model-generated step-by-step solutions to problems from the MATH dataset
+ - Human step-level correctness labels, used here as binary correct/error targets
+ - Problems spanning several mathematical subjects and difficulty levels
+
+ ## Use Cases
+
+ ### 1. Mathematical Reasoning Evaluation
+ - Evaluate intermediate steps in mathematical problem-solving
+ - Identify errors in multi-step calculations
+ - Provide feedback on reasoning quality
+
+ ### 2. Educational Applications
+ - Automated grading of mathematical solutions
+ - Step-by-step feedback for students
+ - Identification of common error patterns
+
+ ### 3. Research Applications
+ - Training better mathematical reasoning models
+ - Analyzing reasoning patterns
+ - Improving chain-of-thought generation
+
+ ## Limitations and Considerations
+
+ 1. **Domain Specificity**: The model is trained specifically for mathematical reasoning and may not generalize well to other domains
+ 2. **Step Length**: The model is optimized for step-level evaluation with up to 256 tokens of context per step
+ 3. **Context Length**: The full input (problem plus all steps) is limited to 4096 tokens (`max_position_embeddings` in `config.json`)
+ 4. **Language**: The model is trained primarily on English mathematical content
+ 5. **False Positives/Negatives**: Like any classifier, it may misclassify some steps
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{qwen2.5-math-1.5b-instruct-prm800k-sharp-prm,
+   title={Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM: A Process Reward Model for Mathematical Reasoning},
+   author={Your Name/Organization},
+   year={2025},
+   howpublished={\url{https://huggingface.co/path/to/Qwen2.5-Math-1.5B-Instruct-PRM800K-SHARP-PRM}}
+ }
+ ```
+
+ **Model Card Version**: 1.0
+ **Last Updated**: 2025-12-30
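The model card's Evaluation Metrics section combines two ProcessBench accuracies into an F1 score. Below is a minimal sketch of that computation. It assumes each solution is summarized by the index of its earliest erroneous step, with -1 when every step is correct, which is the label format ProcessBench uses. The helper names are hypothetical, and so is the 0.5 threshold that turns per-step P(correct) scores into that index.

```python
from typing import Sequence

def first_error_index(step_p_correct: Sequence[float], threshold: float = 0.5) -> int:
    """Index of the first step whose P(correct) falls below `threshold`, or -1 if none.

    The 0.5 threshold is a hypothetical default; tune it on held-out data.
    """
    for i, p in enumerate(step_p_correct):
        if p < threshold:
            return i
    return -1

def processbench_f1(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Harmonic mean of accuracy on erroneous samples and on fully correct samples.

    Both sequences hold earliest-error indices, with -1 meaning "no error".
    """
    err_hits = [p == g for p, g in zip(predicted, gold) if g != -1]
    cor_hits = [p == g for p, g in zip(predicted, gold) if g == -1]
    acc_err = sum(err_hits) / len(err_hits) if err_hits else 0.0
    acc_cor = sum(cor_hits) / len(cor_hits) if cor_hits else 0.0
    if acc_err + acc_cor == 0:
        return 0.0
    return 2 * acc_err * acc_cor / (acc_err + acc_cor)

gold = [1, -1, 2, -1]
predicted = [1, -1, 0, 3]
print(processbench_f1(predicted, gold))
```

Using the harmonic mean rather than overall accuracy penalizes a model that simply accepts every step, because its accuracy on erroneous samples is then near zero.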
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
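The added special tokens occupy a contiguous ID block. The consistency check sketched below copies its values from `added_tokens.json` and `config.json` in this commit. It confirms that every ID fits inside `vocab_size`, and that `eos_token_id` and `pad_token_id` resolve to `<|im_end|>` and `<|endoftext|>`. The constant names are illustrative.

```python
# Values copied from added_tokens.json in this commit.
ADDED_TOKENS = {
    "</tool_call>": 151658, "<tool_call>": 151657,
    "<|box_end|>": 151649, "<|box_start|>": 151648,
    "<|endoftext|>": 151643, "<|file_sep|>": 151664,
    "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661,
    "<|im_end|>": 151645, "<|im_start|>": 151644,
    "<|image_pad|>": 151655, "<|object_ref_end|>": 151647,
    "<|object_ref_start|>": 151646, "<|quad_end|>": 151651,
    "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}
# Values copied from config.json in this commit.
VOCAB_SIZE = 151936
EOS_TOKEN_ID = 151645
PAD_TOKEN_ID = 151643

id_to_token = {i: t for t, i in ADDED_TOKENS.items()}
ids = sorted(id_to_token)

assert len(id_to_token) == len(ADDED_TOKENS), "duplicate token IDs"
assert ids == list(range(ids[0], ids[-1] + 1)), "IDs are not contiguous"
assert ids[-1] < VOCAB_SIZE, "an added token falls outside the embedding matrix"

print(f"added token IDs: {ids[0]}..{ids[-1]} ({len(ids)} tokens)")
print(f"embedding rows past the last added token: {VOCAB_SIZE - (ids[-1] + 1)}")
print(f"eos -> {id_to_token[EOS_TOKEN_ID]}, pad -> {id_to_token[PAD_TOKEN_ID]}")
```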
chat_template.jinja ADDED
@@ -0,0 +1,54 @@
+ {%- if tools %}
+     {{- '<|im_start|>system\n' }}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- messages[0]['content'] }}
+     {%- else %}
+         {{- 'Please reason step by step, and put your final answer within \\boxed{}.' }}
+     {%- endif %}
+     {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+     {%- for tool in tools %}
+         {{- "\n" }}
+         {{- tool | tojson }}
+     {%- endfor %}
+     {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+ {%- else %}
+     {%- if messages[0]['role'] == 'system' %}
+         {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+     {%- else %}
+         {{- '<|im_start|>system\nPlease reason step by step, and put your final answer within \\boxed{}.<|im_end|>\n' }}
+     {%- endif %}
+ {%- endif %}
+ {%- for message in messages %}
+     {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+         {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+     {%- elif message.role == "assistant" %}
+         {{- '<|im_start|>' + message.role }}
+         {%- if message.content %}
+             {{- '\n' + message.content }}
+         {%- endif %}
+         {%- for tool_call in message.tool_calls %}
+             {%- if tool_call.function is defined %}
+                 {%- set tool_call = tool_call.function %}
+             {%- endif %}
+             {{- '\n<tool_call>\n{"name": "' }}
+             {{- tool_call.name }}
+             {{- '", "arguments": ' }}
+             {{- tool_call.arguments | tojson }}
+             {{- '}\n</tool_call>' }}
+         {%- endfor %}
+         {{- '<|im_end|>\n' }}
+     {%- elif message.role == "tool" %}
+         {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+             {{- '<|im_start|>user' }}
+         {%- endif %}
+         {{- '\n<tool_response>\n' }}
+         {{- message.content }}
+         {{- '\n</tool_response>' }}
+         {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+             {{- '<|im_end|>\n' }}
+         {%- endif %}
+     {%- endif %}
+ {%- endfor %}
+ {%- if add_generation_prompt %}
+     {{- '<|im_start|>assistant\n' }}
+ {%- endif %}
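When `tools` is not passed, the template reduces to a ChatML layout with a default math system prompt. The sketch below re-implements only that branch in plain Python, so the resulting prompt string can be inspected without Jinja. It rejects tool calls and tool responses rather than handling them. In practice, `tokenizer.apply_chat_template` renders the real template. The function name `render_prompt` is hypothetical.

```python
DEFAULT_SYSTEM = "Please reason step by step, and put your final answer within \\boxed{}."

def render_prompt(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Mirror of the template's no-tools branch for plain system/user/assistant turns."""
    parts = []
    if messages and messages[0]["role"] == "system":
        parts.append(f"<|im_start|>system\n{messages[0]['content']}<|im_end|>\n")
    else:
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for i, message in enumerate(messages):
        role = message["role"]
        if role == "system" and i == 0:
            continue  # the leading system message was already emitted above
        if role not in ("system", "user", "assistant") or message.get("tool_calls"):
            raise ValueError("tool calls and tool responses are not handled by this sketch")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_prompt([{"role": "user", "content": "Solve: 2x + 5 = 13"}], add_generation_prompt=True))
```

Without a system message, the default instruction to reason step by step and box the final answer is inserted automatically.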
config.json ADDED
@@ -0,0 +1,66 @@
+ {
+   "architectures": [
+     "Qwen2ForTokenClassification"
+   ],
+   "attention_dropout": 0.0,
+   "dtype": "bfloat16",
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "id2label": {
+     "0": "error",
+     "1": "correct"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "label2id": {
+     "correct": 1,
+     "error": 0
+   },
+   "layer_types": [
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention",
+     "full_attention"
+   ],
+   "max_position_embeddings": 4096,
+   "max_window_layers": 21,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "pad_token_id": 151643,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 10000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "transformers_version": "4.56.2",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
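The commit message says the `total_parameters` metadata was fixed. As a cross-check, the parameter count can be derived from the `config.json` fields above. This sketch assumes the standard Qwen2 layout:

- biases on the q/k/v projections only, none on `o_proj` or the MLP;
- two RMSNorm weights per layer plus a final norm;
- no `lm_head`, since the token-classification class replaces it with a `score` layer of shape `hidden_size × num_labels`.

Whether that `score` layer has a bias is itself an assumption, exposed as a parameter.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "hidden_size": 1536,
    "intermediate_size": 8960,
    "num_attention_heads": 12,
    "num_key_value_heads": 2,
    "num_hidden_layers": 28,
    "vocab_size": 151936,
}
INDEX_TOTAL_PARAMETERS = 1543736630  # "total_parameters" in model.safetensors.index.json

def count_parameters(cfg: dict, num_labels: int = 2, score_bias: bool = True) -> int:
    """Parameter count for a Qwen2 backbone plus a token-classification head (assumed layout)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = cfg["num_key_value_heads"] * head_dim
    attn = (h * h + h) + 2 * (h * kv_dim + kv_dim) + h * h  # q, k, v with biases; o without
    mlp = 3 * h * cfg["intermediate_size"]                  # gate, up, down projections, no biases
    norms = 2 * h                                           # input and post-attention RMSNorm
    embed = cfg["vocab_size"] * h                           # token embeddings (no lm_head here)
    final_norm = h
    score = h * num_labels + (num_labels if score_bias else 0)
    return embed + cfg["num_hidden_layers"] * (attn + mlp + norms) + final_norm + score

n = count_parameters(CONFIG)
print(f"computed: {n:,}  index metadata: {INDEX_TOTAL_PARAMETERS:,}  "
      f"difference: {INDEX_TOTAL_PARAMETERS - n:,}")
```

Under these assumptions the function returns 1,543,717,378, which is 19,252 lower than the `total_parameters` value in `model.safetensors.index.json`. Dropping the score-head bias lowers it by another 2. The commit does not say how its value was computed.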
merges.txt ADDED
The diff for this file is too large to render.
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14d866478a8c4c46bf2dc9c311219e227eba8ee008b04f2625ce8e89c4e714ac
+ size 2498350192
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc9691a8dc41bccd3108efb643c6f3f4e39852ce4a9b37def600abfa4995a839
+ size 589123068
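Both shard entries are Git LFS pointer files (`version`, `oid`, `size` lines) rather than the weights themselves. The sketch below parses the two pointers shown above and sums their `size` fields. The helper name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

SHARD_POINTERS = {
    "model-00001-of-00002.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:14d866478a8c4c46bf2dc9c311219e227eba8ee008b04f2625ce8e89c4e714ac
size 2498350192""",
    "model-00002-of-00002.safetensors": """version https://git-lfs.github.com/spec/v1
oid sha256:bc9691a8dc41bccd3108efb643c6f3f4e39852ce4a9b37def600abfa4995a839
size 589123068""",
}
INDEX_TOTAL_SIZE = 3087473260  # "total_size" in model.safetensors.index.json

total = 0
for name, pointer in SHARD_POINTERS.items():
    fields = parse_lfs_pointer(pointer)
    algo, _, digest = fields["oid"].partition(":")
    size = int(fields["size"])
    total += size
    print(f"{name}: {size:,} bytes ({algo} {digest[:12]}...)")
print(f"sum of shard sizes: {total:,}  index total_size: {INDEX_TOTAL_SIZE:,}")
```

The two sizes add up exactly to the `total_size` recorded in `model.safetensors.index.json`. Each safetensors file also contains a JSON header in addition to the tensor data. This suggests, as an inference not stated in the commit, that `total_size` (and the `total_parameters` derived as half of it) counts header bytes too.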
model.safetensors.index.json ADDED
@@ -0,0 +1,348 @@
1
+ {
2
+ "metadata": {
3
+ "total_parameters": 1543736630,
4
+ "total_size": 3087473260
5
+ },
6
+ "weight_map": {
7
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
13
+ "model.layers.0.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
16
+ "model.layers.0.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
18
+ "model.layers.0.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
25
+ "model.layers.1.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
26
+ "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
27
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
28
+ "model.layers.1.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
29
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
30
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
37
+ "model.layers.10.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
38
+ "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
39
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
40
+ "model.layers.10.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
41
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
42
+ "model.layers.10.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
49
+ "model.layers.11.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
50
+ "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
51
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
52
+ "model.layers.11.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
53
+ "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
54
+ "model.layers.11.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
61
+ "model.layers.12.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
62
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
63
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
64
+ "model.layers.12.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
65
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
66
+ "model.layers.12.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
73
+ "model.layers.13.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
74
+ "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
75
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
76
+ "model.layers.13.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
77
+ "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
78
+ "model.layers.13.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
80
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
81
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
82
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
83
+ "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
84
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
85
+ "model.layers.14.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
86
+ "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
87
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
88
+ "model.layers.14.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
89
+ "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
90
+ "model.layers.14.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
91
+ "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
92
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
93
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
94
+ "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
95
+ "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
96
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
97
+ "model.layers.15.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
98
+ "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
99
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
100
+ "model.layers.15.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
101
+ "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
102
+ "model.layers.15.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
103
+ "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
104
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
105
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
106
+ "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
107
+ "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
108
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
109
+ "model.layers.16.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
110
+ "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
111
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
112
+ "model.layers.16.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
113
+ "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
114
+ "model.layers.16.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
115
+ "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
116
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
117
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
118
+ "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
119
+ "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
120
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
121
+ "model.layers.17.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
122
+ "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
123
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
124
+ "model.layers.17.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
125
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
126
+ "model.layers.17.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
127
+ "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
128
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
129
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
130
+ "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
131
+ "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
132
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
133
+ "model.layers.18.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
134
+ "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
135
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
136
+ "model.layers.18.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
137
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
138
+ "model.layers.18.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
139
+ "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
140
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
141
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
142
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
143
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
144
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
145
+ "model.layers.19.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
146
+ "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
147
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
148
+ "model.layers.19.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
149
+ "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
150
+ "model.layers.19.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
151
+ "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
152
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
153
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
154
+ "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
155
+ "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
156
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
157
+ "model.layers.2.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
158
+ "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
159
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
160
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
161
+ "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
162
+ "model.layers.2.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
163
+ "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
164
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
165
+ "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
166
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
167
+ "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
168
+ "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
169
+ "model.layers.20.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
170
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
171
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
172
+ "model.layers.20.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
173
+ "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
174
+ "model.layers.20.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
175
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
176
+ "model.layers.21.input_layernorm.weight": "model-00002-of-00002.safetensors",
177
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
178
+ "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
179
+ "model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
180
+ "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
181
+ "model.layers.21.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
182
+ "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
183
+ "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
184
+ "model.layers.21.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
185
+ "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
186
+ "model.layers.21.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
187
+ "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
188
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
189
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
190
+ "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
191
+ "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
192
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
193
+ "model.layers.22.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
194
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
195
+ "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
196
+ "model.layers.22.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
197
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
198
+ "model.layers.22.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
199
+ "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
200
+ "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
201
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
202
+ "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
203
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
204
+ "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
205
+ "model.layers.23.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
206
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
207
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
208
+ "model.layers.23.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
209
+ "model.layers.23.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
210
+ "model.layers.23.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
211
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
212
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
213
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
214
+ "model.layers.24.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
215
+ "model.layers.24.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
216
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
217
+ "model.layers.24.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
218
+ "model.layers.24.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
219
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
220
+ "model.layers.24.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
221
+ "model.layers.24.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
222
+ "model.layers.24.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
223
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
224
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
225
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
226
+ "model.layers.25.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
227
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
228
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
229
+ "model.layers.25.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
230
+ "model.layers.25.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
231
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
232
+ "model.layers.25.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
233
+ "model.layers.25.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
234
+ "model.layers.25.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
235
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
236
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
237
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
238
+ "model.layers.26.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
239
+ "model.layers.26.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
240
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
241
+ "model.layers.26.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
242
+ "model.layers.26.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
243
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
244
+ "model.layers.26.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
245
+ "model.layers.26.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
246
+ "model.layers.26.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
247
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
248
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
249
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
250
+ "model.layers.27.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
251
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
252
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
253
+ "model.layers.27.self_attn.k_proj.bias": "model-00002-of-00002.safetensors",
254
+ "model.layers.27.self_attn.k_proj.weight": "model-00002-of-00002.safetensors",
255
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
256
+ "model.layers.27.self_attn.q_proj.bias": "model-00002-of-00002.safetensors",
257
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
258
+ "model.layers.27.self_attn.v_proj.bias": "model-00002-of-00002.safetensors",
259
+ "model.layers.27.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
260
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
261
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
262
+ "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
263
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
264
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
265
+ "model.layers.3.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
266
+ "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
267
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
268
+ "model.layers.3.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
269
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
270
+ "model.layers.3.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
271
+ "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
272
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
273
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
274
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
275
+ "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
276
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
277
+ "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
278
+ "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
279
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
280
+ "model.layers.4.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
281
+ "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
282
+ "model.layers.4.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
283
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
284
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
285
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
286
+ "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
287
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
288
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
289
+ "model.layers.5.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
290
+ "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
291
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
292
+ "model.layers.5.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
293
+ "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
294
+ "model.layers.5.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
295
+ "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
296
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
297
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
298
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
299
+ "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
300
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
301
+ "model.layers.6.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
302
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
303
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
304
+ "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
305
+ "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
306
+ "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
307
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
308
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
309
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
310
+ "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
311
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
312
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
313
+ "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
314
+ "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
315
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
316
+ "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
317
+ "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
318
+ "model.layers.7.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
319
+ "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
320
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
321
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
322
+ "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
323
+ "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
324
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
325
+ "model.layers.8.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
326
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
327
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
328
+ "model.layers.8.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
329
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
330
+ "model.layers.8.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
331
+ "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
332
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
333
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
334
+ "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
335
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
336
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
337
+ "model.layers.9.self_attn.k_proj.bias": "model-00001-of-00002.safetensors",
338
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
339
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
340
+ "model.layers.9.self_attn.q_proj.bias": "model-00001-of-00002.safetensors",
341
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
342
+ "model.layers.9.self_attn.v_proj.bias": "model-00001-of-00002.safetensors",
343
+ "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
344
+ "model.norm.weight": "model-00002-of-00002.safetensors",
345
+ "score.bias": "model-00002-of-00002.safetensors",
346
+ "score.weight": "model-00002-of-00002.safetensors"
347
+ }
348
+ }
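The `weight_map` above records which shard file stores each tensor. Below is a minimal Python sketch for inspecting a map like this. It assumes the usual sharded-safetensors index layout, with a top-level object holding a `weight_map` key, which the closing braces suggest. The helper names `tensors_by_shard` and `layers_split_across_shards` are hypothetical. The sketch also flags decoder layers whose tensors span both shards. Layer 21 is one such layer: its layernorms and `down_proj` are in shard 2, while its attention projections, `gate_proj` and `up_proj` are in shard 1.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Captures the decoder-layer index from names like "model.layers.21.mlp.down_proj.weight".
LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def tensors_by_shard(weight_map: dict[str, str]) -> dict[str, list[str]]:
    """Invert the tensor -> shard mapping into shard -> sorted tensor names."""
    shards: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


def layers_split_across_shards(weight_map: dict[str, str]) -> dict[int, set[str]]:
    """Return the decoder layers whose tensors are stored in more than one shard."""
    layer_shards: dict[int, set[str]] = defaultdict(set)
    for tensor_name, shard_file in weight_map.items():
        match = LAYER_RE.match(tensor_name)
        if match:  # skips non-layer tensors such as model.norm.weight and score.*
            layer_shards[int(match.group(1))].add(shard_file)
    return {layer: files for layer, files in layer_shards.items() if len(files) > 1}


# A few entries copied from the weight_map above; the full map would normally be
# loaded with json.load(...)["weight_map"].
weight_map = {
    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors",
    "score.weight": "model-00002-of-00002.safetensors",
}

split_layers = layers_split_across_shards(weight_map)
shards = tensors_by_shard(weight_map)
```

The index keys are sorted as strings, so `model.layers.3` appears after `model.layers.27`. The sketch parses the layer index as an integer, which means it does not depend on that ordering.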
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
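Special tokens can be stored in two shapes. In this map, `eos_token` and `pad_token` are dicts with a `content` field plus stripping and normalization flags. `tokenizer_config.json` stores the same tokens as plain strings, and stores `bos_token` as `null`. Below is a small Python sketch that normalizes all three shapes. The helper `token_content` is a hypothetical name, and the JSON is an abbreviated copy of the file above.

```python
import json

# Abbreviated copy of special_tokens_map.json above (additional_special_tokens omitted).
SPECIAL_TOKENS_MAP = json.loads("""
{
  "eos_token": {"content": "<|im_end|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false},
  "pad_token": {"content": "<|endoftext|>", "lstrip": false, "normalized": false,
                "rstrip": false, "single_word": false}
}
""")


def token_content(entry):
    """Return the token text whether it is stored as a string, a dict, or null."""
    if entry is None:            # e.g. "bos_token": null in tokenizer_config.json
        return None
    if isinstance(entry, str):   # plain-string form, as in tokenizer_config.json
        return entry
    return entry["content"]      # dict form carrying lstrip/rstrip/normalized flags


eos = token_content(SPECIAL_TOKENS_MAP["eos_token"])
pad = token_content(SPECIAL_TOKENS_MAP["pad_token"])
print(eos, pad)  # <|im_end|> <|endoftext|>
```

In this map, the padding token (`<|endoftext|>`) is a different token from the end-of-sequence token (`<|im_end|>`).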
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
3
+ size 11421896
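This repository does not contain the actual `tokenizer.json`. What it stores instead is a Git LFS pointer: three `key value` lines giving the spec version, a SHA-256 object id, and the byte size. Below is a Python sketch that parses such a pointer and checks downloaded bytes against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the `sha256` oid form shown here is handled.

```python
from __future__ import annotations

import hashlib

# The pointer committed in place of tokenizer.json (copied from the diff above).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
size 11421896
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(data: bytes, pointer: dict[str, str]) -> bool:
    """Check a file's bytes against the pointer's byte size and SHA-256 oid."""
    algorithm, _, expected_hex = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    if len(data) != int(pointer["size"]):
        return False  # cheap size check before hashing
    return hashlib.sha256(data).hexdigest() == expected_hex


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["size"])  # 11421896
```

To verify a real download, you would call `matches_pointer(Path("tokenizer.json").read_bytes(), pointer)`, where the local path is hypothetical. A size mismatch is rejected before any hashing is done.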
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "clean_up_tokenization_spaces": false,
199
+ "eos_token": "<|im_end|>",
200
+ "errors": "replace",
201
+ "extra_special_tokens": {},
202
+ "model_max_length": 131072,
203
+ "pad_token": "<|endoftext|>",
204
+ "split_special_tokens": false,
205
+ "tokenizer_class": "Qwen2Tokenizer",
206
+ "unk_token": null
207
+ }
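`added_tokens_decoder` maps string ids to token entries, and each entry carries a `special` flag. In this config, ids 151643 to 151656 are marked special. The tool-call and fill-in-the-middle tokens, 151657 to 151664, are not. Below is a Python sketch over an excerpt of that map. It splits the tokens by the `special` flag and resolves the `eos_token` and `pad_token` strings to their ids. The helpers `split_added_tokens` and `id_for` are hypothetical names.

```python
from __future__ import annotations

# Excerpt of "added_tokens_decoder" from tokenizer_config.json above
# (lstrip/rstrip/normalized/single_word flags omitted for brevity).
ADDED_TOKENS_DECODER = {
    "151643": {"content": "<|endoftext|>", "special": True},
    "151644": {"content": "<|im_start|>", "special": True},
    "151645": {"content": "<|im_end|>", "special": True},
    "151657": {"content": "<tool_call>", "special": False},
    "151664": {"content": "<|file_sep|>", "special": False},
}


def split_added_tokens(decoder: dict[str, dict]) -> tuple[dict[int, str], dict[int, str]]:
    """Partition added tokens into (special, non-special) id -> text maps."""
    special: dict[int, str] = {}
    regular: dict[int, str] = {}
    for token_id, info in decoder.items():
        target = special if info["special"] else regular
        target[int(token_id)] = info["content"]  # JSON object keys are strings
    return special, regular


def id_for(decoder: dict[str, dict], text: str) -> int:
    """Look up the id of an added token by its text (e.g. the eos/pad strings)."""
    for token_id, info in decoder.items():
        if info["content"] == text:
            return int(token_id)
    raise KeyError(text)


special, regular = split_added_tokens(ADDED_TOKENS_DECODER)
print(id_for(ADDED_TOKENS_DECODER, "<|im_end|>"))     # 151645 (eos_token)
print(id_for(ADDED_TOKENS_DECODER, "<|endoftext|>"))  # 151643 (pad_token)
```

If `transformers` is installed, `convert_tokens_to_ids` on a tokenizer loaded with `AutoTokenizer.from_pretrained` should return the same ids. This sketch only reads the JSON and needs no dependencies.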
trainer_state.json ADDED
@@ -0,0 +1,1965 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": 0.9005018183708923,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 0.9947643979057592,
6
+ "eval_steps": 16,
7
+ "global_step": 760,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.005235602094240838,
14
+ "grad_norm": 433.5572204589844,
15
+ "learning_rate": 5.217391304347826e-07,
16
+ "loss": 5.5489,
17
+ "step": 4
18
+ },
19
+ {
20
+ "epoch": 0.005235602094240838,
21
+ "eval_F1_err_corr": 0.2899344909681993,
22
+ "eval_accuracy": 0.33964423820572315,
23
+ "eval_correct_accuracy": 0.21863927522501062,
24
+ "eval_error_accuracy": 0.4302247894550517,
25
+ "eval_f1": 0.2573237770510055,
26
+ "eval_loss": 1.461071491241455,
27
+ "eval_pr_auc": 0.16429768646454848,
28
+ "eval_precision": 0.15202466598150052,
29
+ "eval_recall": 0.8372198324654743,
30
+ "eval_runtime": 24.9385,
31
+ "eval_samples_per_second": 196.202,
32
+ "eval_steps_per_second": 0.802,
33
+ "step": 4
34
+ },
35
+ {
36
+ "epoch": 0.010471204188481676,
37
+ "grad_norm": 424.1092834472656,
38
+ "learning_rate": 1.2173913043478262e-06,
39
+ "loss": 5.392,
40
+ "step": 8
41
+ },
42
+ {
43
+ "epoch": 0.010471204188481676,
44
+ "eval_F1_err_corr": 0.321098900363111,
45
+ "eval_accuracy": 0.3666511987625677,
46
+ "eval_correct_accuracy": 0.2498037913412515,
47
+ "eval_error_accuracy": 0.4493435370426137,
48
+ "eval_f1": 0.25948565848012445,
49
+ "eval_loss": 1.3678739070892334,
50
+ "eval_pr_auc": 0.16391947366452397,
51
+ "eval_precision": 0.15441239776151527,
52
+ "eval_recall": 0.8120896536110482,
53
+ "eval_runtime": 24.8266,
54
+ "eval_samples_per_second": 197.087,
55
+ "eval_steps_per_second": 0.806,
56
+ "step": 8
57
+ },
58
+ {
59
+ "epoch": 0.015706806282722512,
60
+ "grad_norm": 328.165771484375,
61
+ "learning_rate": 1.9130434782608697e-06,
62
+ "loss": 4.6674,
63
+ "step": 12
64
+ },
65
+ {
66
+ "epoch": 0.015706806282722512,
67
+ "eval_F1_err_corr": 0.42193034247158306,
68
+ "eval_accuracy": 0.4537664346481052,
69
+ "eval_correct_accuracy": 0.36420961710522887,
70
+ "eval_error_accuracy": 0.5013920328557274,
71
+ "eval_f1": 0.25876327610091937,
72
+ "eval_loss": 1.08595609664917,
73
+ "eval_pr_auc": 0.16343108665970787,
74
+ "eval_precision": 0.15883323026180168,
75
+ "eval_recall": 0.6977586597237945,
76
+ "eval_runtime": 24.7964,
77
+ "eval_samples_per_second": 197.327,
78
+ "eval_steps_per_second": 0.807,
79
+ "step": 12
80
+ },
81
+ {
82
+ "epoch": 0.020942408376963352,
83
+ "grad_norm": 121.70630645751953,
84
+ "learning_rate": 2.6086956521739132e-06,
85
+ "loss": 3.0944,
86
+ "step": 16
87
+ },
88
+ {
89
+ "epoch": 0.020942408376963352,
90
+ "eval_F1_err_corr": 0.721921189250657,
91
+ "eval_accuracy": 0.7358391337973704,
92
+ "eval_correct_accuracy": 0.7870115524045516,
93
+ "eval_error_accuracy": 0.6667750717278245,
94
+ "eval_f1": 0.1906928253246138,
95
+ "eval_loss": 0.5721015334129333,
96
+ "eval_pr_auc": 0.16146374192556026,
97
+ "eval_precision": 0.16400391261819366,
98
+ "eval_recall": 0.22775639574371745,
99
+ "eval_runtime": 24.8528,
100
+ "eval_samples_per_second": 196.879,
101
+ "eval_steps_per_second": 0.805,
102
+ "step": 16
103
+ },
104
+ {
105
+ "epoch": 0.02617801047120419,
106
+ "grad_norm": 18.458759307861328,
107
+ "learning_rate": 3.3043478260869567e-06,
108
+ "loss": 2.002,
109
+ "step": 20
110
+ },
111
+ {
112
+ "epoch": 0.02617801047120419,
113
+ "eval_F1_err_corr": 0.8402340555989792,
114
+ "eval_accuracy": 0.8622119102861562,
115
+ "eval_correct_accuracy": 0.9922276019965096,
116
+ "eval_error_accuracy": 0.728620881787505,
117
+ "eval_f1": 0.0035794183445190158,
118
+ "eval_loss": 0.7684443593025208,
119
+ "eval_pr_auc": 0.17251592682571912,
120
+ "eval_precision": 0.1509433962264151,
121
+ "eval_recall": 0.001811184061580258,
122
+ "eval_runtime": 24.8601,
123
+ "eval_samples_per_second": 196.822,
124
+ "eval_steps_per_second": 0.805,
125
+ "step": 20
126
+ },
127
+ {
128
+ "epoch": 0.031413612565445025,
129
+ "grad_norm": 115.0860595703125,
130
+ "learning_rate": 4.000000000000001e-06,
131
+ "loss": 3.383,
132
+ "step": 24
133
+ },
134
+ {
135
+ "epoch": 0.031413612565445025,
136
+ "eval_F1_err_corr": 0.8378682847250856,
137
+ "eval_accuracy": 0.8620881670533642,
138
+ "eval_correct_accuracy": 0.9842169035446346,
139
+ "eval_error_accuracy": 0.7294085238385144,
140
+ "eval_f1": 0.008451957295373666,
141
+ "eval_loss": 0.8194268941879272,
142
+ "eval_pr_auc": 0.18009572056722223,
143
+ "eval_precision": 0.24050632911392406,
144
+ "eval_recall": 0.004301562146253113,
145
+ "eval_runtime": 24.8294,
146
+ "eval_samples_per_second": 197.065,
147
+ "eval_steps_per_second": 0.805,
148
+ "step": 24
149
+ },
150
+ {
151
+ "epoch": 0.03664921465968586,
152
+ "grad_norm": 99.96025848388672,
153
+ "learning_rate": 4.695652173913044e-06,
154
+ "loss": 3.1857,
155
+ "step": 28
156
+ },
157
+ {
158
+ "epoch": 0.03664921465968586,
159
+ "eval_F1_err_corr": 0.8207683484774771,
160
+ "eval_accuracy": 0.8591492652745553,
161
+ "eval_correct_accuracy": 0.9328157189465495,
162
+ "eval_error_accuracy": 0.7327520943454843,
163
+ "eval_f1": 0.060268317853457175,
164
+ "eval_loss": 0.6408756971359253,
165
+ "eval_pr_auc": 0.19809419361676622,
166
+ "eval_precision": 0.3411214953271028,
167
+ "eval_recall": 0.03305410912383971,
168
+ "eval_runtime": 24.8048,
169
+ "eval_samples_per_second": 197.261,
170
+ "eval_steps_per_second": 0.806,
171
+ "step": 28
172
+ },
173
+ {
174
+ "epoch": 0.041884816753926704,
175
+ "grad_norm": 48.16204071044922,
176
+ "learning_rate": 5.391304347826088e-06,
177
+ "loss": 2.1498,
178
+ "step": 32
179
+ },
180
+ {
181
+ "epoch": 0.041884816753926704,
182
+ "eval_F1_err_corr": 0.7694538304703895,
183
+ "eval_accuracy": 0.837370456303171,
184
+ "eval_correct_accuracy": 0.8088768689020601,
185
+ "eval_error_accuracy": 0.7336950054749075,
186
+ "eval_f1": 0.21148942552872357,
187
+ "eval_loss": 0.44809016585350037,
188
+ "eval_pr_auc": 0.2318064013915099,
189
+ "eval_precision": 0.31333333333333335,
190
+ "eval_recall": 0.15961059542676023,
191
+ "eval_runtime": 24.8313,
192
+ "eval_samples_per_second": 197.05,
193
+ "eval_steps_per_second": 0.805,
194
+ "step": 32
195
+ },
196
+ {
197
+ "epoch": 0.04712041884816754,
198
+ "grad_norm": 169.05667114257812,
199
+ "learning_rate": 6.086956521739132e-06,
200
+ "loss": 1.9176,
201
+ "step": 36
202
+ },
203
+ {
204
+ "epoch": 0.04712041884816754,
205
+ "eval_F1_err_corr": 0.6219027098098009,
206
+ "eval_accuracy": 0.7121732405259087,
207
+ "eval_correct_accuracy": 0.5674362971053744,
208
+ "eval_error_accuracy": 0.687935454975843,
209
+ "eval_f1": 0.33722752528850264,
210
+ "eval_loss": 0.5731640458106995,
211
+ "eval_pr_auc": 0.2788670530837179,
212
+ "eval_precision": 0.2460243217960711,
213
+ "eval_recall": 0.5358840842200588,
214
+ "eval_runtime": 24.8463,
215
+ "eval_samples_per_second": 196.931,
216
+ "eval_steps_per_second": 0.805,
217
+ "step": 36
218
+ },
219
+ {
220
+ "epoch": 0.05235602094240838,
221
+ "grad_norm": 114.60936737060547,
222
+ "learning_rate": 6.782608695652174e-06,
223
+ "loss": 2.0171,
224
+ "step": 40
225
+ },
226
+ {
227
+ "epoch": 0.05235602094240838,
228
+ "eval_F1_err_corr": 0.7430224807525013,
229
+ "eval_accuracy": 0.7990719257540603,
230
+ "eval_correct_accuracy": 0.7547544657206874,
231
+ "eval_error_accuracy": 0.7316496396965659,
232
+ "eval_f1": 0.35211970074812965,
233
+ "eval_loss": 0.4375106692314148,
234
+ "eval_pr_auc": 0.3012571706540082,
235
+ "eval_precision": 0.3147289586305278,
236
+ "eval_recall": 0.39959248358614446,
237
+ "eval_runtime": 24.8039,
238
+ "eval_samples_per_second": 197.268,
239
+ "eval_steps_per_second": 0.806,
240
+ "step": 40
241
+ },
242
+ {
243
+ "epoch": 0.05759162303664921,
244
+ "grad_norm": 32.457340240478516,
245
+ "learning_rate": 7.478260869565218e-06,
246
+ "loss": 1.611,
247
+ "step": 44
248
+ },
249
+ {
250
+ "epoch": 0.05759162303664921,
251
+ "eval_F1_err_corr": 0.8419241583637731,
252
+ "eval_accuracy": 0.862830626450116,
253
+ "eval_correct_accuracy": 0.9871360561186291,
254
+ "eval_error_accuracy": 0.7339561045659535,
255
+ "eval_f1": 0.09914668833807395,
256
+ "eval_loss": 0.38381654024124146,
257
+ "eval_pr_auc": 0.3152015073109494,
258
+ "eval_precision": 0.48316831683168315,
259
+ "eval_recall": 0.05524111387819787,
260
+ "eval_runtime": 24.832,
261
+ "eval_samples_per_second": 197.044,
262
+ "eval_steps_per_second": 0.805,
263
+ "step": 44
264
+ },
265
+ {
266
+ "epoch": 0.06282722513089005,
267
+ "grad_norm": 22.770957946777344,
268
+ "learning_rate": 8.173913043478263e-06,
269
+ "loss": 1.5516,
270
+ "step": 48
271
+ },
+ {
+ "epoch": 0.06282722513089005,
+ "eval_F1_err_corr": 0.8403483674599067,
+ "eval_accuracy": 0.859891724671307,
+ "eval_correct_accuracy": 0.9726090387657613,
+ "eval_error_accuracy": 0.7397527603961362,
+ "eval_f1": 0.19484444444444443,
+ "eval_loss": 0.355484277009964,
+ "eval_pr_auc": 0.3399032342610957,
+ "eval_precision": 0.45364238410596025,
+ "eval_recall": 0.12406610821824768,
+ "eval_runtime": 24.7998,
+ "eval_samples_per_second": 197.3,
+ "eval_steps_per_second": 0.806,
+ "step": 48
+ },
+ {
+ "epoch": 0.06806282722513089,
+ "grad_norm": 69.65153503417969,
+ "learning_rate": 8.869565217391306e-06,
+ "loss": 1.4648,
+ "step": 52
+ },
+ {
+ "epoch": 0.06806282722513089,
+ "eval_F1_err_corr": 0.8036896045473728,
+ "eval_accuracy": 0.8478267594740913,
+ "eval_correct_accuracy": 0.8494371774369468,
+ "eval_error_accuracy": 0.762617804842102,
+ "eval_f1": 0.38658186806334954,
+ "eval_loss": 0.36472800374031067,
+ "eval_pr_auc": 0.38944473786040484,
+ "eval_precision": 0.4303164908384231,
+ "eval_recall": 0.350916911931175,
+ "eval_runtime": 24.8345,
+ "eval_samples_per_second": 197.025,
+ "eval_steps_per_second": 0.805,
+ "step": 52
+ },
+ {
+ "epoch": 0.07329842931937172,
+ "grad_norm": 9.83507251739502,
+ "learning_rate": 9.565217391304349e-06,
+ "loss": 1.3967,
+ "step": 56
+ },
+ {
+ "epoch": 0.07329842931937172,
+ "eval_F1_err_corr": 0.8296970467203022,
+ "eval_accuracy": 0.8688321732405259,
+ "eval_correct_accuracy": 0.9181484628791651,
+ "eval_error_accuracy": 0.7567903433781845,
+ "eval_f1": 0.29262595929262597,
+ "eval_loss": 0.33204466104507446,
+ "eval_pr_auc": 0.41163972835541246,
+ "eval_precision": 0.5561192136968929,
+ "eval_recall": 0.19855105275073578,
+ "eval_runtime": 24.8293,
+ "eval_samples_per_second": 197.066,
+ "eval_steps_per_second": 0.805,
+ "step": 56
+ },
+ {
+ "epoch": 0.07853403141361257,
+ "grad_norm": 16.36752700805664,
+ "learning_rate": 1.0260869565217393e-05,
+ "loss": 1.2944,
+ "step": 60
+ },
+ {
+ "epoch": 0.07853403141361257,
+ "eval_F1_err_corr": 0.8421678884358804,
+ "eval_accuracy": 0.8736581593194123,
+ "eval_correct_accuracy": 0.9276144097142335,
+ "eval_error_accuracy": 0.7711353234659966,
+ "eval_f1": 0.38438347904733194,
+ "eval_loss": 0.31788310408592224,
+ "eval_pr_auc": 0.4608372769142751,
+ "eval_precision": 0.5751014884979703,
+ "eval_recall": 0.28865745981435365,
+ "eval_runtime": 24.8595,
+ "eval_samples_per_second": 196.826,
+ "eval_steps_per_second": 0.805,
+ "step": 60
+ },
+ {
+ "epoch": 0.08376963350785341,
+ "grad_norm": 17.503982543945312,
+ "learning_rate": 1.0956521739130435e-05,
+ "loss": 1.284,
+ "step": 64
+ },
+ {
+ "epoch": 0.08376963350785341,
+ "eval_F1_err_corr": 0.8579401681935629,
+ "eval_accuracy": 0.8769682907965971,
+ "eval_correct_accuracy": 0.9478163203454459,
+ "eval_error_accuracy": 0.7836326415058088,
+ "eval_f1": 0.4385147536354652,
+ "eval_loss": 0.30793023109436035,
+ "eval_pr_auc": 0.49259983395831536,
+ "eval_precision": 0.5825206301575394,
+ "eval_recall": 0.3515961059542676,
+ "eval_runtime": 24.8423,
+ "eval_samples_per_second": 196.962,
+ "eval_steps_per_second": 0.805,
+ "step": 64
+ },
+ {
+ "epoch": 0.08900523560209424,
+ "grad_norm": 18.571523666381836,
+ "learning_rate": 1.1652173913043478e-05,
+ "loss": 1.191,
+ "step": 68
+ },
+ {
+ "epoch": 0.08900523560209424,
+ "eval_F1_err_corr": 0.8618882109239977,
+ "eval_accuracy": 0.8807424593967518,
+ "eval_correct_accuracy": 0.9652649658606812,
+ "eval_error_accuracy": 0.7785120852969489,
+ "eval_f1": 0.38741458763705705,
+ "eval_loss": 0.30222997069358826,
+ "eval_pr_auc": 0.5105597340335465,
+ "eval_precision": 0.6497867803837953,
+ "eval_recall": 0.2759791713832918,
+ "eval_runtime": 24.8386,
+ "eval_samples_per_second": 196.992,
+ "eval_steps_per_second": 0.805,
+ "step": 68
+ },
+ {
+ "epoch": 0.09424083769633508,
+ "grad_norm": 19.51681900024414,
+ "learning_rate": 1.2347826086956523e-05,
+ "loss": 1.1905,
+ "step": 72
+ },
+ {
+ "epoch": 0.09424083769633508,
+ "eval_F1_err_corr": 0.8556927040319808,
+ "eval_accuracy": 0.8829079659706109,
+ "eval_correct_accuracy": 0.9252440207528985,
+ "eval_error_accuracy": 0.7958667759923117,
+ "eval_f1": 0.46682631356529086,
+ "eval_loss": 0.2958272099494934,
+ "eval_pr_auc": 0.5272941790977752,
+ "eval_precision": 0.6178225205070843,
+ "eval_recall": 0.37514149875481095,
+ "eval_runtime": 24.8293,
+ "eval_samples_per_second": 197.065,
+ "eval_steps_per_second": 0.805,
+ "step": 72
+ },
+ {
+ "epoch": 0.09947643979057591,
+ "grad_norm": 21.868053436279297,
+ "learning_rate": 1.3043478260869566e-05,
+ "loss": 1.1759,
+ "step": 76
+ },
+ {
+ "epoch": 0.09947643979057591,
+ "eval_F1_err_corr": 0.8671239387996902,
+ "eval_accuracy": 0.8833720030935808,
+ "eval_correct_accuracy": 0.965744942725071,
+ "eval_error_accuracy": 0.7867787965661607,
+ "eval_f1": 0.43932183224271265,
+ "eval_loss": 0.2900922894477844,
+ "eval_pr_auc": 0.5398410773230814,
+ "eval_precision": 0.6402254009536195,
+ "eval_recall": 0.33438985736925514,
+ "eval_runtime": 24.826,
+ "eval_samples_per_second": 197.092,
+ "eval_steps_per_second": 0.806,
+ "step": 76
+ },
+ {
+ "epoch": 0.10471204188481675,
+ "grad_norm": 11.858696937561035,
+ "learning_rate": 1.373913043478261e-05,
+ "loss": 1.1804,
+ "step": 80
+ },
+ {
+ "epoch": 0.10471204188481675,
+ "eval_F1_err_corr": 0.8643274810534162,
+ "eval_accuracy": 0.8850425367362722,
+ "eval_correct_accuracy": 0.951882356363745,
+ "eval_error_accuracy": 0.7915226184557567,
+ "eval_f1": 0.4779432424838438,
+ "eval_loss": 0.2871633768081665,
+ "eval_pr_auc": 0.5513156434574297,
+ "eval_precision": 0.6297667530544243,
+ "eval_recall": 0.38510301109350237,
+ "eval_runtime": 24.8425,
+ "eval_samples_per_second": 196.96,
+ "eval_steps_per_second": 0.805,
+ "step": 80
+ },
+ {
+ "epoch": 0.1099476439790576,
+ "grad_norm": 8.166620254516602,
+ "learning_rate": 1.4434782608695654e-05,
+ "loss": 1.1212,
+ "step": 84
+ },
+ {
+ "epoch": 0.11518324607329843,
+ "grad_norm": 4.340336799621582,
+ "learning_rate": 1.5130434782608697e-05,
+ "loss": 1.1325,
+ "step": 88
+ },
+ {
+ "epoch": 0.12041884816753927,
+ "grad_norm": 27.051570892333984,
+ "learning_rate": 1.582608695652174e-05,
+ "loss": 1.1218,
+ "step": 92
+ },
+ {
+ "epoch": 0.1256544502617801,
+ "grad_norm": 22.343820571899414,
+ "learning_rate": 1.6521739130434785e-05,
+ "loss": 1.1068,
+ "step": 96
+ },
+ {
+ "epoch": 0.13089005235602094,
+ "grad_norm": 47.00363540649414,
+ "learning_rate": 1.721739130434783e-05,
+ "loss": 1.1034,
+ "step": 100
+ },
+ {
+ "epoch": 0.13612565445026178,
+ "grad_norm": 40.41328048706055,
+ "learning_rate": 1.791304347826087e-05,
+ "loss": 1.1235,
+ "step": 104
+ },
+ {
+ "epoch": 0.14136125654450263,
+ "grad_norm": 31.55730628967285,
+ "learning_rate": 1.8608695652173912e-05,
+ "loss": 1.0747,
+ "step": 108
+ },
+ {
+ "epoch": 0.14659685863874344,
+ "grad_norm": 2.652536392211914,
+ "learning_rate": 1.9304347826086957e-05,
+ "loss": 0.9891,
+ "step": 112
+ },
+ {
+ "epoch": 0.1518324607329843,
+ "grad_norm": 3.2267162799835205,
+ "learning_rate": 2e-05,
+ "loss": 0.9607,
+ "step": 116
+ },
+ {
+ "epoch": 0.15706806282722513,
+ "grad_norm": 21.89421272277832,
+ "learning_rate": 1.9999942480792804e-05,
+ "loss": 1.0643,
+ "step": 120
+ },
+ {
+ "epoch": 0.15706806282722513,
+ "eval_F1_err_corr": 0.8734264964691199,
+ "eval_accuracy": 0.8929311678267595,
+ "eval_correct_accuracy": 0.9278371473433551,
+ "eval_error_accuracy": 0.8250438945941904,
+ "eval_f1": 0.5815499939547818,
+ "eval_loss": 0.2678382694721222,
+ "eval_pr_auc": 0.6347920534332054,
+ "eval_precision": 0.6240269849507005,
+ "eval_recall": 0.544487208512565,
+ "eval_runtime": 24.8153,
+ "eval_samples_per_second": 197.177,
+ "eval_steps_per_second": 0.806,
+ "step": 120
+ },
+ {
+ "epoch": 0.16230366492146597,
+ "grad_norm": 8.081486701965332,
+ "learning_rate": 1.999976992383291e-05,
+ "loss": 1.0189,
+ "step": 124
+ },
+ {
+ "epoch": 0.16753926701570682,
+ "grad_norm": 17.748775482177734,
+ "learning_rate": 1.9999482331105377e-05,
+ "loss": 0.9898,
+ "step": 128
+ },
+ {
+ "epoch": 0.17277486910994763,
+ "grad_norm": 41.294334411621094,
+ "learning_rate": 1.9999079705918636e-05,
+ "loss": 1.0795,
+ "step": 132
+ },
+ {
+ "epoch": 0.17801047120418848,
+ "grad_norm": 4.425788879394531,
+ "learning_rate": 1.999856205290442e-05,
+ "loss": 1.0274,
+ "step": 136
+ },
+ {
+ "epoch": 0.18324607329842932,
+ "grad_norm": 26.085590362548828,
+ "learning_rate": 1.9997929378017723e-05,
+ "loss": 0.9516,
+ "step": 140
+ },
+ {
+ "epoch": 0.18848167539267016,
+ "grad_norm": 18.811126708984375,
+ "learning_rate": 1.9997181688536746e-05,
+ "loss": 0.966,
+ "step": 144
+ },
+ {
+ "epoch": 0.193717277486911,
+ "grad_norm": 22.464527130126953,
+ "learning_rate": 1.999631899306278e-05,
+ "loss": 0.8932,
+ "step": 148
+ },
+ {
+ "epoch": 0.19895287958115182,
+ "grad_norm": 8.309951782226562,
+ "learning_rate": 1.999534130152014e-05,
+ "loss": 0.9756,
+ "step": 152
+ },
+ {
+ "epoch": 0.20418848167539266,
+ "grad_norm": 4.516532897949219,
+ "learning_rate": 1.999424862515604e-05,
+ "loss": 0.998,
+ "step": 156
+ },
+ {
+ "epoch": 0.2094240837696335,
+ "grad_norm": 10.015279769897461,
+ "learning_rate": 1.999304097654045e-05,
+ "loss": 0.9015,
+ "step": 160
+ },
+ {
+ "epoch": 0.2094240837696335,
+ "eval_F1_err_corr": 0.885087159946509,
+ "eval_accuracy": 0.9020572312451662,
+ "eval_correct_accuracy": 0.95333342698488,
+ "eval_error_accuracy": 0.8259592279571245,
+ "eval_f1": 0.5984271943176053,
+ "eval_loss": 0.24851758778095245,
+ "eval_pr_auc": 0.6675246054619536,
+ "eval_precision": 0.6804153446783963,
+ "eval_recall": 0.5340729001584786,
+ "eval_runtime": 24.8104,
+ "eval_samples_per_second": 197.216,
+ "eval_steps_per_second": 0.806,
+ "step": 160
+ },
+ {
+ "epoch": 0.21465968586387435,
+ "grad_norm": 14.583905220031738,
+ "learning_rate": 1.999171836956597e-05,
+ "loss": 0.9587,
+ "step": 164
+ },
+ {
+ "epoch": 0.2198952879581152,
+ "grad_norm": 9.168513298034668,
+ "learning_rate": 1.9990280819447662e-05,
+ "loss": 0.9663,
+ "step": 168
+ },
+ {
+ "epoch": 0.225130890052356,
+ "grad_norm": 24.278688430786133,
+ "learning_rate": 1.998872834272287e-05,
+ "loss": 0.9679,
+ "step": 172
+ },
+ {
+ "epoch": 0.23036649214659685,
+ "grad_norm": 23.693418502807617,
+ "learning_rate": 1.9987060957251047e-05,
+ "loss": 0.9541,
+ "step": 176
+ },
+ {
+ "epoch": 0.2356020942408377,
+ "grad_norm": 34.47703170776367,
+ "learning_rate": 1.9985278682213525e-05,
+ "loss": 0.8988,
+ "step": 180
+ },
+ {
+ "epoch": 0.24083769633507854,
+ "grad_norm": 17.93362045288086,
+ "learning_rate": 1.9983381538113317e-05,
+ "loss": 0.9296,
+ "step": 184
+ },
+ {
+ "epoch": 0.24607329842931938,
+ "grad_norm": 23.294275283813477,
+ "learning_rate": 1.998136954677487e-05,
+ "loss": 0.9337,
+ "step": 188
+ },
+ {
+ "epoch": 0.2513089005235602,
+ "grad_norm": 19.78593635559082,
+ "learning_rate": 1.9979242731343803e-05,
+ "loss": 0.8976,
+ "step": 192
+ },
+ {
+ "epoch": 0.25654450261780104,
+ "grad_norm": 16.300464630126953,
+ "learning_rate": 1.9977001116286675e-05,
+ "loss": 0.8705,
+ "step": 196
+ },
+ {
+ "epoch": 0.2617801047120419,
+ "grad_norm": 26.935935974121094,
+ "learning_rate": 1.9974644727390665e-05,
+ "loss": 0.8758,
+ "step": 200
+ },
+ {
+ "epoch": 0.2617801047120419,
+ "eval_F1_err_corr": 0.8910747356279248,
+ "eval_accuracy": 0.9052126836813612,
+ "eval_correct_accuracy": 0.9761037985940583,
+ "eval_error_accuracy": 0.819672508302841,
+ "eval_f1": 0.558119411595039,
+ "eval_loss": 0.24936090409755707,
+ "eval_pr_auc": 0.6830633725429478,
+ "eval_precision": 0.768772348033373,
+ "eval_recall": 0.4380801448947249,
+ "eval_runtime": 24.8593,
+ "eval_samples_per_second": 196.827,
+ "eval_steps_per_second": 0.805,
+ "step": 200
+ },
+ {
+ "epoch": 0.2670157068062827,
+ "grad_norm": 26.804174423217773,
+ "learning_rate": 1.9972173591763297e-05,
+ "loss": 0.9957,
+ "step": 204
+ },
+ {
+ "epoch": 0.27225130890052357,
+ "grad_norm": 12.255861282348633,
+ "learning_rate": 1.996958773783213e-05,
+ "loss": 0.8614,
+ "step": 208
+ },
+ {
+ "epoch": 0.2774869109947644,
+ "grad_norm": 10.577012062072754,
+ "learning_rate": 1.9966887195344403e-05,
+ "loss": 0.8539,
+ "step": 212
+ },
+ {
+ "epoch": 0.28272251308900526,
+ "grad_norm": 9.850268363952637,
+ "learning_rate": 1.9964071995366744e-05,
+ "loss": 0.8184,
+ "step": 216
+ },
+ {
+ "epoch": 0.2879581151832461,
+ "grad_norm": 4.022161960601807,
+ "learning_rate": 1.9961142170284762e-05,
+ "loss": 0.783,
+ "step": 220
+ },
+ {
+ "epoch": 0.2931937172774869,
+ "grad_norm": 4.174556732177734,
+ "learning_rate": 1.9958097753802693e-05,
+ "loss": 0.8355,
+ "step": 224
+ },
+ {
+ "epoch": 0.29842931937172773,
+ "grad_norm": 8.559288024902344,
+ "learning_rate": 1.9954938780943034e-05,
+ "loss": 0.8081,
+ "step": 228
+ },
+ {
+ "epoch": 0.3036649214659686,
+ "grad_norm": 11.881876945495605,
+ "learning_rate": 1.9951665288046098e-05,
+ "loss": 0.8846,
+ "step": 232
+ },
+ {
+ "epoch": 0.3089005235602094,
+ "grad_norm": 9.480097770690918,
+ "learning_rate": 1.994827731276963e-05,
+ "loss": 0.869,
+ "step": 236
+ },
+ {
+ "epoch": 0.31413612565445026,
+ "grad_norm": 18.96599006652832,
+ "learning_rate": 1.9944774894088367e-05,
+ "loss": 0.9044,
+ "step": 240
+ },
+ {
+ "epoch": 0.31413612565445026,
+ "eval_F1_err_corr": 0.8903583524392616,
+ "eval_accuracy": 0.8976334106728538,
+ "eval_correct_accuracy": 0.9422891260099501,
+ "eval_error_accuracy": 0.8438525462118894,
+ "eval_f1": 0.6341625207296849,
+ "eval_loss": 0.25486111640930176,
+ "eval_pr_auc": 0.6936322312463549,
+ "eval_precision": 0.6197061365600691,
+ "eval_recall": 0.6493094860765225,
+ "eval_runtime": 24.7931,
+ "eval_samples_per_second": 197.353,
+ "eval_steps_per_second": 0.807,
+ "step": 240
+ },
+ {
+ "epoch": 0.3193717277486911,
+ "grad_norm": 7.49755859375,
+ "learning_rate": 1.994115807229357e-05,
+ "loss": 0.8702,
+ "step": 244
+ },
+ {
+ "epoch": 0.32460732984293195,
+ "grad_norm": 19.93411636352539,
+ "learning_rate": 1.993742688899259e-05,
+ "loss": 0.8357,
+ "step": 248
+ },
+ {
+ "epoch": 0.3298429319371728,
+ "grad_norm": 18.435436248779297,
+ "learning_rate": 1.9933581387108358e-05,
+ "loss": 0.8185,
+ "step": 252
+ },
+ {
+ "epoch": 0.33507853403141363,
+ "grad_norm": 23.072092056274414,
+ "learning_rate": 1.992962161087893e-05,
+ "loss": 0.8371,
+ "step": 256
+ },
+ {
+ "epoch": 0.3403141361256545,
+ "grad_norm": 11.625171661376953,
+ "learning_rate": 1.9925547605856937e-05,
+ "loss": 0.8276,
+ "step": 260
+ },
+ {
+ "epoch": 0.34554973821989526,
+ "grad_norm": 18.671037673950195,
+ "learning_rate": 1.992135941890909e-05,
+ "loss": 0.8253,
+ "step": 264
+ },
+ {
+ "epoch": 0.3507853403141361,
+ "grad_norm": 15.393129348754883,
+ "learning_rate": 1.9917057098215624e-05,
+ "loss": 0.8245,
+ "step": 268
+ },
+ {
+ "epoch": 0.35602094240837695,
+ "grad_norm": 9.267082214355469,
+ "learning_rate": 1.9912640693269754e-05,
+ "loss": 0.8451,
+ "step": 272
+ },
+ {
+ "epoch": 0.3612565445026178,
+ "grad_norm": 5.4926252365112305,
+ "learning_rate": 1.9908110254877107e-05,
+ "loss": 0.813,
+ "step": 276
+ },
+ {
+ "epoch": 0.36649214659685864,
+ "grad_norm": 6.064371585845947,
+ "learning_rate": 1.9903465835155124e-05,
+ "loss": 0.7553,
+ "step": 280
+ },
+ {
+ "epoch": 0.36649214659685864,
+ "eval_F1_err_corr": 0.898106732050316,
+ "eval_accuracy": 0.9078112915699923,
+ "eval_correct_accuracy": 0.9649030769491357,
+ "eval_error_accuracy": 0.8399597119400094,
+ "eval_f1": 0.624117053481332,
+ "eval_loss": 0.23855358362197876,
+ "eval_pr_auc": 0.697922245841014,
+ "eval_precision": 0.704642551979493,
+ "eval_recall": 0.5601086710436948,
+ "eval_runtime": 24.8196,
+ "eval_samples_per_second": 197.143,
+ "eval_steps_per_second": 0.806,
+ "step": 280
+ },
+ {
+ "epoch": 0.3717277486910995,
+ "grad_norm": 11.443989753723145,
+ "learning_rate": 1.9898707487532475e-05,
+ "loss": 0.7992,
+ "step": 284
+ },
+ {
+ "epoch": 0.3769633507853403,
+ "grad_norm": 9.889354705810547,
+ "learning_rate": 1.9893835266748437e-05,
+ "loss": 0.8425,
+ "step": 288
+ },
+ {
+ "epoch": 0.38219895287958117,
+ "grad_norm": 6.687994480133057,
+ "learning_rate": 1.9888849228852262e-05,
+ "loss": 0.8465,
+ "step": 292
+ },
+ {
+ "epoch": 0.387434554973822,
+ "grad_norm": 3.455092430114746,
+ "learning_rate": 1.988374943120254e-05,
+ "loss": 0.8098,
+ "step": 296
+ },
+ {
+ "epoch": 0.39267015706806285,
+ "grad_norm": 4.258669376373291,
+ "learning_rate": 1.987853593246654e-05,
+ "loss": 0.8263,
+ "step": 300
+ },
+ {
+ "epoch": 0.39790575916230364,
+ "grad_norm": 5.940682888031006,
+ "learning_rate": 1.9873208792619517e-05,
+ "loss": 0.7651,
+ "step": 304
+ },
+ {
+ "epoch": 0.4031413612565445,
+ "grad_norm": 5.644289493560791,
+ "learning_rate": 1.9867768072944047e-05,
+ "loss": 0.7919,
+ "step": 308
+ },
+ {
+ "epoch": 0.4083769633507853,
+ "grad_norm": 6.426525115966797,
+ "learning_rate": 1.9862213836029308e-05,
+ "loss": 0.7661,
+ "step": 312
+ },
+ {
+ "epoch": 0.41361256544502617,
+ "grad_norm": 7.790468215942383,
+ "learning_rate": 1.985654614577036e-05,
+ "loss": 0.7592,
+ "step": 316
+ },
+ {
+ "epoch": 0.418848167539267,
+ "grad_norm": 8.240925788879395,
+ "learning_rate": 1.985076506736741e-05,
+ "loss": 0.7935,
+ "step": 320
+ },
+ {
+ "epoch": 0.418848167539267,
+ "eval_F1_err_corr": 0.8892707173263128,
+ "eval_accuracy": 0.900108275328693,
+ "eval_correct_accuracy": 0.9416031342860438,
+ "eval_error_accuracy": 0.8424490839609798,
+ "eval_f1": 0.636169014084507,
+ "eval_loss": 0.24991166591644287,
+ "eval_pr_auc": 0.6999774937080984,
+ "eval_precision": 0.6332436069986541,
+ "eval_recall": 0.6391215757301336,
+ "eval_runtime": 24.8123,
+ "eval_samples_per_second": 197.2,
+ "eval_steps_per_second": 0.806,
+ "step": 320
+ },
+ {
+ "epoch": 0.42408376963350786,
+ "grad_norm": 6.823334217071533,
+ "learning_rate": 1.9844870667325073e-05,
+ "loss": 0.7347,
+ "step": 324
+ },
+ {
+ "epoch": 0.4293193717277487,
+ "grad_norm": 4.039069175720215,
+ "learning_rate": 1.9838863013451587e-05,
+ "loss": 0.7886,
+ "step": 328
+ },
+ {
+ "epoch": 0.43455497382198954,
+ "grad_norm": 7.6934380531311035,
+ "learning_rate": 1.9832742174858052e-05,
+ "loss": 0.7608,
+ "step": 332
+ },
+ {
+ "epoch": 0.4397905759162304,
+ "grad_norm": 9.409914016723633,
+ "learning_rate": 1.9826508221957624e-05,
+ "loss": 0.7466,
+ "step": 336
+ },
+ {
+ "epoch": 0.44502617801047123,
+ "grad_norm": 7.726130962371826,
+ "learning_rate": 1.9820161226464708e-05,
+ "loss": 0.7023,
+ "step": 340
+ },
+ {
+ "epoch": 0.450261780104712,
+ "grad_norm": 3.726100206375122,
+ "learning_rate": 1.9813701261394136e-05,
+ "loss": 0.7078,
+ "step": 344
+ },
+ {
+ "epoch": 0.45549738219895286,
+ "grad_norm": 12.017361640930176,
+ "learning_rate": 1.980712840106032e-05,
+ "loss": 0.7383,
+ "step": 348
+ },
+ {
+ "epoch": 0.4607329842931937,
+ "grad_norm": 5.709269046783447,
+ "learning_rate": 1.9800442721076406e-05,
+ "loss": 0.7215,
+ "step": 352
+ },
+ {
+ "epoch": 0.46596858638743455,
+ "grad_norm": 12.649430274963379,
+ "learning_rate": 1.979364429835339e-05,
+ "loss": 0.7111,
+ "step": 356
+ },
+ {
+ "epoch": 0.4712041884816754,
+ "grad_norm": 16.15489959716797,
+ "learning_rate": 1.9786733211099257e-05,
+ "loss": 0.7764,
+ "step": 360
+ },
+ {
+ "epoch": 0.4712041884816754,
+ "eval_F1_err_corr": 0.894511960241892,
+ "eval_accuracy": 0.9100077339520495,
+ "eval_correct_accuracy": 0.9712793351142024,
+ "eval_error_accuracy": 0.8289907059644579,
+ "eval_f1": 0.5971472095277662,
+ "eval_loss": 0.2414369434118271,
+ "eval_pr_auc": 0.7108638111158798,
+ "eval_precision": 0.7689015691868759,
+ "eval_recall": 0.48811410459587956,
+ "eval_runtime": 25.0196,
+ "eval_samples_per_second": 195.567,
+ "eval_steps_per_second": 0.799,
+ "step": 360
+ },
+ {
+ "epoch": 0.47643979057591623,
+ "grad_norm": 12.530599594116211,
+ "learning_rate": 1.9779709538818052e-05,
+ "loss": 0.7715,
+ "step": 364
+ },
+ {
+ "epoch": 0.4816753926701571,
+ "grad_norm": 6.7939605712890625,
+ "learning_rate": 1.9772573362308992e-05,
+ "loss": 0.7522,
+ "step": 368
+ },
+ {
+ "epoch": 0.4869109947643979,
+ "grad_norm": 3.4304537773132324,
+ "learning_rate": 1.9765324763665516e-05,
+ "loss": 0.7511,
+ "step": 372
+ },
+ {
+ "epoch": 0.49214659685863876,
+ "grad_norm": 6.636844158172607,
+ "learning_rate": 1.9757963826274357e-05,
+ "loss": 0.7121,
+ "step": 376
+ },
+ {
+ "epoch": 0.4973821989528796,
+ "grad_norm": 4.51839017868042,
+ "learning_rate": 1.975049063481457e-05,
+ "loss": 0.7231,
+ "step": 380
+ },
+ {
+ "epoch": 0.5026178010471204,
+ "grad_norm": 9.865214347839355,
+ "learning_rate": 1.974290527525657e-05,
+ "loss": 0.762,
+ "step": 384
+ },
+ {
+ "epoch": 0.5078534031413613,
+ "grad_norm": 3.440359592437744,
+ "learning_rate": 1.9735207834861117e-05,
+ "loss": 0.7169,
+ "step": 388
+ },
+ {
+ "epoch": 0.5130890052356021,
+ "grad_norm": 3.5312769412994385,
+ "learning_rate": 1.972739840217836e-05,
+ "loss": 0.73,
+ "step": 392
+ },
+ {
+ "epoch": 0.518324607329843,
+ "grad_norm": 4.723533630371094,
+ "learning_rate": 1.9719477067046768e-05,
+ "loss": 0.6783,
+ "step": 396
+ },
+ {
+ "epoch": 0.5235602094240838,
+ "grad_norm": 3.5356740951538086,
+ "learning_rate": 1.971144392059212e-05,
+ "loss": 0.7155,
+ "step": 400
+ },
+ {
+ "epoch": 0.5235602094240838,
+ "eval_F1_err_corr": 0.893120798984817,
+ "eval_accuracy": 0.902954369682908,
+ "eval_correct_accuracy": 0.9461320280124133,
+ "eval_error_accuracy": 0.8457347701138861,
+ "eval_f1": 0.639051892762628,
+ "eval_loss": 0.24243153631687164,
+ "eval_pr_auc": 0.7029855391245526,
+ "eval_precision": 0.6497426298549368,
+ "eval_recall": 0.6287072673760471,
+ "eval_runtime": 24.8233,
+ "eval_samples_per_second": 197.113,
+ "eval_steps_per_second": 0.806,
+ "step": 400
+ },
+ {
+ "epoch": 0.5287958115183246,
+ "grad_norm": 13.087606430053711,
+ "learning_rate": 1.970329905522647e-05,
+ "loss": 0.7007,
+ "step": 404
+ },
+ {
+ "epoch": 0.5340314136125655,
+ "grad_norm": 14.260698318481445,
+ "learning_rate": 1.9695042564647045e-05,
+ "loss": 0.6817,
+ "step": 408
+ },
+ {
+ "epoch": 0.5392670157068062,
+ "grad_norm": 9.661425590515137,
+ "learning_rate": 1.9686674543835208e-05,
+ "loss": 0.7358,
+ "step": 412
+ },
+ {
+ "epoch": 0.5445026178010471,
+ "grad_norm": 5.698840618133545,
+ "learning_rate": 1.9678195089055347e-05,
+ "loss": 0.6646,
+ "step": 416
+ },
+ {
+ "epoch": 0.5497382198952879,
+ "grad_norm": 5.9759907722473145,
+ "learning_rate": 1.9669604297853766e-05,
+ "loss": 0.73,
+ "step": 420
+ },
+ {
+ "epoch": 0.5549738219895288,
+ "grad_norm": 4.276744842529297,
+ "learning_rate": 1.9660902269057558e-05,
+ "loss": 0.712,
+ "step": 424
+ },
+ {
+ "epoch": 0.5602094240837696,
+ "grad_norm": 4.572305679321289,
+ "learning_rate": 1.9652089102773487e-05,
+ "loss": 0.7033,
+ "step": 428
+ },
+ {
+ "epoch": 0.5654450261780105,
+ "grad_norm": 3.9941539764404297,
+ "learning_rate": 1.9643164900386824e-05,
+ "loss": 0.6695,
+ "step": 432
+ },
+ {
+ "epoch": 0.5706806282722513,
+ "grad_norm": 4.321977138519287,
+ "learning_rate": 1.963412976456017e-05,
+ "loss": 0.709,
+ "step": 436
+ },
+ {
+ "epoch": 0.5759162303664922,
+ "grad_norm": 4.374669551849365,
+ "learning_rate": 1.96249837992323e-05,
+ "loss": 0.6815,
+ "step": 440
+ },
+ {
+ "epoch": 0.5759162303664922,
+ "eval_F1_err_corr": 0.8937597915811933,
+ "eval_accuracy": 0.9036968290796598,
+ "eval_correct_accuracy": 0.9500814005540427,
+ "eval_error_accuracy": 0.8437420660571459,
+ "eval_f1": 0.6368832380730199,
+ "eval_loss": 0.24286404252052307,
+ "eval_pr_auc": 0.7035206327309997,
+ "eval_precision": 0.6568816169393648,
+ "eval_recall": 0.618066561014263,
+ "eval_runtime": 24.8231,
+ "eval_samples_per_second": 197.115,
+ "eval_steps_per_second": 0.806,
+ "step": 440
+ },
+ {
+ "epoch": 0.581151832460733,
+ "grad_norm": 3.3900415897369385,
+ "learning_rate": 1.961572710961695e-05,
+ "loss": 0.6042,
+ "step": 444
+ },
+ {
+ "epoch": 0.5863874345549738,
+ "grad_norm": 3.9020636081695557,
+ "learning_rate": 1.9606359802201608e-05,
+ "loss": 0.6541,
+ "step": 448
+ },
+ {
+ "epoch": 0.5916230366492147,
+ "grad_norm": 3.2324304580688477,
+ "learning_rate": 1.9596881984746288e-05,
+ "loss": 0.664,
+ "step": 452
+ },
+ {
+ "epoch": 0.5968586387434555,
+ "grad_norm": 3.6972060203552246,
+ "learning_rate": 1.958729376628231e-05,
+ "loss": 0.6325,
+ "step": 456
+ },
+ {
+ "epoch": 0.6020942408376964,
+ "grad_norm": 4.679067134857178,
+ "learning_rate": 1.957759525711101e-05,
+ "loss": 0.6851,
+ "step": 460
+ },
+ {
+ "epoch": 0.6073298429319371,
+ "grad_norm": 6.575286865234375,
+ "learning_rate": 1.9567786568802503e-05,
+ "loss": 0.6266,
+ "step": 464
+ },
+ {
+ "epoch": 0.612565445026178,
+ "grad_norm": 6.148586273193359,
+ "learning_rate": 1.9557867814194385e-05,
+ "loss": 0.6887,
+ "step": 468
+ },
+ {
+ "epoch": 0.6178010471204188,
+ "grad_norm": 3.9649710655212402,
+ "learning_rate": 1.9547839107390435e-05,
+ "loss": 0.6448,
+ "step": 472
+ },
+ {
+ "epoch": 0.6230366492146597,
+ "grad_norm": 3.5095326900482178,
+ "learning_rate": 1.9537700563759303e-05,
+ "loss": 0.6793,
+ "step": 476
+ },
+ {
+ "epoch": 0.6282722513089005,
+ "grad_norm": 5.709955215454102,
+ "learning_rate": 1.9527452299933192e-05,
+ "loss": 0.6321,
+ "step": 480
+ },
+ {
+ "epoch": 0.6282722513089005,
+ "eval_F1_err_corr": 0.8922176723044,
+ "eval_accuracy": 0.8975096674400619,
+ "eval_correct_accuracy": 0.9449689114373253,
+ "eval_error_accuracy": 0.8450445368681248,
+ "eval_f1": 0.6403994355801584,
+ "eval_loss": 0.25328728556632996,
+ "eval_pr_auc": 0.6997538853349474,
+ "eval_precision": 0.6150959132610508,
+ "eval_recall": 0.6678741227077202,
+ "eval_runtime": 24.8167,
+ "eval_samples_per_second": 197.166,
+ "eval_steps_per_second": 0.806,
+ "step": 480
+ },
+ {
+ "epoch": 0.6335078534031413,
+ "grad_norm": 3.6896157264709473,
+ "learning_rate": 1.95170944338065e-05,
+ "loss": 0.6806,
+ "step": 484
+ },
+ {
+ "epoch": 0.6387434554973822,
+ "grad_norm": 4.03073263168335,
+ "learning_rate": 1.9506627084534486e-05,
+ "loss": 0.6133,
+ "step": 488
+ },
+ {
+ "epoch": 0.643979057591623,
+ "grad_norm": 6.4314751625061035,
+ "learning_rate": 1.9496050372531864e-05,
+ "loss": 0.6098,
+ "step": 492
+ },
+ {
+ "epoch": 0.6492146596858639,
+ "grad_norm": 3.8455100059509277,
+ "learning_rate": 1.9485364419471454e-05,
+ "loss": 0.6306,
+ "step": 496
+ },
+ {
+ "epoch": 0.6544502617801047,
+ "grad_norm": 3.8784000873565674,
+ "learning_rate": 1.9474569348282774e-05,
+ "loss": 0.6104,
+ "step": 500
+ },
+ {
+ "epoch": 0.6596858638743456,
+ "grad_norm": 5.018595218658447,
+ "learning_rate": 1.9463665283150604e-05,
+ "loss": 0.6592,
+ "step": 504
+ },
+ {
+ "epoch": 0.6649214659685864,
+ "grad_norm": 3.5282726287841797,
+ "learning_rate": 1.9452652349513587e-05,
+ "loss": 0.621,
+ "step": 508
+ },
+ {
+ "epoch": 0.6701570680628273,
+ "grad_norm": 3.4036905765533447,
+ "learning_rate": 1.9441530674062754e-05,
+ "loss": 0.6744,
+ "step": 512
+ },
+ {
+ "epoch": 0.675392670157068,
+ "grad_norm": 4.95082950592041,
+ "learning_rate": 1.9430300384740108e-05,
+ "loss": 0.5925,
+ "step": 516
+ },
+ {
+ "epoch": 0.680628272251309,
+ "grad_norm": 5.078342437744141,
+ "learning_rate": 1.941896161073711e-05,
+ "loss": 0.5913,
+ "step": 520
+ },
1402
+ {
1403
+ "epoch": 0.680628272251309,
1404
+ "eval_F1_err_corr": 0.885156181305656,
1405
+ "eval_accuracy": 0.8942304717710751,
1406
+ "eval_correct_accuracy": 0.9306883336673133,
1407
+ "eval_error_accuracy": 0.8438713827505521,
1408
+ "eval_f1": 0.6393079438759363,
1409
+ "eval_loss": 0.27150195837020874,
1410
+ "eval_pr_auc": 0.6992222071782436,
1411
+ "eval_precision": 0.5985776372975109,
1412
+ "eval_recall": 0.6859859633235228,
1413
+ "eval_runtime": 24.819,
1414
+ "eval_samples_per_second": 197.147,
1415
+ "eval_steps_per_second": 0.806,
1416
+ "step": 520
1417
+ },
1418
+ {
+ "epoch": 0.6858638743455497,
+ "grad_norm": 5.81033182144165,
+ "learning_rate": 1.9407514482493214e-05,
+ "loss": 0.6133,
+ "step": 524
+ },
+ {
+ "epoch": 0.6910994764397905,
+ "grad_norm": 4.901327133178711,
+ "learning_rate": 1.939595913169438e-05,
+ "loss": 0.6121,
+ "step": 528
+ },
+ {
+ "epoch": 0.6963350785340314,
+ "grad_norm": 3.7869937419891357,
+ "learning_rate": 1.9384295691271523e-05,
+ "loss": 0.5822,
+ "step": 532
+ },
+ {
+ "epoch": 0.7015706806282722,
+ "grad_norm": 3.8648629188537598,
+ "learning_rate": 1.9372524295399014e-05,
+ "loss": 0.6032,
+ "step": 536
+ },
+ {
+ "epoch": 0.7068062827225131,
+ "grad_norm": 3.9610342979431152,
+ "learning_rate": 1.9360645079493126e-05,
+ "loss": 0.59,
+ "step": 540
+ },
+ {
+ "epoch": 0.7120418848167539,
+ "grad_norm": 5.623746395111084,
+ "learning_rate": 1.9348658180210473e-05,
+ "loss": 0.5835,
+ "step": 544
+ },
+ {
+ "epoch": 0.7172774869109948,
+ "grad_norm": 6.02370548248291,
+ "learning_rate": 1.933656373544645e-05,
+ "loss": 0.6003,
+ "step": 548
+ },
+ {
+ "epoch": 0.7225130890052356,
+ "grad_norm": 5.652750492095947,
+ "learning_rate": 1.932436188433362e-05,
+ "loss": 0.5958,
+ "step": 552
+ },
+ {
+ "epoch": 0.7277486910994765,
+ "grad_norm": 7.355208396911621,
+ "learning_rate": 1.9312052767240153e-05,
+ "loss": 0.5677,
+ "step": 556
+ },
+ {
+ "epoch": 0.7329842931937173,
+ "grad_norm": 4.652146339416504,
+ "learning_rate": 1.9299636525768176e-05,
+ "loss": 0.5649,
+ "step": 560
+ },
+ {
+ "epoch": 0.7329842931937173,
+ "eval_F1_err_corr": 0.8974946334360716,
+ "eval_accuracy": 0.9049033255993812,
+ "eval_correct_accuracy": 0.9592731998252757,
+ "eval_error_accuracy": 0.843191870706177,
+ "eval_f1": 0.6410555815039701,
+ "eval_loss": 0.24959486722946167,
+ "eval_pr_auc": 0.6979561382710899,
+ "eval_precision": 0.6619242826139378,
+ "eval_recall": 0.621462531129726,
+ "eval_runtime": 24.817,
+ "eval_samples_per_second": 197.163,
+ "eval_steps_per_second": 0.806,
+ "step": 560
+ },
+ {
+ "epoch": 0.7382198952879581,
+ "grad_norm": 5.073575019836426,
+ "learning_rate": 1.9287113302752167e-05,
+ "loss": 0.5491,
+ "step": 564
+ },
+ {
+ "epoch": 0.743455497382199,
+ "grad_norm": 4.796985149383545,
+ "learning_rate": 1.927448324225729e-05,
+ "loss": 0.5849,
+ "step": 568
+ },
+ {
+ "epoch": 0.7486910994764397,
+ "grad_norm": 6.055835247039795,
+ "learning_rate": 1.9261746489577767e-05,
+ "loss": 0.5721,
+ "step": 572
+ },
+ {
+ "epoch": 0.7539267015706806,
+ "grad_norm": 7.7210893630981445,
+ "learning_rate": 1.9248903191235177e-05,
+ "loss": 0.5749,
+ "step": 576
+ },
+ {
+ "epoch": 0.7591623036649214,
+ "grad_norm": 3.5172553062438965,
+ "learning_rate": 1.9235953494976786e-05,
+ "loss": 0.6009,
+ "step": 580
+ },
+ {
+ "epoch": 0.7643979057591623,
+ "grad_norm": 5.326947212219238,
+ "learning_rate": 1.922289754977385e-05,
+ "loss": 0.5896,
+ "step": 584
+ },
+ {
+ "epoch": 0.7696335078534031,
+ "grad_norm": 3.990248203277588,
+ "learning_rate": 1.920973550581989e-05,
+ "loss": 0.578,
+ "step": 588
+ },
+ {
+ "epoch": 0.774869109947644,
+ "grad_norm": 3.6598334312438965,
+ "learning_rate": 1.9196467514528973e-05,
+ "loss": 0.567,
+ "step": 592
+ },
+ {
+ "epoch": 0.7801047120418848,
+ "grad_norm": 5.096114635467529,
+ "learning_rate": 1.9183093728533966e-05,
+ "loss": 0.5847,
+ "step": 596
+ },
+ {
+ "epoch": 0.7853403141361257,
+ "grad_norm": 5.4809889793396,
+ "learning_rate": 1.9169614301684786e-05,
+ "loss": 0.5934,
+ "step": 600
+ },
+ {
+ "epoch": 0.7853403141361257,
+ "eval_F1_err_corr": 0.8959803504098618,
+ "eval_accuracy": 0.9018097447795823,
+ "eval_correct_accuracy": 0.9504131731842577,
+ "eval_error_accuracy": 0.8474448138009186,
+ "eval_f1": 0.6463115667483842,
+ "eval_loss": 0.2541360855102539,
+ "eval_pr_auc": 0.7031337927296945,
+ "eval_precision": 0.6363835856923414,
+ "eval_recall": 0.6565542223228436,
+ "eval_runtime": 24.8027,
+ "eval_samples_per_second": 197.277,
+ "eval_steps_per_second": 0.806,
+ "step": 600
+ },
+ {
+ "epoch": 0.7905759162303665,
+ "grad_norm": 3.492452621459961,
+ "learning_rate": 1.915602938904662e-05,
+ "loss": 0.5974,
+ "step": 604
+ },
+ {
+ "epoch": 0.7958115183246073,
+ "grad_norm": 4.485317707061768,
+ "learning_rate": 1.914233914689815e-05,
+ "loss": 0.5269,
+ "step": 608
+ },
+ {
+ "epoch": 0.8010471204188482,
+ "grad_norm": 4.36208438873291,
+ "learning_rate": 1.912854373272975e-05,
+ "loss": 0.5794,
+ "step": 612
+ },
+ {
+ "epoch": 0.806282722513089,
+ "grad_norm": 4.126212120056152,
+ "learning_rate": 1.9114643305241678e-05,
+ "loss": 0.5454,
+ "step": 616
+ },
+ {
+ "epoch": 0.8115183246073299,
+ "grad_norm": 3.9140942096710205,
+ "learning_rate": 1.9100638024342245e-05,
+ "loss": 0.5615,
+ "step": 620
+ },
+ {
+ "epoch": 0.8167539267015707,
+ "grad_norm": 9.218249320983887,
+ "learning_rate": 1.908652805114598e-05,
+ "loss": 0.564,
+ "step": 624
+ },
+ {
+ "epoch": 0.8219895287958116,
+ "grad_norm": 4.118100166320801,
+ "learning_rate": 1.907231354797179e-05,
+ "loss": 0.5406,
+ "step": 628
+ },
+ {
+ "epoch": 0.8272251308900523,
+ "grad_norm": 3.917045831680298,
+ "learning_rate": 1.9057994678341053e-05,
+ "loss": 0.5581,
+ "step": 632
+ },
+ {
+ "epoch": 0.8324607329842932,
+ "grad_norm": 4.272670745849609,
+ "learning_rate": 1.9043571606975776e-05,
+ "loss": 0.5761,
+ "step": 636
+ },
+ {
+ "epoch": 0.837696335078534,
+ "grad_norm": 4.809320449829102,
+ "learning_rate": 1.902904449979669e-05,
+ "loss": 0.5422,
+ "step": 640
+ },
+ {
+ "epoch": 0.837696335078534,
+ "eval_F1_err_corr": 0.899383774542208,
+ "eval_accuracy": 0.905769528228925,
+ "eval_correct_accuracy": 0.9610494803595725,
+ "eval_error_accuracy": 0.8451544680769811,
+ "eval_f1": 0.6363419293218721,
+ "eval_loss": 0.2484092116355896,
+ "eval_pr_auc": 0.6976824941932482,
+ "eval_precision": 0.673149785299318,
+ "eval_recall": 0.6033506905139234,
+ "eval_runtime": 24.8065,
+ "eval_samples_per_second": 197.247,
+ "eval_steps_per_second": 0.806,
+ "step": 640
+ },
+ {
+ "epoch": 0.8429319371727748,
+ "grad_norm": 5.909646511077881,
+ "learning_rate": 1.901441352392133e-05,
+ "loss": 0.5825,
+ "step": 644
+ },
+ {
+ "epoch": 0.8481675392670157,
+ "grad_norm": 4.255792140960693,
+ "learning_rate": 1.8999678847662124e-05,
+ "loss": 0.5576,
+ "step": 648
+ },
+ {
+ "epoch": 0.8534031413612565,
+ "grad_norm": 6.5200114250183105,
+ "learning_rate": 1.8984840640524445e-05,
+ "loss": 0.5296,
+ "step": 652
+ },
+ {
+ "epoch": 0.8586387434554974,
+ "grad_norm": 8.32865047454834,
+ "learning_rate": 1.8969899073204687e-05,
+ "loss": 0.5655,
+ "step": 656
+ },
+ {
+ "epoch": 0.8638743455497382,
+ "grad_norm": 9.28367805480957,
+ "learning_rate": 1.8954854317588262e-05,
+ "loss": 0.5791,
+ "step": 660
+ },
+ {
+ "epoch": 0.8691099476439791,
+ "grad_norm": 4.166441917419434,
+ "learning_rate": 1.8939706546747656e-05,
+ "loss": 0.5214,
+ "step": 664
+ },
+ {
+ "epoch": 0.8743455497382199,
+ "grad_norm": 3.7278671264648438,
+ "learning_rate": 1.8924455934940424e-05,
+ "loss": 0.5087,
+ "step": 668
+ },
+ {
+ "epoch": 0.8795811518324608,
+ "grad_norm": 6.253541469573975,
+ "learning_rate": 1.8909102657607182e-05,
+ "loss": 0.5476,
+ "step": 672
+ },
+ {
+ "epoch": 0.8848167539267016,
+ "grad_norm": 9.273209571838379,
+ "learning_rate": 1.88936468913696e-05,
+ "loss": 0.4928,
+ "step": 676
+ },
+ {
+ "epoch": 0.8900523560209425,
+ "grad_norm": 5.4465532302856445,
+ "learning_rate": 1.8878088814028365e-05,
+ "loss": 0.4909,
+ "step": 680
+ },
+ {
+ "epoch": 0.8900523560209425,
+ "eval_F1_err_corr": 0.8973571707111299,
+ "eval_accuracy": 0.9004485692188708,
+ "eval_correct_accuracy": 0.9515640305646176,
+ "eval_error_accuracy": 0.8489933585798806,
+ "eval_f1": 0.6449691085613416,
+ "eval_loss": 0.25420647859573364,
+ "eval_pr_auc": 0.7006737583541583,
+ "eval_precision": 0.6290079621261029,
+ "eval_recall": 0.6617613764998868,
+ "eval_runtime": 24.8354,
+ "eval_samples_per_second": 197.017,
+ "eval_steps_per_second": 0.805,
+ "step": 680
+ },
+ {
+ "epoch": 0.8952879581151832,
+ "grad_norm": 3.929280996322632,
+ "learning_rate": 1.886242860456113e-05,
+ "loss": 0.518,
+ "step": 684
+ },
+ {
+ "epoch": 0.900523560209424,
+ "grad_norm": 3.3221724033355713,
+ "learning_rate": 1.884666644312046e-05,
+ "loss": 0.474,
+ "step": 688
+ },
+ {
+ "epoch": 0.9057591623036649,
+ "grad_norm": 4.1775126457214355,
+ "learning_rate": 1.8830802511031763e-05,
+ "loss": 0.513,
+ "step": 692
+ },
+ {
+ "epoch": 0.9109947643979057,
+ "grad_norm": 4.372125148773193,
+ "learning_rate": 1.88148369907912e-05,
+ "loss": 0.4958,
+ "step": 696
+ },
+ {
+ "epoch": 0.9162303664921466,
+ "grad_norm": 4.19729471206665,
+ "learning_rate": 1.8798770066063577e-05,
+ "loss": 0.5178,
+ "step": 700
+ },
+ {
+ "epoch": 0.9214659685863874,
+ "grad_norm": 4.332755088806152,
+ "learning_rate": 1.8782601921680258e-05,
+ "loss": 0.525,
+ "step": 704
+ },
+ {
+ "epoch": 0.9267015706806283,
+ "grad_norm": 4.065849304199219,
+ "learning_rate": 1.8766332743637002e-05,
+ "loss": 0.4692,
+ "step": 708
+ },
+ {
+ "epoch": 0.9319371727748691,
+ "grad_norm": 4.974046230316162,
+ "learning_rate": 1.8749962719091864e-05,
+ "loss": 0.4973,
+ "step": 712
+ },
+ {
+ "epoch": 0.93717277486911,
+ "grad_norm": 4.961699962615967,
+ "learning_rate": 1.8733492036363007e-05,
+ "loss": 0.5204,
+ "step": 716
+ },
+ {
+ "epoch": 0.9424083769633508,
+ "grad_norm": 4.140364646911621,
+ "learning_rate": 1.871692088492655e-05,
+ "loss": 0.4905,
+ "step": 720
+ },
+ {
+ "epoch": 0.9424083769633508,
+ "eval_F1_err_corr": 0.8932916712717729,
+ "eval_accuracy": 0.8947254447022428,
+ "eval_correct_accuracy": 0.9452793616476387,
+ "eval_error_accuracy": 0.8467242340670772,
+ "eval_f1": 0.6396272371068517,
+ "eval_loss": 0.2594238817691803,
+ "eval_pr_auc": 0.7027911559368634,
+ "eval_precision": 0.6008754476721051,
+ "eval_recall": 0.6837219832465474,
+ "eval_runtime": 24.8417,
+ "eval_samples_per_second": 196.967,
+ "eval_steps_per_second": 0.805,
+ "step": 720
+ },
+ {
+ "epoch": 0.9476439790575916,
+ "grad_norm": 8.625274658203125,
+ "learning_rate": 1.8700249455414394e-05,
+ "loss": 0.4686,
+ "step": 724
+ },
+ {
+ "epoch": 0.9528795811518325,
+ "grad_norm": 6.383296966552734,
+ "learning_rate": 1.8683477939612024e-05,
+ "loss": 0.4764,
+ "step": 728
+ },
+ {
+ "epoch": 0.9581151832460733,
+ "grad_norm": 7.345070838928223,
+ "learning_rate": 1.866660653045629e-05,
+ "loss": 0.4823,
+ "step": 732
+ },
+ {
+ "epoch": 0.9633507853403142,
+ "grad_norm": 4.40362548828125,
+ "learning_rate": 1.8649635422033218e-05,
+ "loss": 0.49,
+ "step": 736
+ },
+ {
+ "epoch": 0.9685863874345549,
+ "grad_norm": 3.8177592754364014,
+ "learning_rate": 1.863256480957574e-05,
+ "loss": 0.5004,
+ "step": 740
+ },
+ {
+ "epoch": 0.9738219895287958,
+ "grad_norm": 3.5552761554718018,
+ "learning_rate": 1.861539488946148e-05,
+ "loss": 0.4967,
+ "step": 744
+ },
+ {
+ "epoch": 0.9790575916230366,
+ "grad_norm": 3.948543071746826,
+ "learning_rate": 1.8598125859210475e-05,
+ "loss": 0.5106,
+ "step": 748
+ },
+ {
+ "epoch": 0.9842931937172775,
+ "grad_norm": 4.415132999420166,
+ "learning_rate": 1.858075791748291e-05,
+ "loss": 0.4919,
+ "step": 752
+ },
+ {
+ "epoch": 0.9895287958115183,
+ "grad_norm": 4.514105319976807,
+ "learning_rate": 1.8563291264076834e-05,
+ "loss": 0.4947,
+ "step": 756
+ },
+ {
+ "epoch": 0.9947643979057592,
+ "grad_norm": 6.685056209564209,
+ "learning_rate": 1.854572609992586e-05,
+ "loss": 0.4892,
+ "step": 760
+ },
+ {
+ "epoch": 0.9947643979057592,
+ "eval_F1_err_corr": 0.9005018183708923,
+ "eval_accuracy": 0.9076256767208043,
+ "eval_correct_accuracy": 0.9694615035570632,
+ "eval_error_accuracy": 0.8407011107412775,
+ "eval_f1": 0.6246857717445953,
+ "eval_loss": 0.24942660331726074,
+ "eval_pr_auc": 0.6972885689682531,
+ "eval_precision": 0.7021757558632382,
+ "eval_recall": 0.5625990491283677,
+ "eval_runtime": 24.7945,
+ "eval_samples_per_second": 197.342,
+ "eval_steps_per_second": 0.807,
+ "step": 760
+ }
+ ],
+ "logging_steps": 4,
+ "max_steps": 3820,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 5,
+ "save_steps": 16,
+ "stateful_callbacks": {
+ "MinEpochEarlyStoppingCallback": {
+ "args": {
+ "early_stopping_patience": 5,
+ "early_stopping_threshold": 0.001
+ },
+ "attributes": {
+ "early_stopping_patience_counter": 0
+ }
+ },
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": false
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 3.37033143972266e+17,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }
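The `log_history` above interleaves training records (`loss`, `grad_norm`, `learning_rate`, logged every `logging_steps` = 4 steps) with evaluation records (`eval_*`, every 40 steps in this excerpt). Since `max_steps` / `num_train_epochs` = 3820 / 5 = 764, each logged `epoch` equals `step / 764`. A minimal Python sketch for reading the history back follows, under stated assumptions: the inline `STATE` dict is a hypothetical stand-in for the parsed trainer state (presumably `trainer_state.json` loaded with `json.load`) and holds only the seven evaluation records visible here, and `replay_patience` approximates the patience rule of transformers' `EarlyStoppingCallback` (`early_stopping_patience`, `early_stopping_threshold`). `MinEpochEarlyStoppingCallback` is a custom callback whose extra logic is not shown in this diff, so any minimum-epoch condition it adds is not modeled.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the parsed trainer state: only the seven evaluation
# records visible in this excerpt, as (step, epoch, eval_loss, eval_F1_err_corr).
_EVAL_ROWS = [
    (520, 0.680628272251309, 0.27150195837020874, 0.885156181305656),
    (560, 0.7329842931937173, 0.24959486722946167, 0.8974946334360716),
    (600, 0.7853403141361257, 0.2541360855102539, 0.8959803504098618),
    (640, 0.837696335078534, 0.2484092116355896, 0.899383774542208),
    (680, 0.8900523560209425, 0.25420647859573364, 0.8973571707111299),
    (720, 0.9424083769633508, 0.2594238817691803, 0.8932916712717729),
    (760, 0.9947643979057592, 0.24942660331726074, 0.9005018183708923),
]

STATE: Dict[str, Any] = {
    "max_steps": 3820,
    "num_train_epochs": 5,
    # One training record (no eval_* keys) followed by the evaluation records.
    "log_history": [{"epoch": 0.9947643979057592, "loss": 0.4892, "step": 760}]
    + [
        {"step": s, "epoch": ep, "eval_loss": loss, "eval_F1_err_corr": f1}
        for s, ep, loss, f1 in _EVAL_ROWS
    ],
}


def eval_entries(state: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only evaluation records; training records carry `loss`, not `eval_loss`."""
    return [e for e in state["log_history"] if "eval_loss" in e]


def best_eval_entry(state: Dict[str, Any], metric: str, greater_is_better: bool) -> Dict[str, Any]:
    """Return the evaluation record with the best value of `metric`."""
    entries = [e for e in eval_entries(state) if metric in e]
    pick = max if greater_is_better else min
    return pick(entries, key=lambda e: e[metric])


def replay_patience(
    state: Dict[str, Any], metric: str, greater_is_better: bool, threshold: float
) -> int:
    """Approximate the patience counter of transformers' EarlyStoppingCallback.

    The counter resets only when `metric` beats the best value so far by more than
    `threshold`; otherwise it increments. Assumption: the tracked best value is
    updated on every strict improvement, even one smaller than `threshold`.
    """
    best: Optional[float] = None
    counter = 0
    for e in eval_entries(state):
        value = e[metric]
        better = best is None or (value > best if greater_is_better else value < best)
        if better and (best is None or abs(value - best) > threshold):
            counter = 0
        else:
            counter += 1
        if better:
            best = value
    return counter


if __name__ == "__main__":
    steps_per_epoch = STATE["max_steps"] / STATE["num_train_epochs"]
    for e in eval_entries(STATE):
        # Each logged epoch is step / steps_per_epoch.
        assert abs(e["epoch"] - e["step"] / steps_per_epoch) < 1e-9
    print("steps per epoch:", steps_per_epoch)
    print("lowest eval_loss at step", best_eval_entry(STATE, "eval_loss", False)["step"])
    print("highest eval_F1_err_corr at step", best_eval_entry(STATE, "eval_F1_err_corr", True)["step"])
    print("patience (eval_F1_err_corr):", replay_patience(STATE, "eval_F1_err_corr", True, 0.001))
    print("patience (eval_loss):", replay_patience(STATE, "eval_loss", False, 0.001))
```

On these seven records alone, `eval_loss` is lowest at step 640 while `eval_F1_err_corr` peaks at step 760, so the replayed counter is 0 for `eval_F1_err_corr` and 3 for `eval_loss`. The "best" checkpoint therefore depends on the monitored metric, and this excerpt does not record which one the run used.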
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fa18b48ba38e0996a4f60e6d5cc265f97771d7a4b72103493580032594eb3cc8
+ size 6097
vocab.json ADDED