Commit 36e31c5 committed by lewtun (HF staff)
1 Parent(s): 52fd114

Add HuggingFaceH4/mistral-7b-ift-v12.0 checkpoint
README.md ADDED
@@ -0,0 +1,63 @@
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-v0.1
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: mistral-7b-ift-v12.0
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # mistral-7b-ift-v12.0
+
+ This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the HuggingFaceH4/ultrachat_uncensored_truecase_i_dont and the HuggingFaceH4/arithmo datasets.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.9133
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 8
+ - eval_batch_size: 16
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 16
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 512
+ - total_eval_batch_size: 256
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 1
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:-----:|:----:|:---------------:|
+ | 0.8586        | 0.38  | 344  | 0.9133          |
+
+
+ ### Framework versions
+
+ - Transformers 4.35.0.dev0
+ - Pytorch 2.0.1+cu118
+ - Datasets 2.12.0
+ - Tokenizers 0.14.0
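
The card above is internally consistent: the effective train batch size is train_batch_size × num_devices × gradient_accumulation_steps = 8 × 16 × 4 = 512, matching total_train_batch_size, and 16 × 16 = 256 matches total_eval_batch_size. Below is a minimal sketch of loading the checkpoint with transformers, assuming the repo id from the commit title is accessible to you; the prompt markup is taken from the dialogue_template.json added later in this commit.

```python
# Minimal loading sketch; "HuggingFaceH4/mistral-7b-ift-v12.0" comes from the
# commit title and may be private/gated. Needs transformers >= 4.34 (Mistral
# support) and accelerate for device_map="auto".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/mistral-7b-ift-v12.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches "torch_dtype": "bfloat16" in config.json
    device_map="auto",
)

# Prompt format assumed from dialogue_template.json (<|user|>/<|assistant|> + </s>).
prompt = "<|user|>\nWhat is 4 * 12?</s>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0]))
```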
added_tokens.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "</s>": 2,
+   "<s>": 1,
+   "<unk>": 0
+ }
all_results.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "epoch": 0.38,
+   "eval_loss": 0.9133002758026123,
+   "eval_runtime": 205.2179,
+   "eval_samples": 36829,
+   "eval_samples_per_second": 179.463,
+   "eval_steps_per_second": 0.702,
+   "perplexity": 2.4925350272208915,
+   "train_loss": 0.9639226873946745,
+   "train_runtime": 7750.2268,
+   "train_samples": 468526,
+   "train_samples_per_second": 60.453,
+   "train_steps_per_second": 0.118
+ }
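
The perplexity field here is simply the exponential of the evaluation loss, which makes the two numbers easy to cross-check:

```python
import math

eval_loss = 0.9133002758026123  # from all_results.json above
print(math.exp(eval_loss))      # 2.4925350272208915, the reported perplexity
```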
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "mistralai/Mistral-7B-v0.1",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 10000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.35.0.dev0",
+   "use_cache": true,
+   "vocab_size": 32000
+ }
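
These are the stock Mistral-7B dimensions; a quick consistency check of the attention layout they imply:

```python
# Derived attention layout implied by config.json (standard Mistral-7B values).
hidden_size, n_heads, n_kv_heads = 4096, 32, 8
head_dim = hidden_size // n_heads  # 128
groups = n_heads // n_kv_heads     # 4 query heads share each KV head (grouped-query attention)
print(head_dim, groups)            # 128 4
```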
dialogue_template.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "system": "",
+   "name": "default_v3",
+   "messages": [
+     {
+       "content": "Create a compelling crime story with a plot that revolves around a group of experienced thieves attempting to steal an antique car worth millions of dollars from a heavily guarded museum. Incorporate intricate heist planning, unexpected plot twists, and intriguing character development as the thieves race against time and face unexpected obstacles in their pursuit of the valuable vehicle. Additionally, explore themes related to greed, loyalty, and betrayal in the characters' relationships with each other and their motivations for committing the crime. Write in a suspenseful and tense style that keeps readers on the edge of their seats and builds to a satisfying climax.",
+       "role": "user"
+     },
+     {
+       "content": "It was a perfect plan.\n\nFour of the best thieves in the world had been planning for weeks to steal an antique car worth millions of dollars from a heavily guarded museum. The car was a one-of-a-kind, rare beauty that had never gone on public display before, and it was virtually priceless. The museum\u2019s security system was state-of-the-art, but they had a mole on the inside who was feeding them information about the layout of the building and the patrol schedules of the guards. They had even obtained a copy of the keycard to the room where the car was kept.\n\nThe four thieves \u2013 Jake, Alex, Mia, and Vince \u2013 were all veterans of hundreds of successful heists. They were cool, calm, and collected, and they had a reputation for being the best in the business. But this was different. This was the heist of a lifetime, and they knew that they had to be extra cautious.\n\nThey started by casing the museum for several days, watching the guards\u2019 patterns and looking for any vulnerabilities in the security system. They had a high-tech computer system that allowed them to simulate different scenarios and test their plan\u2019s viability. They even recruited a hacker to disable the museum\u2019s alarm system temporarily.\n\nThe night of the heist arrived, and they were ready. They wore black clothes and carried bags filled with tools and gadgets needed for the heist. They slipped into the museum, avoiding the motion sensors and cameras, and got to the room where the car was kept.\n\nThe plan was to disable the alarms and cameras, use the keycard to unlock the room's door, and remove the car before anyone realized what was happening.\n\nBut things did not go according to plan.\n\nAs soon as they started working on the door, they heard footsteps approaching. The mole had assured them that this wing of the museum was off-limits during the night, so they were not prepared for this. They quickly hid behind the exhibits as two guards walked right past them. They held their breath and waited for the guards to leave.\n\nThen, they heard the sound of the alarm on the hacker\u2019s computer. It was supposed to be a signal that the alarms were offline, but it had malfunctioned. The alarms were still on.\n\nPanic set in as the guards rushed into the room. They had been alerted by the hacker\u2019s mistake, and the thieves were trapped.\n\nBut the thieves were not ready to give up yet. They ran for the car, grabbing tools and breaking the glass case that housed it. The guards were in hot pursuit, and gunshots rang out. But they managed to get the car out of the museum, only to find that their getaway car was disabled.\n\nThey were trapped in the parking lot, surrounded by police cars and helicopters. Jake realized that the mole had double-crossed them and reported the plan to the police.\n\nIn the end, the thieves were arrested, and the car was returned to the museum. They all went to jail, but not before the thieves vowed revenge against the mole. The mole managed to escape, but not before Alex got a good look at his face.\n\nThe heist had been a failure, but the characters had grown and developed, discovering the extent of their own greed, loyalty, and betrayal. What had started as a simple heist had become an intense thrill ride, full of unexpected twists and turns. For better or for worse, everyone had gotten what they had wanted, and more.",
+       "role": "assistant"
+     },
+     {
+       "content": "That story is great! Can you add some more details about the characters' relationships with each other? I'm curious to know more about what drove them to work together and how they interacted throughout the heist.",
+       "role": "user"
+     },
+     {
+       "content": "Of course! Let me add some more details about the character relationships.\n\nJake was the leader of the group, and the one who had come up with the plan. He was an old hand at this sort of thing, but he had lost a lot of money after being double-crossed in a previous job. He was highly suspicious of everyone, including his own team members, and he was determined not to let another heist go wrong.\n\nAlex was Jake's right-hand man, and he had an almost obsessive respect for Jake. He had been working with Jake for years, and he saw him as a mentor and almost a father figure. At the same time, he had feelings for Mia, who was the newest addition to the team, and he struggled to keep them in check.\n\nMia was the wildcard of the group. She was young and impulsive, but very clever, and she had a talent for cracking safes and locks. She had joined the team because she admired Jake's reputation, but quickly found herself drawn to Alex. She tried not to let her feelings get in the way of the job, but it wasn't easy.\n\nVince was the muscle of the group, a tough guy with a long criminal record. He didn't talk much, but he was fiercely loyal to Jake and would do anything for him. He had been hired for the job specifically because of his physical strength.\n\nThroughout the heist, the relationships between the characters were tested. Jake's suspicions led him to become paranoid, and he kept a close eye on everyone. Alex's feelings for Mia caused him to make a few mistakes, and he ended up jeopardizing the mission. Mia, on the other hand, tried to stay focused on the job, but found herself getting more and more drawn to Alex. Vince was a silent but powerful presence, always ready to step in and defend his teammates.\n\nIn the end, the relationships between the characters were strained, but not broken. They all went to jail, but they had learned a lot about themselves and each other. They had discovered the extent of their own greed, loyalty, and betrayal, and they had seen firsthand the consequences of their actions. But they also knew that they could trust each other, even in the face of the toughest challenges.",
+       "role": "assistant"
+     }
+   ],
+   "system_token": "<|system|>",
+   "system_format": "standard",
+   "user_token": "<|user|>",
+   "assistant_token": "<|assistant|>",
+   "end_token": "</s>",
+   "mid_str": "\n",
+   "end_str": "\n",
+   "extra_end_text": "\n"
+ }
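
The fields at the bottom define the chat markup; the messages array above is just an embedded example conversation. A hypothetical rendering helper, under the assumption that mid_str/end_str act as newline separators around each role token and end_token closes every turn (this helper is illustrative, not code from this repo):

```python
def format_dialogue(messages, system=""):
    """Illustrative rendering of dialogue_template.json; the exact assembly
    order of system_format/extra_end_text is an assumption."""
    text = "<|system|>" + "\n" + system + "</s>" + "\n"
    for message in messages:
        role_token = "<|user|>" if message["role"] == "user" else "<|assistant|>"
        text += role_token + "\n" + message["content"] + "</s>" + "\n"
    return text + "<|assistant|>" + "\n"  # cue the model's next turn
```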
eval_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "epoch": 0.38,
+   "eval_loss": 0.9133002758026123,
+   "eval_runtime": 205.2179,
+   "eval_samples": 36829,
+   "eval_samples_per_second": 179.463,
+   "eval_steps_per_second": 0.702,
+   "perplexity": 2.4925350272208915
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.35.0.dev0"
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:010b7ada02bdb13054ddbed0c1d9fb0bdfdf0298ad2ec35908da7a15b7103c52
+ size 9942981696
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:655f2c00f67c44357015a9073d8530aad61d30322718a1b5e662466795457b10
+ size 4540516344
pytorch_model-00001-of-00002.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a0506a5a7f30fd03bfeb9e1bff27ccdc8e9294d2c96aecccce3c64153859e4fc
+ size 9943028044
pytorch_model-00002-of-00002.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8336cf10e28ce475cab17502433da7452ef4cd7fbccba21940eaea830f2dec84
+ size 4540535647
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,298 @@
+ {
+   "metadata": {
+     "total_size": 14483464192
+   },
+   "weight_map": {
+     "lm_head.weight": "pytorch_model-00002-of-00002.bin",
+     "model.embed_tokens.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.22.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.22.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.23.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.3.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.30.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.input_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.down_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.mlp.up_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00002-of-00002.bin",
+     "model.layers.4.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.input_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.down_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.mlp.up_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00001-of-00002.bin",
+     "model.norm.weight": "pytorch_model-00002-of-00002.bin"
+   }
+ }
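
The metadata gives a cheap parameter-count check: 14,483,464,192 bytes of bfloat16 weights at 2 bytes each works out to ~7.24B parameters, as expected for Mistral-7B:

```python
total_size = 14_483_464_192    # "total_size" from the metadata above
params = total_size / 2        # bfloat16 = 2 bytes per parameter
print(f"{params / 1e9:.2f}B")  # 7.24B
```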
runs/Oct20_13-49-22_ip-26-0-149-199/events.out.tfevents.1697810007.ip-26-0-149-199.2045305.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9fcfeb28f4a08a63ff05a5bbe7faf72fdebc5f9dc7640028ddf9857bebec866c
+ size 15846
runs/Oct20_13-49-22_ip-26-0-149-199/events.out.tfevents.1697817963.ip-26-0-149-199.2045305.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dec84c04136267ed08dc7bfbc73a6ee22c5268d8d5cc02c99d9c1bce885c915f
+ size 359
special_tokens_map.json ADDED
@@ -0,0 +1,11 @@
+ {
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "pad_token": "</s>",
+   "unk_token": "<unk>"
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+ size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "<unk>",
+     "<s>",
+     "</s>"
+   ],
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "</s>",
+   "legacy": true,
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "</s>",
+   "sp_model_kwargs": {},
+   "spaces_between_special_tokens": false,
+   "tokenizer_class": "LlamaTokenizer",
+   "unk_token": "<unk>",
+   "use_default_system_prompt": true
+ }
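
A small sanity check (again assuming access to the repo) that the tokenizer resolves its special tokens to the ids listed in added_tokens.json, and that padding reuses the EOS token as configured:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/mistral-7b-ift-v12.0")
assert tok.convert_tokens_to_ids("<unk>") == 0
assert tok.convert_tokens_to_ids("<s>") == 1
assert tok.convert_tokens_to_ids("</s>") == 2
print(tok.pad_token == tok.eos_token)  # True: pad_token is "</s>" per tokenizer_config.json
```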
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "epoch": 0.38,
+   "train_loss": 0.9639226873946745,
+   "train_runtime": 7750.2268,
+   "train_samples": 468526,
+   "train_samples_per_second": 60.453,
+   "train_steps_per_second": 0.118
+ }
trainer_state.json ADDED
@@ -0,0 +1,450 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 0.3758535919147774,
+   "eval_steps": 500,
+   "global_step": 344,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.0,
+       "learning_rate": 2.173913043478261e-07,
+       "loss": 1.7313,
+       "step": 1
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 1.0869565217391306e-06,
+       "loss": 1.6804,
+       "step": 5
+     },
+     {
+       "epoch": 0.01,
+       "learning_rate": 2.173913043478261e-06,
+       "loss": 1.3363,
+       "step": 10
+     },
+     {
+       "epoch": 0.02,
+       "learning_rate": 3.2608695652173914e-06,
+       "loss": 1.1728,
+       "step": 15
+     },
+     {
+       "epoch": 0.02,
+       "learning_rate": 4.347826086956522e-06,
+       "loss": 1.085,
+       "step": 20
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 5.4347826086956525e-06,
+       "loss": 1.0442,
+       "step": 25
+     },
+     {
+       "epoch": 0.03,
+       "learning_rate": 6.521739130434783e-06,
+       "loss": 0.9988,
+       "step": 30
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 7.608695652173914e-06,
+       "loss": 0.9694,
+       "step": 35
+     },
+     {
+       "epoch": 0.04,
+       "learning_rate": 8.695652173913044e-06,
+       "loss": 0.9524,
+       "step": 40
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 9.782608695652175e-06,
+       "loss": 0.9502,
+       "step": 45
+     },
+     {
+       "epoch": 0.05,
+       "learning_rate": 1.0869565217391305e-05,
+       "loss": 0.9313,
+       "step": 50
+     },
+     {
+       "epoch": 0.06,
+       "learning_rate": 1.1956521739130435e-05,
+       "loss": 0.9182,
+       "step": 55
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 1.3043478260869566e-05,
+       "loss": 0.9174,
+       "step": 60
+     },
+     {
+       "epoch": 0.07,
+       "learning_rate": 1.4130434782608698e-05,
+       "loss": 0.9027,
+       "step": 65
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 1.5217391304347828e-05,
+       "loss": 0.9057,
+       "step": 70
+     },
+     {
+       "epoch": 0.08,
+       "learning_rate": 1.630434782608696e-05,
+       "loss": 0.9031,
+       "step": 75
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 1.739130434782609e-05,
+       "loss": 0.9037,
+       "step": 80
+     },
+     {
+       "epoch": 0.09,
+       "learning_rate": 1.847826086956522e-05,
+       "loss": 0.8971,
+       "step": 85
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.956521739130435e-05,
+       "loss": 0.8918,
+       "step": 90
+     },
+     {
+       "epoch": 0.1,
+       "learning_rate": 1.999934429598561e-05,
+       "loss": 0.9005,
+       "step": 95
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 1.9995337527295925e-05,
+       "loss": 0.882,
+       "step": 100
+     },
+     {
+       "epoch": 0.11,
+       "learning_rate": 1.9987689727712563e-05,
+       "loss": 0.8806,
+       "step": 105
+     },
+     {
+       "epoch": 0.12,
+       "learning_rate": 1.997640368312189e-05,
+       "loss": 0.8911,
+       "step": 110
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.9961483504724445e-05,
+       "loss": 0.8899,
+       "step": 115
+     },
+     {
+       "epoch": 0.13,
+       "learning_rate": 1.9942934627537337e-05,
+       "loss": 0.8818,
+       "step": 120
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 1.992076380841442e-05,
+       "loss": 1.566,
+       "step": 125
+     },
+     {
+       "epoch": 0.14,
+       "learning_rate": 1.989497912358495e-05,
+       "loss": 1.7017,
+       "step": 130
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.9865589965711636e-05,
+       "loss": 1.2844,
+       "step": 135
+     },
+     {
+       "epoch": 0.15,
+       "learning_rate": 1.9832607040469147e-05,
+       "loss": 1.1296,
+       "step": 140
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 1.9796042362644315e-05,
+       "loss": 1.0674,
+       "step": 145
+     },
+     {
+       "epoch": 0.16,
+       "learning_rate": 1.9755909251759493e-05,
+       "loss": 1.0187,
+       "step": 150
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 1.97122223272206e-05,
+       "loss": 0.9856,
+       "step": 155
+     },
+     {
+       "epoch": 0.17,
+       "learning_rate": 1.9664997502991665e-05,
+       "loss": 0.9741,
+       "step": 160
+     },
+     {
+       "epoch": 0.18,
+       "learning_rate": 1.961425198179781e-05,
+       "loss": 0.9602,
+       "step": 165
+     },
+     {
+       "epoch": 0.19,
+       "learning_rate": 1.9560004248858754e-05,
+       "loss": 0.9388,
+       "step": 170
+     },
+     {
+       "epoch": 0.19,
+       "learning_rate": 1.950227406515516e-05,
+       "loss": 0.9239,
+       "step": 175
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 1.9441082460230226e-05,
+       "loss": 0.9376,
+       "step": 180
+     },
+     {
+       "epoch": 0.2,
+       "learning_rate": 1.9376451724529207e-05,
+       "loss": 0.9257,
+       "step": 185
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 1.930840540127961e-05,
+       "loss": 0.927,
+       "step": 190
+     },
+     {
+       "epoch": 0.21,
+       "learning_rate": 1.923696827791502e-05,
+       "loss": 0.9064,
+       "step": 195
+     },
+     {
+       "epoch": 0.22,
+       "learning_rate": 1.9162166377045723e-05,
+       "loss": 0.9088,
+       "step": 200
+     },
+     {
+       "epoch": 0.22,
+       "learning_rate": 1.9084026946979366e-05,
+       "loss": 0.9014,
+       "step": 205
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 1.9002578451795133e-05,
+       "loss": 0.9092,
+       "step": 210
+     },
+     {
+       "epoch": 0.23,
+       "learning_rate": 1.8917850560975064e-05,
+       "loss": 0.9081,
+       "step": 215
+     },
+     {
+       "epoch": 0.24,
+       "learning_rate": 1.882987413859625e-05,
+       "loss": 0.9027,
+       "step": 220
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 1.8738681232087897e-05,
+       "loss": 0.8962,
+       "step": 225
+     },
+     {
+       "epoch": 0.25,
+       "learning_rate": 1.8644305060557317e-05,
+       "loss": 0.8823,
+       "step": 230
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 1.8546780002689088e-05,
+       "loss": 0.892,
+       "step": 235
+     },
+     {
+       "epoch": 0.26,
+       "learning_rate": 1.8446141584221854e-05,
+       "loss": 0.8887,
+       "step": 240
+     },
+     {
+       "epoch": 0.27,
+       "learning_rate": 1.834242646500724e-05,
+       "loss": 0.8843,
+       "step": 245
+     },
+     {
+       "epoch": 0.27,
+       "learning_rate": 1.8235672425655678e-05,
+       "loss": 0.9014,
+       "step": 250
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 1.8125918353773934e-05,
+       "loss": 0.8911,
+       "step": 255
+     },
+     {
+       "epoch": 0.28,
+       "learning_rate": 1.8013204229799422e-05,
+       "loss": 0.881,
+       "step": 260
+     },
+     {
+       "epoch": 0.29,
+       "learning_rate": 1.7897571112436404e-05,
+       "loss": 0.8879,
+       "step": 265
+     },
+     {
+       "epoch": 0.3,
+       "learning_rate": 1.777906112369942e-05,
+       "loss": 0.8922,
+       "step": 270
+     },
+     {
+       "epoch": 0.3,
+       "learning_rate": 1.7657717433569384e-05,
+       "loss": 0.8747,
+       "step": 275
+     },
+     {
+       "epoch": 0.31,
+       "learning_rate": 1.7533584244267897e-05,
+       "loss": 0.8715,
+       "step": 280
+     },
+     {
+       "epoch": 0.31,
+       "learning_rate": 1.7406706774155625e-05,
+       "loss": 0.8812,
+       "step": 285
+     },
+     {
+       "epoch": 0.32,
+       "learning_rate": 1.7277131241260438e-05,
+       "loss": 0.8669,
+       "step": 290
+     },
+     {
+       "epoch": 0.32,
+       "learning_rate": 1.7144904846441434e-05,
+       "loss": 0.864,
+       "step": 295
+     },
+     {
+       "epoch": 0.33,
+       "learning_rate": 1.7010075756194962e-05,
+       "loss": 0.8683,
+       "step": 300
+     },
+     {
+       "epoch": 0.33,
+       "learning_rate": 1.6872693085108864e-05,
+       "loss": 0.867,
+       "step": 305
+     },
+     {
+       "epoch": 0.34,
+       "learning_rate": 1.673280687797135e-05,
+       "loss": 0.8724,
+       "step": 310
+     },
+     {
+       "epoch": 0.34,
+       "learning_rate": 1.659046809154105e-05,
+       "loss": 0.8802,
+       "step": 315
+     },
+     {
+       "epoch": 0.35,
+       "learning_rate": 1.6445728575984838e-05,
+       "loss": 0.8657,
+       "step": 320
+     },
+     {
+       "epoch": 0.36,
+       "learning_rate": 1.6298641055990222e-05,
+       "loss": 0.8602,
+       "step": 325
+     },
+     {
+       "epoch": 0.36,
+       "learning_rate": 1.614925911155917e-05,
+       "loss": 0.8641,
+       "step": 330
+     },
+     {
+       "epoch": 0.37,
+       "learning_rate": 1.5997637158490366e-05,
+       "loss": 0.8606,
+       "step": 335
+     },
+     {
+       "epoch": 0.37,
+       "learning_rate": 1.5843830428557e-05,
+       "loss": 0.8586,
+       "step": 340
+     },
+     {
+       "epoch": 0.38,
+       "eval_loss": 0.9133486151695251,
+       "eval_runtime": 211.9502,
+       "eval_samples_per_second": 173.763,
+       "eval_steps_per_second": 0.679,
+       "step": 344
+     },
+     {
+       "epoch": 0.38,
+       "step": 344,
+       "total_flos": 576369847173120.0,
+       "train_loss": 0.9639226873946745,
+       "train_runtime": 7750.2268,
+       "train_samples_per_second": 60.453,
+       "train_steps_per_second": 0.118
+     }
+   ],
+   "logging_steps": 5,
+   "max_steps": 915,
+   "num_train_epochs": 1,
+   "save_steps": 500,
+   "total_flos": 576369847173120.0,
+   "trial_name": null,
+   "trial_params": null
+ }
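
The learning-rate trace in log_history matches linear warmup over ceil(0.1 × 915) = 92 steps followed by cosine decay, and the sketch below reproduces the logged values exactly (for example the step-1 and step-100 entries):

```python
import math

MAX_LR, TOTAL = 2e-05, 915       # learning_rate and max_steps above
WARMUP = math.ceil(0.1 * TOTAL)  # lr_scheduler_warmup_ratio 0.1 -> 92 steps

def lr_at(step):
    # Linear warmup, then cosine decay (lr_scheduler_type: cosine).
    if step < WARMUP:
        return MAX_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    return MAX_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(1))    # 2.173913043478261e-07, the step-1 entry in log_history
print(lr_at(100))  # 1.9995337527295925e-05, the step-100 entry
```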
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3e8b7815ffad6613670bdf92182f49d712b20757f78628c5361f042f2c1cc561
+ size 5435