Jerry999 committed on
Commit a4e7793 · verified · 1 parent: 852afb7

Clear v1 outputs (60 files) before uploading v2
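A cleanup commit like this can be made as a single atomic operation with `huggingface_hub` delete operations rather than file-by-file removals. The sketch below only builds the deletion list; the repo id and the hub calls shown in comments are assumptions, since the actual script behind this commit is not shown:

```python
# Paths removed by this commit: four v1 checkpoint folders plus the
# top-level README and chat template. A trailing slash marks a folder
# so the whole directory is deleted in one operation.
V1_CHECKPOINTS = [f"checkpoint-{step}/" for step in (1000, 2000, 3000, 3948)]
V1_TOP_LEVEL = ["README.md", "chat_template.jinja"]
paths_to_delete = V1_CHECKPOINTS + V1_TOP_LEVEL

# Hypothetical hub usage (requires write access; repo id is assumed):
# from huggingface_hub import HfApi, CommitOperationDelete
# HfApi().create_commit(
#     repo_id="Jerry999/Qwen3-8B_n3000_math",
#     operations=[CommitOperationDelete(path_in_repo=p) for p in paths_to_delete],
#     commit_message="Clear v1 outputs (60 files) before uploading v2",
# )

print(paths_to_delete)
```

Batching the deletes into one `create_commit` keeps the repo history to a single commit, which matches the single commit `a4e7793` shown here.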

This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full list.
Files changed (50)
  1. README.md +0 -58
  2. chat_template.jinja +0 -61
  3. checkpoint-1000/chat_template.jinja +0 -61
  4. checkpoint-1000/config.json +0 -71
  5. checkpoint-1000/generation_config.json +0 -12
  6. checkpoint-1000/model.safetensors +0 -3
  7. checkpoint-1000/optimizer.bin +0 -3
  8. checkpoint-1000/pytorch_model_fsdp.bin +0 -3
  9. checkpoint-1000/rng_state_0.pth +0 -3
  10. checkpoint-1000/rng_state_1.pth +0 -3
  11. checkpoint-1000/scheduler.pt +0 -3
  12. checkpoint-1000/tokenizer.json +0 -3
  13. checkpoint-1000/tokenizer_config.json +0 -29
  14. checkpoint-1000/trainer_state.json +0 -0
  15. checkpoint-1000/training_args.bin +0 -3
  16. checkpoint-2000/chat_template.jinja +0 -61
  17. checkpoint-2000/config.json +0 -71
  18. checkpoint-2000/generation_config.json +0 -12
  19. checkpoint-2000/model.safetensors +0 -3
  20. checkpoint-2000/optimizer.bin +0 -3
  21. checkpoint-2000/pytorch_model_fsdp.bin +0 -3
  22. checkpoint-2000/rng_state_0.pth +0 -3
  23. checkpoint-2000/rng_state_1.pth +0 -3
  24. checkpoint-2000/scheduler.pt +0 -3
  25. checkpoint-2000/tokenizer.json +0 -3
  26. checkpoint-2000/tokenizer_config.json +0 -29
  27. checkpoint-2000/trainer_state.json +0 -0
  28. checkpoint-2000/training_args.bin +0 -3
  29. checkpoint-3000/chat_template.jinja +0 -61
  30. checkpoint-3000/config.json +0 -71
  31. checkpoint-3000/generation_config.json +0 -12
  32. checkpoint-3000/model.safetensors +0 -3
  33. checkpoint-3000/optimizer.bin +0 -3
  34. checkpoint-3000/pytorch_model_fsdp.bin +0 -3
  35. checkpoint-3000/rng_state_0.pth +0 -3
  36. checkpoint-3000/rng_state_1.pth +0 -3
  37. checkpoint-3000/scheduler.pt +0 -3
  38. checkpoint-3000/tokenizer.json +0 -3
  39. checkpoint-3000/tokenizer_config.json +0 -29
  40. checkpoint-3000/trainer_state.json +0 -0
  41. checkpoint-3000/training_args.bin +0 -3
  42. checkpoint-3948/chat_template.jinja +0 -61
  43. checkpoint-3948/config.json +0 -71
  44. checkpoint-3948/generation_config.json +0 -12
  45. checkpoint-3948/model.safetensors +0 -3
  46. checkpoint-3948/optimizer.bin +0 -3
  47. checkpoint-3948/pytorch_model_fsdp.bin +0 -3
  48. checkpoint-3948/rng_state_0.pth +0 -3
  49. checkpoint-3948/rng_state_1.pth +0 -3
  50. checkpoint-3948/scheduler.pt +0 -3
README.md DELETED
@@ -1,58 +0,0 @@
- ---
- base_model: Qwen/Qwen3-4B-Instruct-2507
- library_name: transformers
- model_name: Qwen3-8B_n3000_math
- tags:
- - generated_from_trainer
- - sft
- - trl
- licence: license
- ---
-
- # Model Card for Qwen3-8B_n3000_math
-
- This model is a fine-tuned version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507).
- It has been trained using [TRL](https://github.com/huggingface/trl).
-
- ## Quick start
-
- ```python
- from transformers import pipeline
-
- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
- ```
-
- ## Training procedure
-
-
-
-
-
- This model was trained with SFT.
-
- ### Framework versions
-
- - TRL: 0.29.0
- - Transformers: 5.5.3
- - Pytorch: 2.8.0
- - Datasets: 4.5.0
- - Tokenizers: 0.22.2
-
- ## Citations
-
-
-
- Cite TRL as:
-
- ```bibtex
- @software{vonwerra2020trl,
- title = {{TRL: Transformers Reinforcement Learning}},
- author = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
- license = {Apache-2.0},
- url = {https://github.com/huggingface/trl},
- year = {2020}
- }
- ```

chat_template.jinja DELETED
@@ -1,61 +0,0 @@
- {%- if tools %}
- {{- '<|im_start|>system\n' }}
- {%- if messages[0].role == 'system' %}
- {{- messages[0].content + '\n\n' }}
- {%- endif %}
- {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
- {%- for tool in tools %}
- {{- "\n" }}
- {{- tool | tojson }}
- {%- endfor %}
- {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
- {%- else %}
- {%- if messages[0].role == 'system' %}
- {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- for message in messages %}
- {%- if message.content is string %}
- {%- set content = message.content %}
- {%- else %}
- {%- set content = '' %}
- {%- endif %}
- {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
- {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
- {%- elif message.role == "assistant" %}
- {{- '<|im_start|>' + message.role + '\n' + content }}
- {%- if message.tool_calls %}
- {%- for tool_call in message.tool_calls %}
- {%- if (loop.first and content) or (not loop.first) %}
- {{- '\n' }}
- {%- endif %}
- {%- if tool_call.function %}
- {%- set tool_call = tool_call.function %}
- {%- endif %}
- {{- '<tool_call>\n{"name": "' }}
- {{- tool_call.name }}
- {{- '", "arguments": ' }}
- {%- if tool_call.arguments is string %}
- {{- tool_call.arguments }}
- {%- else %}
- {{- tool_call.arguments | tojson }}
- {%- endif %}
- {{- '}\n</tool_call>' }}
- {%- endfor %}
- {%- endif %}
- {{- '<|im_end|>\n' }}
- {%- elif message.role == "tool" %}
- {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
- {{- '<|im_start|>user' }}
- {%- endif %}
- {{- '\n<tool_response>\n' }}
- {{- content }}
- {{- '\n</tool_response>' }}
- {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
- {{- '<|im_end|>\n' }}
- {%- endif %}
- {%- endif %}
- {%- endfor %}
- {%- if add_generation_prompt %}
- {{- '<|im_start|>assistant\n' }}
- {%- endif %}

checkpoint-1000/chat_template.jinja DELETED
(61 lines deleted; content identical to the top-level chat_template.jinja diff above)

checkpoint-1000/config.json DELETED
@@ -1,71 +0,0 @@
- {
- "architectures": [
- "Qwen3ForCausalLM"
- ],
- "attention_bias": false,
- "attention_dropout": 0.0,
- "bos_token_id": null,
- "dtype": "float32",
- "eos_token_id": 151645,
- "head_dim": 128,
- "hidden_act": "silu",
- "hidden_size": 2560,
- "initializer_range": 0.02,
- "intermediate_size": 9728,
- "layer_types": [
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention",
- "full_attention"
- ],
- "max_position_embeddings": 262144,
- "max_window_layers": 36,
- "model_type": "qwen3",
- "num_attention_heads": 32,
- "num_hidden_layers": 36,
- "num_key_value_heads": 8,
- "pad_token_id": 151662,
- "rms_norm_eps": 1e-06,
- "rope_parameters": {
- "rope_theta": 5000000,
- "rope_type": "default"
- },
- "sliding_window": null,
- "tie_word_embeddings": true,
- "transformers_version": "5.5.3",
- "use_cache": false,
- "use_sliding_window": false,
- "vocab_size": 151936
- }

checkpoint-1000/generation_config.json DELETED
@@ -1,12 +0,0 @@
- {
- "do_sample": true,
- "eos_token_id": [
- 151645,
- 151643
- ],
- "pad_token_id": 151662,
- "temperature": 0.7,
- "top_k": 20,
- "top_p": 0.8,
- "transformers_version": "5.5.3"
- }

checkpoint-1000/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:16cf530a69292d5ebcdc898ff6e27f40e9fa97d07ec9a6fff92606a1cbec50f4
- size 17645743048

checkpoint-1000/optimizer.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:ad09a9b1f9d56fb5e24fccb31bc61995bcb8aa26d3d4e5771bcd332a90d2d66e
- size 32180124005

checkpoint-1000/pytorch_model_fsdp.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:cde7e1f8a53dcc9407e8636dd3c4261b755f26602abf7c70e6eb4291c93496bd
- size 17645897996

checkpoint-1000/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:4dd7671ce88d469c49c0530724ac76b2306574002d1ecd1ca9294e41621fd96a
- size 14917

checkpoint-1000/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3246ef1170ccca541a03b89ad6f20e01c51eb6834a2c2211c78c71c70f896879
- size 14917

checkpoint-1000/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3e3184dc815b4354af3c63c9b5b618608d5206305b4414657ef8e0195f7ad089
- size 1465

checkpoint-1000/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
- size 11422650

checkpoint-1000/tokenizer_config.json DELETED
@@ -1,29 +0,0 @@
- {
- "add_prefix_space": false,
- "backend": "tokenizers",
- "bos_token": null,
- "clean_up_tokenization_spaces": false,
- "eos_token": "<|im_end|>",
- "errors": "replace",
- "extra_special_tokens": [
- "<|im_start|>",
- "<|im_end|>",
- "<|object_ref_start|>",
- "<|object_ref_end|>",
- "<|box_start|>",
- "<|box_end|>",
- "<|quad_start|>",
- "<|quad_end|>",
- "<|vision_start|>",
- "<|vision_end|>",
- "<|vision_pad|>",
- "<|image_pad|>",
- "<|video_pad|>"
- ],
- "is_local": false,
- "model_max_length": 1010000,
- "pad_token": "<|fim_pad|>",
- "split_special_tokens": false,
- "tokenizer_class": "Qwen2Tokenizer",
- "unk_token": null
- }

checkpoint-1000/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
checkpoint-1000/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:bb9e429a6dba8782c1beb1411b31fa91f0c01ec6e0b1441e21d679f8a8b2c021
- size 6225

checkpoint-2000/chat_template.jinja DELETED
(61 lines deleted; content identical to the top-level chat_template.jinja diff above)

checkpoint-2000/config.json DELETED
(71 lines deleted; content identical to the checkpoint-1000/config.json diff above)

checkpoint-2000/generation_config.json DELETED
(12 lines deleted; content identical to the checkpoint-1000/generation_config.json diff above)

checkpoint-2000/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1b1ce241be74f81ade1793d7d1184e1cf7ce2e9afe46f5dd9418012bd1861b43
- size 17645743048

checkpoint-2000/optimizer.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:07e07657f743306d7736d8218c799dfc731283d7dedfca7eb48d4dcc64c64623
- size 32180124005

checkpoint-2000/pytorch_model_fsdp.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:27df8f98b77baf9afbd9bdac0a9ff6cc9e53f4d44310a5d8c665d45656911b2e
- size 17645897996

checkpoint-2000/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:95e5fc2074c0df31522a514f862c86cb00d71c946a7f15cc9ec0e53a69fb28a7
- size 14917

checkpoint-2000/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0e7153eae67b6c9232a41bc996a2bf5b83229b8c7230d61911ac0fd40e64154e
- size 14917

checkpoint-2000/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:7c70c34042f727a1ef06eb662d77f90fe87f01cf21415dce97c8cb4c779b5625
- size 1465

checkpoint-2000/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
- size 11422650

checkpoint-2000/tokenizer_config.json DELETED
(29 lines deleted; content identical to the checkpoint-1000/tokenizer_config.json diff above)

checkpoint-2000/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
checkpoint-2000/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:bb9e429a6dba8782c1beb1411b31fa91f0c01ec6e0b1441e21d679f8a8b2c021
- size 6225

checkpoint-3000/chat_template.jinja DELETED
(61 lines deleted; content identical to the top-level chat_template.jinja diff above)

checkpoint-3000/config.json DELETED
(71 lines deleted; content identical to the checkpoint-1000/config.json diff above)

checkpoint-3000/generation_config.json DELETED
(12 lines deleted; content identical to the checkpoint-1000/generation_config.json diff above)

checkpoint-3000/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0a87a133eb5ec5af0878395bc45e179834b11224819f981211f70acdd015060b
- size 17645743048

checkpoint-3000/optimizer.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:0ff8e5977667fc938b297528391c931889487050b2acf34a78a42a820912cd38
- size 32180124005

checkpoint-3000/pytorch_model_fsdp.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:3023a52ce183c0d2cddf839ebf937f5047e153db9c651eb9f295b9a386e6b589
- size 17645897996

checkpoint-3000/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:61e957b4cd785256be4cb26eb03060ef689e1d58f1766d7f26ca36a62bec4994
- size 14917

checkpoint-3000/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:550c54d430b44b77b0abe44c6e3ceba90a155305315c081b7616b35e2c18d1ce
- size 14917

checkpoint-3000/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:b07c9eca675fb8c47d0c01728c4ef879c66a752ffdace85e7e9feac32b48ac4b
- size 1465

checkpoint-3000/tokenizer.json DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:be75606093db2094d7cd20f3c2f385c212750648bd6ea4fb2bf507a6a4c55506
- size 11422650

checkpoint-3000/tokenizer_config.json DELETED
(29 lines deleted; content identical to the checkpoint-1000/tokenizer_config.json diff above)

checkpoint-3000/trainer_state.json DELETED
The diff for this file is too large to render. See raw diff
 
checkpoint-3000/training_args.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:bb9e429a6dba8782c1beb1411b31fa91f0c01ec6e0b1441e21d679f8a8b2c021
- size 6225

checkpoint-3948/chat_template.jinja DELETED
(61 lines deleted; content identical to the top-level chat_template.jinja diff above)

checkpoint-3948/config.json DELETED
(71 lines deleted; content identical to the checkpoint-1000/config.json diff above)

checkpoint-3948/generation_config.json DELETED
(12 lines deleted; content identical to the checkpoint-1000/generation_config.json diff above)

checkpoint-3948/model.safetensors DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e7db19800bbcf792dcb25dea9b5ae39f4e934a0d56f64ed6f74d7d89e87ae928
- size 17645743048

checkpoint-3948/optimizer.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:656d334c407ae1443fcaeda271d597e51249875fdde8e1a12a024812f6de73ab
- size 32180124005

checkpoint-3948/pytorch_model_fsdp.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:51d19fbc90bb938bf3c747a8b9c2b23f00398029d4ab146ca0ca0a0ea7d8885c
- size 17645897996

checkpoint-3948/rng_state_0.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:61e957b4cd785256be4cb26eb03060ef689e1d58f1766d7f26ca36a62bec4994
- size 14917

checkpoint-3948/rng_state_1.pth DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:550c54d430b44b77b0abe44c6e3ceba90a155305315c081b7616b35e2c18d1ce
- size 14917

checkpoint-3948/scheduler.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:deaab1725fa5d6abb332a09b31b7c4d93808c0289cb39a32cd5102547b98e285
- size 1465