Commit 9537a09 by geonmin-kim
1 Parent(s): dd42d6e

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.
added_tokens.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "\t\t": 50294,
+   "\t\t\t": 50293,
+   "\t\t\t\t": 50292,
+   "\t\t\t\t\t": 50291,
+   "\t\t\t\t\t\t": 50290,
+   "\t\t\t\t\t\t\t": 50289,
+   "\t\t\t\t\t\t\t\t": 50288,
+   "\t\t\t\t\t\t\t\t\t": 50287,
+   " ": 50286,
+   " ": 50285,
+   " ": 50284,
+   " ": 50283,
+   " ": 50282,
+   " ": 50281,
+   " ": 50280,
+   " ": 50279,
+   " ": 50278,
+   " ": 50277,
+   " ": 50276,
+   " ": 50275,
+   " ": 50274,
+   " ": 50273,
+   " ": 50272,
+   " ": 50271,
+   " ": 50270,
+   " ": 50269,
+   " ": 50268,
+   " ": 50267,
+   " ": 50266,
+   " ": 50265,
+   " ": 50264,
+   " ": 50263,
+   " ": 50262,
+   " ": 50261,
+   " ": 50260,
+   " ": 50259,
+   " ": 50258,
+   " ": 50257
+ }
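The ids in added_tokens.json run from 50294 down to 50257, directly above the id 50256 that mlc-chat-config.json in this commit lists as its stop token. A minimal Python sketch of a sanity check one might run on such a mapping follows: the ids should form one contiguous block and stay below the `vocab_size` of 51200 declared in mlc-chat-config.json. The helper name `check_added_ids` is hypothetical and is not part of huggingface_hub or MLC LLM.

```python
# Values taken from this commit's diffs: added_tokens.json assigns ids
# 50257..50294, and mlc-chat-config.json sets vocab_size to 51200.
VOCAB_SIZE = 51200
FIRST_ADDED_ID = 50257
LAST_ADDED_ID = 50294


def check_added_ids(ids, vocab_size):
    """Return True if ids form one contiguous block that fits below vocab_size.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(ids)
    if not ordered:
        return True  # nothing to check
    contiguous = ordered == list(range(ordered[0], ordered[-1] + 1))
    return contiguous and ordered[-1] < vocab_size


# Ids listed in the same descending order as the file.
added_ids = list(range(LAST_ADDED_ID, FIRST_ADDED_ID - 1, -1))
print(len(added_ids), check_added_ids(added_ids, VOCAB_SIZE))
```

A gap in the ids or an id at or beyond `vocab_size` would make the check return `False`.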
merges.txt ADDED
The diff for this file is too large to render.
 
mlc-chat-config.json ADDED
@@ -0,0 +1,77 @@
+ {
+   "version": "0.1.0",
+   "model_type": "phi",
+   "quantization": "q0f16",
+   "model_config": {
+     "vocab_size": 51200,
+     "hidden_size": 2560,
+     "intermediate_size": 10240,
+     "num_hidden_layers": 20,
+     "num_attention_heads": 32,
+     "layer_norm_eps": 1e-05,
+     "position_embedding_base": 10000.0,
+     "partial_rotary_factor": 0.4,
+     "num_key_value_heads": 32,
+     "context_window_size": 2048,
+     "prefill_chunk_size": 2048,
+     "head_dim": 80,
+     "tensor_parallel_shards": 1,
+     "max_batch_size": 128
+   },
+   "vocab_size": 51200,
+   "context_window_size": 2048,
+   "sliding_window_size": -1,
+   "prefill_chunk_size": 2048,
+   "attention_sink_size": -1,
+   "tensor_parallel_shards": 1,
+   "pipeline_parallel_stages": 1,
+   "temperature": 1.0,
+   "presence_penalty": 0.0,
+   "frequency_penalty": 0.0,
+   "repetition_penalty": 1.0,
+   "top_p": 1.0,
+   "tokenizer_files": [
+     "tokenizer.json",
+     "merges.txt",
+     "added_tokens.json",
+     "tokenizer_config.json"
+   ],
+   "tokenizer_info": {
+     "token_postproc_method": "byte_level",
+     "prepend_space_in_encode": false,
+     "strip_space_in_decode": false
+   },
+   "conv_template": {
+     "name": "phi-2",
+     "system_template": "{system_message}",
+     "system_message": "",
+     "system_prefix_token_ids": null,
+     "add_role_after_system_message": true,
+     "roles": {
+       "user": "Instruct",
+       "assistant": "Output"
+     },
+     "role_templates": {
+       "user": "{user_message}",
+       "assistant": "{assistant_message}",
+       "tool": "{tool_message}"
+     },
+     "messages": [],
+     "seps": [
+       "\n"
+     ],
+     "role_content_sep": ": ",
+     "role_empty_sep": ":",
+     "stop_str": [
+       "<|endoftext|>"
+     ],
+     "stop_token_ids": [
+       50256
+     ],
+     "function_string": "",
+     "use_function_calling": false
+   },
+   "pad_token_id": 0,
+   "bos_token_id": 50256,
+   "eos_token_id": 50256
+ }
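The `conv_template` fields in mlc-chat-config.json describe how chat turns are flattened into a single prompt string. The Python sketch below is one plausible reading of those fields, not MLC LLM's actual implementation: `render_prompt` is a hypothetical helper, and the joining rules (role name, then `role_content_sep`, then the message, then a separator from `seps`, with `role_empty_sep` for the turn left open for the model) are assumptions.

```python
# Field values copied from the conv_template in mlc-chat-config.json.
CONV_TEMPLATE = {
    "system_template": "{system_message}",
    "system_message": "",
    "roles": {"user": "Instruct", "assistant": "Output"},
    "role_templates": {
        "user": "{user_message}",
        "assistant": "{assistant_message}",
    },
    "seps": ["\n"],
    "role_content_sep": ": ",
    "role_empty_sep": ":",
}


def render_prompt(template, messages):
    """Render (role, text-or-None) pairs into one prompt string.

    Hypothetical helper: the joining rules are assumed, not taken from MLC LLM.
    """
    seps = template["seps"]
    parts = []
    system = template["system_template"].format(
        system_message=template["system_message"]
    )
    if system:
        parts.append(system + seps[0])
    for i, (role, text) in enumerate(messages):
        name = template["roles"][role]
        if text is None:
            # Open turn: the model is expected to continue after the role name.
            parts.append(name + template["role_empty_sep"])
        else:
            body = template["role_templates"][role].format(
                **{f"{role}_message": text}
            )
            parts.append(
                name + template["role_content_sep"] + body + seps[i % len(seps)]
            )
    return "".join(parts)


prompt = render_prompt(
    CONV_TEMPLATE, [("user", "What is 2 + 2?"), ("assistant", None)]
)
print(repr(prompt))
```

Under these assumptions a single user turn renders as `Instruct: <message>`, a newline, then `Output:`, and generation would stop at the `<|endoftext|>` string or token id 50256 listed in the config.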
ndarray-cache.json ADDED
@@ -0,0 +1,2717 @@
+ {
+   "metadata": {
+     "ParamSize": 205,
+     "ParamBytes": 3671255040.0,
+     "BitsPerParam": 16.0
+   },
+   "records": [
+     {
+       "dataPath": "params_shard_0.bin",
+       "format": "raw-shard",
+       "nbytes": 262144000,
+       "records": [
+         {
+           "name": "lm_head.linear.weight",
+           "shape": [
+             51200,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 262144000,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "f94243661c5a87b38f3bb41b70d6ebf5"
+     },
+     {
+       "dataPath": "params_shard_1.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.14.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "b13cdf199fa41a6015318c1e9563b5d3"
+     },
+     {
+       "dataPath": "params_shard_2.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.14.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "75014792943f75c804cb504bf349ad15"
+     },
+     {
+       "dataPath": "params_shard_3.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.14.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "d66bbc9ff5c2c4456695dca08d90d680"
+     },
+     {
+       "dataPath": "params_shard_4.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.15.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "3ab6bceeac17f3a1ebe76be2153ccc33"
+     },
+     {
+       "dataPath": "params_shard_5.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.15.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "cfc960c8f3e40f3f8984b619b578c8f4"
+     },
+     {
+       "dataPath": "params_shard_6.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.15.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "a1f896b5733dda3e41688a8aa5d48f18"
+     },
+     {
+       "dataPath": "params_shard_7.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.16.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "86a3e23a912734667d74787760b4cf14"
+     },
+     {
+       "dataPath": "params_shard_8.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.16.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "913b689959d1363a8fb2ca832deb7fb8"
+     },
+     {
+       "dataPath": "params_shard_9.bin",
+       "format": "raw-shard",
+       "nbytes": 26480640,
+       "records": [
+         {
+           "name": "lm_head.linear.bias",
+           "shape": [
+             51200
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 102400,
+           "byteOffset": 0
+         },
+         {
+           "name": "lm_head.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 102400
+         },
+         {
+           "name": "lm_head.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 107520
+         },
+         {
+           "name": "transformer.h.14.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 112640
+         },
+         {
+           "name": "transformer.h.14.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 117760
+         },
+         {
+           "name": "transformer.h.14.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 122880
+         },
+         {
+           "name": "transformer.h.14.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 143360
+         },
+         {
+           "name": "transformer.h.14.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 148480
+         },
+         {
+           "name": "transformer.h.14.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 153600
+         },
+         {
+           "name": "transformer.h.14.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13260800
+         },
+         {
+           "name": "transformer.h.15.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13276160
+         },
+         {
+           "name": "transformer.h.15.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13281280
+         },
+         {
+           "name": "transformer.h.15.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13286400
+         },
+         {
+           "name": "transformer.h.15.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13306880
+         },
+         {
+           "name": "transformer.h.15.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13312000
+         },
+         {
+           "name": "transformer.h.15.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13317120
+         },
+         {
+           "name": "transformer.h.15.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26424320
+         },
+         {
+           "name": "transformer.h.16.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26439680
+         },
+         {
+           "name": "transformer.h.16.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26444800
+         },
+         {
+           "name": "transformer.h.16.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26449920
+         },
+         {
+           "name": "transformer.h.16.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26470400
+         },
+         {
+           "name": "transformer.h.16.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26475520
+         }
+       ],
+       "md5sum": "f53b583f02b20cdec88c289ac1360c67"
+     },
+     {
+       "dataPath": "params_shard_10.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.16.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "1f2c12270558403bacbf455beac632f2"
+     },
+     {
+       "dataPath": "params_shard_11.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.17.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "4f031084eb0520ba0155a8266460af39"
+     },
+     {
+       "dataPath": "params_shard_12.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.17.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "1ec43d399b3b2c7a786a58842d7cc747"
+     },
+     {
+       "dataPath": "params_shard_13.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.17.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8d260f0b99188c3d223ab16d48d1c2bc"
+     },
+     {
+       "dataPath": "params_shard_14.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.18.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "e2c988cf74aeb2ade3a874cb7c29d95c"
+     },
+     {
+       "dataPath": "params_shard_15.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.18.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "5397e400be97efddd4b1e13600a0b70c"
+     },
+     {
+       "dataPath": "params_shard_16.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.16.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.16.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.17.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13122560
+         },
+         {
+           "name": "transformer.h.17.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13127680
+         },
+         {
+           "name": "transformer.h.17.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.17.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13153280
+         },
+         {
+           "name": "transformer.h.17.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13158400
+         },
+         {
+           "name": "transformer.h.17.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.17.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.18.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26286080
+         },
+         {
+           "name": "transformer.h.18.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26291200
+         },
+         {
+           "name": "transformer.h.18.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26296320
+         },
+         {
+           "name": "transformer.h.18.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26316800
+         },
+         {
+           "name": "transformer.h.18.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26321920
+         }
+       ],
+       "md5sum": "f8a2d03403eb9e9cf7b9aa78181106b7"
+     },
+     {
+       "dataPath": "params_shard_17.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.18.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8f835d1be1a32e017fa3d6140eb46b44"
+     },
+     {
+       "dataPath": "params_shard_18.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.19.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "eb1b239145a55d4c181f83ef0fed1959"
+     },
+     {
+       "dataPath": "params_shard_19.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.19.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "1a2ebc4b8383975be62cb644e326db04"
+     },
+     {
+       "dataPath": "params_shard_20.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.19.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "4dde6f985823ec281ab0ce5d2c2f4164"
+     },
+     {
+       "dataPath": "params_shard_21.bin",
+       "format": "raw-shard",
+       "nbytes": 262144000,
+       "records": [
+         {
+           "name": "transformer.embd.weight",
+           "shape": [
+             51200,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 262144000,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8e80be24cbe7c82aa854016c3950729e"
+     },
+     {
+       "dataPath": "params_shard_22.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.0.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "3cf77e681ad8fd29e0ce784823198e16"
+     },
+     {
+       "dataPath": "params_shard_23.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.0.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "e20e12c7a3dca65da2f968dc73087129"
+     },
+     {
+       "dataPath": "params_shard_24.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.18.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.18.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.19.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13122560
+         },
+         {
+           "name": "transformer.h.19.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13127680
+         },
+         {
+           "name": "transformer.h.19.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.19.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13153280
+         },
+         {
+           "name": "transformer.h.19.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13158400
+         },
+         {
+           "name": "transformer.h.19.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.19.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.0.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26286080
+         },
+         {
+           "name": "transformer.h.0.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26291200
+         },
+         {
+           "name": "transformer.h.0.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 26296320
+         },
+         {
+           "name": "transformer.h.0.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26316800
+         },
+         {
+           "name": "transformer.h.0.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26321920
+         }
+       ],
+       "md5sum": "9ac07740b35a8e13617dd0e709cb0d69"
+     },
+     {
+       "dataPath": "params_shard_25.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.0.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "686131149665cc509ab60acbda61725a"
+     },
+     {
+       "dataPath": "params_shard_26.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.1.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "7f924e11f5a49c3bdc4a18eaf7c49d5c"
+     },
+     {
+       "dataPath": "params_shard_27.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.1.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "25acdddd77a14c825123007c8a8c0f37"
+     },
+     {
+       "dataPath": "params_shard_28.bin",
+       "format": "raw-shard",
+       "nbytes": 39321600,
+       "records": [
+         {
+           "name": "transformer.h.1.mixer.Wqkv.weight",
+           "shape": [
+             7680,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 39321600,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "8686cab0ddf07e2d37ac73ffc4eaed45"
+     },
+     {
+       "dataPath": "params_shard_29.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.10.mlp.fc1.weight",
+           "shape": [
+             10240,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "364369ce587148d62299cfe8a72b8770"
+     },
+     {
+       "dataPath": "params_shard_30.bin",
+       "format": "raw-shard",
+       "nbytes": 52428800,
+       "records": [
+         {
+           "name": "transformer.h.10.mlp.fc2.weight",
+           "shape": [
+             2560,
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 52428800,
+           "byteOffset": 0
+         }
+       ],
+       "md5sum": "04cfa745317707b5a3054a4a2c7e5616"
+     },
+     {
+       "dataPath": "params_shard_31.bin",
+       "format": "raw-shard",
+       "nbytes": 26327040,
+       "records": [
+         {
+           "name": "transformer.h.0.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 0
+         },
+         {
+           "name": "transformer.h.0.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 13107200
+         },
+         {
+           "name": "transformer.h.1.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13122560
+         },
+         {
+           "name": "transformer.h.1.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13127680
+         },
+         {
+           "name": "transformer.h.1.mlp.fc1.bias",
+           "shape": [
+             10240
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 20480,
+           "byteOffset": 13132800
+         },
+         {
+           "name": "transformer.h.1.mlp.fc2.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13153280
+         },
+         {
+           "name": "transformer.h.1.mixer.out_proj.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 13158400
+         },
+         {
+           "name": "transformer.h.1.mixer.out_proj.weight",
+           "shape": [
+             2560,
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 13107200,
+           "byteOffset": 13163520
+         },
+         {
+           "name": "transformer.h.1.mixer.Wqkv.bias",
+           "shape": [
+             7680
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 15360,
+           "byteOffset": 26270720
+         },
+         {
+           "name": "transformer.h.10.ln.bias",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
+           "nbytes": 5120,
+           "byteOffset": 26286080
+         },
+         {
+           "name": "transformer.h.10.ln.weight",
+           "shape": [
+             2560
+           ],
+           "dtype": "float16",
+           "format": "f32-to-bf16",
1184
+ "nbytes": 5120,
1185
+ "byteOffset": 26291200
1186
+ },
1187
+ {
1188
+ "name": "transformer.h.10.mlp.fc1.bias",
1189
+ "shape": [
1190
+ 10240
1191
+ ],
1192
+ "dtype": "float16",
1193
+ "format": "f32-to-bf16",
1194
+ "nbytes": 20480,
1195
+ "byteOffset": 26296320
1196
+ },
1197
+ {
1198
+ "name": "transformer.h.10.mlp.fc2.bias",
1199
+ "shape": [
1200
+ 2560
1201
+ ],
1202
+ "dtype": "float16",
1203
+ "format": "f32-to-bf16",
1204
+ "nbytes": 5120,
1205
+ "byteOffset": 26316800
1206
+ },
1207
+ {
1208
+ "name": "transformer.h.10.mixer.out_proj.bias",
1209
+ "shape": [
1210
+ 2560
1211
+ ],
1212
+ "dtype": "float16",
1213
+ "format": "f32-to-bf16",
1214
+ "nbytes": 5120,
1215
+ "byteOffset": 26321920
1216
+ }
1217
+ ],
1218
+ "md5sum": "f9ecd7131d396059066f14b928914dcf"
1219
+ },
1220
+ {
1221
+ "dataPath": "params_shard_32.bin",
1222
+ "format": "raw-shard",
1223
+ "nbytes": 39321600,
1224
+ "records": [
1225
+ {
1226
+ "name": "transformer.h.10.mixer.Wqkv.weight",
1227
+ "shape": [
1228
+ 7680,
1229
+ 2560
1230
+ ],
1231
+ "dtype": "float16",
1232
+ "format": "f32-to-bf16",
1233
+ "nbytes": 39321600,
1234
+ "byteOffset": 0
1235
+ }
1236
+ ],
1237
+ "md5sum": "7d90087370a2350e7ab1dacc145fe966"
1238
+ },
1239
+ {
1240
+ "dataPath": "params_shard_33.bin",
1241
+ "format": "raw-shard",
1242
+ "nbytes": 52428800,
1243
+ "records": [
1244
+ {
1245
+ "name": "transformer.h.11.mlp.fc1.weight",
1246
+ "shape": [
1247
+ 10240,
1248
+ 2560
1249
+ ],
1250
+ "dtype": "float16",
1251
+ "format": "f32-to-bf16",
1252
+ "nbytes": 52428800,
1253
+ "byteOffset": 0
1254
+ }
1255
+ ],
1256
+ "md5sum": "bba2afa39105559b6abbd88707220870"
1257
+ },
1258
+ {
1259
+ "dataPath": "params_shard_34.bin",
1260
+ "format": "raw-shard",
1261
+ "nbytes": 52428800,
1262
+ "records": [
1263
+ {
1264
+ "name": "transformer.h.11.mlp.fc2.weight",
1265
+ "shape": [
1266
+ 2560,
1267
+ 10240
1268
+ ],
1269
+ "dtype": "float16",
1270
+ "format": "f32-to-bf16",
1271
+ "nbytes": 52428800,
1272
+ "byteOffset": 0
1273
+ }
1274
+ ],
1275
+ "md5sum": "13bca4c465f23d066f8166398373c9fe"
1276
+ },
1277
+ {
1278
+ "dataPath": "params_shard_35.bin",
1279
+ "format": "raw-shard",
1280
+ "nbytes": 39321600,
1281
+ "records": [
1282
+ {
1283
+ "name": "transformer.h.11.mixer.Wqkv.weight",
1284
+ "shape": [
1285
+ 7680,
1286
+ 2560
1287
+ ],
1288
+ "dtype": "float16",
1289
+ "format": "f32-to-bf16",
1290
+ "nbytes": 39321600,
1291
+ "byteOffset": 0
1292
+ }
1293
+ ],
1294
+ "md5sum": "0d47f892b7c1e0ee1a3573d62a4e3dce"
1295
+ },
1296
+ {
1297
+ "dataPath": "params_shard_36.bin",
1298
+ "format": "raw-shard",
1299
+ "nbytes": 52428800,
1300
+ "records": [
1301
+ {
1302
+ "name": "transformer.h.12.mlp.fc1.weight",
1303
+ "shape": [
1304
+ 10240,
1305
+ 2560
1306
+ ],
1307
+ "dtype": "float16",
1308
+ "format": "f32-to-bf16",
1309
+ "nbytes": 52428800,
1310
+ "byteOffset": 0
1311
+ }
1312
+ ],
1313
+ "md5sum": "d3e5a7817c7bf9c81ed43f7cd2e1a82d"
1314
+ },
1315
+ {
1316
+ "dataPath": "params_shard_37.bin",
1317
+ "format": "raw-shard",
1318
+ "nbytes": 52428800,
1319
+ "records": [
1320
+ {
1321
+ "name": "transformer.h.12.mlp.fc2.weight",
1322
+ "shape": [
1323
+ 2560,
1324
+ 10240
1325
+ ],
1326
+ "dtype": "float16",
1327
+ "format": "f32-to-bf16",
1328
+ "nbytes": 52428800,
1329
+ "byteOffset": 0
1330
+ }
1331
+ ],
1332
+ "md5sum": "b457f920d75e816966f9c05eefa2beb9"
1333
+ },
1334
+ {
1335
+ "dataPath": "params_shard_38.bin",
1336
+ "format": "raw-shard",
1337
+ "nbytes": 26327040,
1338
+ "records": [
1339
+ {
1340
+ "name": "transformer.h.10.mixer.out_proj.weight",
1341
+ "shape": [
1342
+ 2560,
1343
+ 2560
1344
+ ],
1345
+ "dtype": "float16",
1346
+ "format": "f32-to-bf16",
1347
+ "nbytes": 13107200,
1348
+ "byteOffset": 0
1349
+ },
1350
+ {
1351
+ "name": "transformer.h.10.mixer.Wqkv.bias",
1352
+ "shape": [
1353
+ 7680
1354
+ ],
1355
+ "dtype": "float16",
1356
+ "format": "f32-to-bf16",
1357
+ "nbytes": 15360,
1358
+ "byteOffset": 13107200
1359
+ },
1360
+ {
1361
+ "name": "transformer.h.11.ln.bias",
1362
+ "shape": [
1363
+ 2560
1364
+ ],
1365
+ "dtype": "float16",
1366
+ "format": "f32-to-bf16",
1367
+ "nbytes": 5120,
1368
+ "byteOffset": 13122560
1369
+ },
1370
+ {
1371
+ "name": "transformer.h.11.ln.weight",
1372
+ "shape": [
1373
+ 2560
1374
+ ],
1375
+ "dtype": "float16",
1376
+ "format": "f32-to-bf16",
1377
+ "nbytes": 5120,
1378
+ "byteOffset": 13127680
1379
+ },
1380
+ {
1381
+ "name": "transformer.h.11.mlp.fc1.bias",
1382
+ "shape": [
1383
+ 10240
1384
+ ],
1385
+ "dtype": "float16",
1386
+ "format": "f32-to-bf16",
1387
+ "nbytes": 20480,
1388
+ "byteOffset": 13132800
1389
+ },
1390
+ {
1391
+ "name": "transformer.h.11.mlp.fc2.bias",
1392
+ "shape": [
1393
+ 2560
1394
+ ],
1395
+ "dtype": "float16",
1396
+ "format": "f32-to-bf16",
1397
+ "nbytes": 5120,
1398
+ "byteOffset": 13153280
1399
+ },
1400
+ {
1401
+ "name": "transformer.h.11.mixer.out_proj.bias",
1402
+ "shape": [
1403
+ 2560
1404
+ ],
1405
+ "dtype": "float16",
1406
+ "format": "f32-to-bf16",
1407
+ "nbytes": 5120,
1408
+ "byteOffset": 13158400
1409
+ },
1410
+ {
1411
+ "name": "transformer.h.11.mixer.out_proj.weight",
1412
+ "shape": [
1413
+ 2560,
1414
+ 2560
1415
+ ],
1416
+ "dtype": "float16",
1417
+ "format": "f32-to-bf16",
1418
+ "nbytes": 13107200,
1419
+ "byteOffset": 13163520
1420
+ },
1421
+ {
1422
+ "name": "transformer.h.11.mixer.Wqkv.bias",
1423
+ "shape": [
1424
+ 7680
1425
+ ],
1426
+ "dtype": "float16",
1427
+ "format": "f32-to-bf16",
1428
+ "nbytes": 15360,
1429
+ "byteOffset": 26270720
1430
+ },
1431
+ {
1432
+ "name": "transformer.h.12.ln.bias",
1433
+ "shape": [
1434
+ 2560
1435
+ ],
1436
+ "dtype": "float16",
1437
+ "format": "f32-to-bf16",
1438
+ "nbytes": 5120,
1439
+ "byteOffset": 26286080
1440
+ },
1441
+ {
1442
+ "name": "transformer.h.12.ln.weight",
1443
+ "shape": [
1444
+ 2560
1445
+ ],
1446
+ "dtype": "float16",
1447
+ "format": "f32-to-bf16",
1448
+ "nbytes": 5120,
1449
+ "byteOffset": 26291200
1450
+ },
1451
+ {
1452
+ "name": "transformer.h.12.mlp.fc1.bias",
1453
+ "shape": [
1454
+ 10240
1455
+ ],
1456
+ "dtype": "float16",
1457
+ "format": "f32-to-bf16",
1458
+ "nbytes": 20480,
1459
+ "byteOffset": 26296320
1460
+ },
1461
+ {
1462
+ "name": "transformer.h.12.mlp.fc2.bias",
1463
+ "shape": [
1464
+ 2560
1465
+ ],
1466
+ "dtype": "float16",
1467
+ "format": "f32-to-bf16",
1468
+ "nbytes": 5120,
1469
+ "byteOffset": 26316800
1470
+ },
1471
+ {
1472
+ "name": "transformer.h.12.mixer.out_proj.bias",
1473
+ "shape": [
1474
+ 2560
1475
+ ],
1476
+ "dtype": "float16",
1477
+ "format": "f32-to-bf16",
1478
+ "nbytes": 5120,
1479
+ "byteOffset": 26321920
1480
+ }
1481
+ ],
1482
+ "md5sum": "f232fe0a5e8b69aaf51f286cf9fd1f43"
1483
+ },
1484
+ {
1485
+ "dataPath": "params_shard_39.bin",
1486
+ "format": "raw-shard",
1487
+ "nbytes": 39321600,
1488
+ "records": [
1489
+ {
1490
+ "name": "transformer.h.12.mixer.Wqkv.weight",
1491
+ "shape": [
1492
+ 7680,
1493
+ 2560
1494
+ ],
1495
+ "dtype": "float16",
1496
+ "format": "f32-to-bf16",
1497
+ "nbytes": 39321600,
1498
+ "byteOffset": 0
1499
+ }
1500
+ ],
1501
+ "md5sum": "5837435a6f54fce26919f36840d2564e"
1502
+ },
1503
+ {
1504
+ "dataPath": "params_shard_40.bin",
1505
+ "format": "raw-shard",
1506
+ "nbytes": 52428800,
1507
+ "records": [
1508
+ {
1509
+ "name": "transformer.h.13.mlp.fc1.weight",
1510
+ "shape": [
1511
+ 10240,
1512
+ 2560
1513
+ ],
1514
+ "dtype": "float16",
1515
+ "format": "f32-to-bf16",
1516
+ "nbytes": 52428800,
1517
+ "byteOffset": 0
1518
+ }
1519
+ ],
1520
+ "md5sum": "6f5c0f70857a2dc5c3a0e3ed0d8593f3"
1521
+ },
1522
+ {
1523
+ "dataPath": "params_shard_41.bin",
1524
+ "format": "raw-shard",
1525
+ "nbytes": 52428800,
1526
+ "records": [
1527
+ {
1528
+ "name": "transformer.h.13.mlp.fc2.weight",
1529
+ "shape": [
1530
+ 2560,
1531
+ 10240
1532
+ ],
1533
+ "dtype": "float16",
1534
+ "format": "f32-to-bf16",
1535
+ "nbytes": 52428800,
1536
+ "byteOffset": 0
1537
+ }
1538
+ ],
1539
+ "md5sum": "3663f5adfd843eac31a20342a6e4f578"
1540
+ },
1541
+ {
1542
+ "dataPath": "params_shard_42.bin",
1543
+ "format": "raw-shard",
1544
+ "nbytes": 39321600,
1545
+ "records": [
1546
+ {
1547
+ "name": "transformer.h.13.mixer.Wqkv.weight",
1548
+ "shape": [
1549
+ 7680,
1550
+ 2560
1551
+ ],
1552
+ "dtype": "float16",
1553
+ "format": "f32-to-bf16",
1554
+ "nbytes": 39321600,
1555
+ "byteOffset": 0
1556
+ }
1557
+ ],
1558
+ "md5sum": "66549463b0931dc08db16a65ebfcef58"
1559
+ },
1560
+ {
1561
+ "dataPath": "params_shard_43.bin",
1562
+ "format": "raw-shard",
1563
+ "nbytes": 52428800,
1564
+ "records": [
1565
+ {
1566
+ "name": "transformer.h.2.mlp.fc1.weight",
1567
+ "shape": [
1568
+ 10240,
1569
+ 2560
1570
+ ],
1571
+ "dtype": "float16",
1572
+ "format": "f32-to-bf16",
1573
+ "nbytes": 52428800,
1574
+ "byteOffset": 0
1575
+ }
1576
+ ],
1577
+ "md5sum": "5dc9124d2ddfca57b608a85e630c1d05"
1578
+ },
1579
+ {
1580
+ "dataPath": "params_shard_44.bin",
1581
+ "format": "raw-shard",
1582
+ "nbytes": 52428800,
1583
+ "records": [
1584
+ {
1585
+ "name": "transformer.h.2.mlp.fc2.weight",
1586
+ "shape": [
1587
+ 2560,
1588
+ 10240
1589
+ ],
1590
+ "dtype": "float16",
1591
+ "format": "f32-to-bf16",
1592
+ "nbytes": 52428800,
1593
+ "byteOffset": 0
1594
+ }
1595
+ ],
1596
+ "md5sum": "508d927411c8164c5c61e311480ad498"
1597
+ },
1598
+ {
1599
+ "dataPath": "params_shard_45.bin",
1600
+ "format": "raw-shard",
1601
+ "nbytes": 26327040,
1602
+ "records": [
1603
+ {
1604
+ "name": "transformer.h.12.mixer.out_proj.weight",
1605
+ "shape": [
1606
+ 2560,
1607
+ 2560
1608
+ ],
1609
+ "dtype": "float16",
1610
+ "format": "f32-to-bf16",
1611
+ "nbytes": 13107200,
1612
+ "byteOffset": 0
1613
+ },
1614
+ {
1615
+ "name": "transformer.h.12.mixer.Wqkv.bias",
1616
+ "shape": [
1617
+ 7680
1618
+ ],
1619
+ "dtype": "float16",
1620
+ "format": "f32-to-bf16",
1621
+ "nbytes": 15360,
1622
+ "byteOffset": 13107200
1623
+ },
1624
+ {
1625
+ "name": "transformer.h.13.ln.bias",
1626
+ "shape": [
1627
+ 2560
1628
+ ],
1629
+ "dtype": "float16",
1630
+ "format": "f32-to-bf16",
1631
+ "nbytes": 5120,
1632
+ "byteOffset": 13122560
1633
+ },
1634
+ {
1635
+ "name": "transformer.h.13.ln.weight",
1636
+ "shape": [
1637
+ 2560
1638
+ ],
1639
+ "dtype": "float16",
1640
+ "format": "f32-to-bf16",
1641
+ "nbytes": 5120,
1642
+ "byteOffset": 13127680
1643
+ },
1644
+ {
1645
+ "name": "transformer.h.13.mlp.fc1.bias",
1646
+ "shape": [
1647
+ 10240
1648
+ ],
1649
+ "dtype": "float16",
1650
+ "format": "f32-to-bf16",
1651
+ "nbytes": 20480,
1652
+ "byteOffset": 13132800
1653
+ },
1654
+ {
1655
+ "name": "transformer.h.13.mlp.fc2.bias",
1656
+ "shape": [
1657
+ 2560
1658
+ ],
1659
+ "dtype": "float16",
1660
+ "format": "f32-to-bf16",
1661
+ "nbytes": 5120,
1662
+ "byteOffset": 13153280
1663
+ },
1664
+ {
1665
+ "name": "transformer.h.13.mixer.out_proj.bias",
1666
+ "shape": [
1667
+ 2560
1668
+ ],
1669
+ "dtype": "float16",
1670
+ "format": "f32-to-bf16",
1671
+ "nbytes": 5120,
1672
+ "byteOffset": 13158400
1673
+ },
1674
+ {
1675
+ "name": "transformer.h.13.mixer.out_proj.weight",
1676
+ "shape": [
1677
+ 2560,
1678
+ 2560
1679
+ ],
1680
+ "dtype": "float16",
1681
+ "format": "f32-to-bf16",
1682
+ "nbytes": 13107200,
1683
+ "byteOffset": 13163520
1684
+ },
1685
+ {
1686
+ "name": "transformer.h.13.mixer.Wqkv.bias",
1687
+ "shape": [
1688
+ 7680
1689
+ ],
1690
+ "dtype": "float16",
1691
+ "format": "f32-to-bf16",
1692
+ "nbytes": 15360,
1693
+ "byteOffset": 26270720
1694
+ },
1695
+ {
1696
+ "name": "transformer.h.2.ln.bias",
1697
+ "shape": [
1698
+ 2560
1699
+ ],
1700
+ "dtype": "float16",
1701
+ "format": "f32-to-bf16",
1702
+ "nbytes": 5120,
1703
+ "byteOffset": 26286080
1704
+ },
1705
+ {
1706
+ "name": "transformer.h.2.ln.weight",
1707
+ "shape": [
1708
+ 2560
1709
+ ],
1710
+ "dtype": "float16",
1711
+ "format": "f32-to-bf16",
1712
+ "nbytes": 5120,
1713
+ "byteOffset": 26291200
1714
+ },
1715
+ {
1716
+ "name": "transformer.h.2.mlp.fc1.bias",
1717
+ "shape": [
1718
+ 10240
1719
+ ],
1720
+ "dtype": "float16",
1721
+ "format": "f32-to-bf16",
1722
+ "nbytes": 20480,
1723
+ "byteOffset": 26296320
1724
+ },
1725
+ {
1726
+ "name": "transformer.h.2.mlp.fc2.bias",
1727
+ "shape": [
1728
+ 2560
1729
+ ],
1730
+ "dtype": "float16",
1731
+ "format": "f32-to-bf16",
1732
+ "nbytes": 5120,
1733
+ "byteOffset": 26316800
1734
+ },
1735
+ {
1736
+ "name": "transformer.h.2.mixer.out_proj.bias",
1737
+ "shape": [
1738
+ 2560
1739
+ ],
1740
+ "dtype": "float16",
1741
+ "format": "f32-to-bf16",
1742
+ "nbytes": 5120,
1743
+ "byteOffset": 26321920
1744
+ }
1745
+ ],
1746
+ "md5sum": "30e056b33c457e301b54e9028e4e3b7c"
1747
+ },
1748
+ {
1749
+ "dataPath": "params_shard_46.bin",
1750
+ "format": "raw-shard",
1751
+ "nbytes": 39321600,
1752
+ "records": [
1753
+ {
1754
+ "name": "transformer.h.2.mixer.Wqkv.weight",
1755
+ "shape": [
1756
+ 7680,
1757
+ 2560
1758
+ ],
1759
+ "dtype": "float16",
1760
+ "format": "f32-to-bf16",
1761
+ "nbytes": 39321600,
1762
+ "byteOffset": 0
1763
+ }
1764
+ ],
1765
+ "md5sum": "02183b8cfd0913fb062d44f6dc7bb435"
1766
+ },
1767
+ {
1768
+ "dataPath": "params_shard_47.bin",
1769
+ "format": "raw-shard",
1770
+ "nbytes": 52428800,
1771
+ "records": [
1772
+ {
1773
+ "name": "transformer.h.3.mlp.fc1.weight",
1774
+ "shape": [
1775
+ 10240,
1776
+ 2560
1777
+ ],
1778
+ "dtype": "float16",
1779
+ "format": "f32-to-bf16",
1780
+ "nbytes": 52428800,
1781
+ "byteOffset": 0
1782
+ }
1783
+ ],
1784
+ "md5sum": "c19fd2b41ca5d52e9b1f56210ba8d3d8"
1785
+ },
1786
+ {
1787
+ "dataPath": "params_shard_48.bin",
1788
+ "format": "raw-shard",
1789
+ "nbytes": 52428800,
1790
+ "records": [
1791
+ {
1792
+ "name": "transformer.h.3.mlp.fc2.weight",
1793
+ "shape": [
1794
+ 2560,
1795
+ 10240
1796
+ ],
1797
+ "dtype": "float16",
1798
+ "format": "f32-to-bf16",
1799
+ "nbytes": 52428800,
1800
+ "byteOffset": 0
1801
+ }
1802
+ ],
1803
+ "md5sum": "aa7ef0fe99061caae303a69488f71e76"
1804
+ },
1805
+ {
1806
+ "dataPath": "params_shard_49.bin",
1807
+ "format": "raw-shard",
1808
+ "nbytes": 39321600,
1809
+ "records": [
1810
+ {
1811
+ "name": "transformer.h.3.mixer.Wqkv.weight",
1812
+ "shape": [
1813
+ 7680,
1814
+ 2560
1815
+ ],
1816
+ "dtype": "float16",
1817
+ "format": "f32-to-bf16",
1818
+ "nbytes": 39321600,
1819
+ "byteOffset": 0
1820
+ }
1821
+ ],
1822
+ "md5sum": "03886d58a911935275e3ade03254936c"
1823
+ },
1824
+ {
1825
+ "dataPath": "params_shard_50.bin",
1826
+ "format": "raw-shard",
1827
+ "nbytes": 52428800,
1828
+ "records": [
1829
+ {
1830
+ "name": "transformer.h.4.mlp.fc1.weight",
1831
+ "shape": [
1832
+ 10240,
1833
+ 2560
1834
+ ],
1835
+ "dtype": "float16",
1836
+ "format": "f32-to-bf16",
1837
+ "nbytes": 52428800,
1838
+ "byteOffset": 0
1839
+ }
1840
+ ],
1841
+ "md5sum": "992123d9456fe73af283b58198c3a8fe"
1842
+ },
1843
+ {
1844
+ "dataPath": "params_shard_51.bin",
1845
+ "format": "raw-shard",
1846
+ "nbytes": 52428800,
1847
+ "records": [
1848
+ {
1849
+ "name": "transformer.h.4.mlp.fc2.weight",
1850
+ "shape": [
1851
+ 2560,
1852
+ 10240
1853
+ ],
1854
+ "dtype": "float16",
1855
+ "format": "f32-to-bf16",
1856
+ "nbytes": 52428800,
1857
+ "byteOffset": 0
1858
+ }
1859
+ ],
1860
+ "md5sum": "2f1ee0816f479dfd634dcfab091e095d"
1861
+ },
1862
+ {
1863
+ "dataPath": "params_shard_52.bin",
1864
+ "format": "raw-shard",
1865
+ "nbytes": 26327040,
1866
+ "records": [
1867
+ {
1868
+ "name": "transformer.h.2.mixer.out_proj.weight",
1869
+ "shape": [
1870
+ 2560,
1871
+ 2560
1872
+ ],
1873
+ "dtype": "float16",
1874
+ "format": "f32-to-bf16",
1875
+ "nbytes": 13107200,
1876
+ "byteOffset": 0
1877
+ },
1878
+ {
1879
+ "name": "transformer.h.2.mixer.Wqkv.bias",
1880
+ "shape": [
1881
+ 7680
1882
+ ],
1883
+ "dtype": "float16",
1884
+ "format": "f32-to-bf16",
1885
+ "nbytes": 15360,
1886
+ "byteOffset": 13107200
1887
+ },
1888
+ {
1889
+ "name": "transformer.h.3.ln.bias",
1890
+ "shape": [
1891
+ 2560
1892
+ ],
1893
+ "dtype": "float16",
1894
+ "format": "f32-to-bf16",
1895
+ "nbytes": 5120,
1896
+ "byteOffset": 13122560
1897
+ },
1898
+ {
1899
+ "name": "transformer.h.3.ln.weight",
1900
+ "shape": [
1901
+ 2560
1902
+ ],
1903
+ "dtype": "float16",
1904
+ "format": "f32-to-bf16",
1905
+ "nbytes": 5120,
1906
+ "byteOffset": 13127680
1907
+ },
1908
+ {
1909
+ "name": "transformer.h.3.mlp.fc1.bias",
1910
+ "shape": [
1911
+ 10240
1912
+ ],
1913
+ "dtype": "float16",
1914
+ "format": "f32-to-bf16",
1915
+ "nbytes": 20480,
1916
+ "byteOffset": 13132800
1917
+ },
1918
+ {
1919
+ "name": "transformer.h.3.mlp.fc2.bias",
1920
+ "shape": [
1921
+ 2560
1922
+ ],
1923
+ "dtype": "float16",
1924
+ "format": "f32-to-bf16",
1925
+ "nbytes": 5120,
1926
+ "byteOffset": 13153280
1927
+ },
1928
+ {
1929
+ "name": "transformer.h.3.mixer.out_proj.bias",
1930
+ "shape": [
1931
+ 2560
1932
+ ],
1933
+ "dtype": "float16",
1934
+ "format": "f32-to-bf16",
1935
+ "nbytes": 5120,
1936
+ "byteOffset": 13158400
1937
+ },
1938
+ {
1939
+ "name": "transformer.h.3.mixer.out_proj.weight",
1940
+ "shape": [
1941
+ 2560,
1942
+ 2560
1943
+ ],
1944
+ "dtype": "float16",
1945
+ "format": "f32-to-bf16",
1946
+ "nbytes": 13107200,
1947
+ "byteOffset": 13163520
1948
+ },
1949
+ {
1950
+ "name": "transformer.h.3.mixer.Wqkv.bias",
1951
+ "shape": [
1952
+ 7680
1953
+ ],
1954
+ "dtype": "float16",
1955
+ "format": "f32-to-bf16",
1956
+ "nbytes": 15360,
1957
+ "byteOffset": 26270720
1958
+ },
1959
+ {
1960
+ "name": "transformer.h.4.ln.bias",
1961
+ "shape": [
1962
+ 2560
1963
+ ],
1964
+ "dtype": "float16",
1965
+ "format": "f32-to-bf16",
1966
+ "nbytes": 5120,
1967
+ "byteOffset": 26286080
1968
+ },
1969
+ {
1970
+ "name": "transformer.h.4.ln.weight",
1971
+ "shape": [
1972
+ 2560
1973
+ ],
1974
+ "dtype": "float16",
1975
+ "format": "f32-to-bf16",
1976
+ "nbytes": 5120,
1977
+ "byteOffset": 26291200
1978
+ },
1979
+ {
1980
+ "name": "transformer.h.4.mlp.fc1.bias",
1981
+ "shape": [
1982
+ 10240
1983
+ ],
1984
+ "dtype": "float16",
1985
+ "format": "f32-to-bf16",
1986
+ "nbytes": 20480,
1987
+ "byteOffset": 26296320
1988
+ },
1989
+ {
1990
+ "name": "transformer.h.4.mlp.fc2.bias",
1991
+ "shape": [
1992
+ 2560
1993
+ ],
1994
+ "dtype": "float16",
1995
+ "format": "f32-to-bf16",
1996
+ "nbytes": 5120,
1997
+ "byteOffset": 26316800
1998
+ },
1999
+ {
2000
+ "name": "transformer.h.4.mixer.out_proj.bias",
2001
+ "shape": [
2002
+ 2560
2003
+ ],
2004
+ "dtype": "float16",
2005
+ "format": "f32-to-bf16",
2006
+ "nbytes": 5120,
2007
+ "byteOffset": 26321920
2008
+ }
2009
+ ],
2010
+ "md5sum": "aca564b8e952b26db4b32d881c11d348"
2011
+ },
2012
+ {
2013
+ "dataPath": "params_shard_53.bin",
2014
+ "format": "raw-shard",
2015
+ "nbytes": 39321600,
2016
+ "records": [
2017
+ {
2018
+ "name": "transformer.h.4.mixer.Wqkv.weight",
2019
+ "shape": [
2020
+ 7680,
2021
+ 2560
2022
+ ],
2023
+ "dtype": "float16",
2024
+ "format": "f32-to-bf16",
2025
+ "nbytes": 39321600,
2026
+ "byteOffset": 0
2027
+ }
2028
+ ],
2029
+ "md5sum": "a9530456770c3619145219284cdb6a82"
2030
+ },
2031
+ {
2032
+ "dataPath": "params_shard_54.bin",
2033
+ "format": "raw-shard",
2034
+ "nbytes": 52428800,
2035
+ "records": [
2036
+ {
2037
+ "name": "transformer.h.5.mlp.fc1.weight",
2038
+ "shape": [
2039
+ 10240,
2040
+ 2560
2041
+ ],
2042
+ "dtype": "float16",
2043
+ "format": "f32-to-bf16",
2044
+ "nbytes": 52428800,
2045
+ "byteOffset": 0
2046
+ }
2047
+ ],
2048
+ "md5sum": "beb5db238e6ed35d95b08a85bd9e01e9"
2049
+ },
2050
+ {
2051
+ "dataPath": "params_shard_55.bin",
2052
+ "format": "raw-shard",
2053
+ "nbytes": 52428800,
2054
+ "records": [
2055
+ {
2056
+ "name": "transformer.h.5.mlp.fc2.weight",
2057
+ "shape": [
2058
+ 2560,
2059
+ 10240
2060
+ ],
2061
+ "dtype": "float16",
2062
+ "format": "f32-to-bf16",
2063
+ "nbytes": 52428800,
2064
+ "byteOffset": 0
2065
+ }
2066
+ ],
2067
+ "md5sum": "00a4fa5a70deb767650b614fb10c8036"
2068
+ },
2069
+ {
2070
+ "dataPath": "params_shard_56.bin",
2071
+ "format": "raw-shard",
2072
+ "nbytes": 39321600,
2073
+ "records": [
2074
+ {
2075
+ "name": "transformer.h.5.mixer.Wqkv.weight",
2076
+ "shape": [
2077
+ 7680,
2078
+ 2560
2079
+ ],
2080
+ "dtype": "float16",
2081
+ "format": "f32-to-bf16",
2082
+ "nbytes": 39321600,
2083
+ "byteOffset": 0
2084
+ }
2085
+ ],
2086
+ "md5sum": "2ad9d1721f84c5f31dff3d7d7a7a96c7"
2087
+ },
2088
+ {
2089
+ "dataPath": "params_shard_57.bin",
2090
+ "format": "raw-shard",
2091
+ "nbytes": 52428800,
2092
+ "records": [
2093
+ {
2094
+ "name": "transformer.h.6.mlp.fc1.weight",
2095
+ "shape": [
2096
+ 10240,
2097
+ 2560
2098
+ ],
2099
+ "dtype": "float16",
2100
+ "format": "f32-to-bf16",
2101
+ "nbytes": 52428800,
2102
+ "byteOffset": 0
2103
+ }
2104
+ ],
2105
+ "md5sum": "09e347d5895f73bc819e435b579362b4"
2106
+ },
2107
+ {
2108
+ "dataPath": "params_shard_58.bin",
2109
+ "format": "raw-shard",
2110
+ "nbytes": 52428800,
2111
+ "records": [
2112
+ {
2113
+ "name": "transformer.h.6.mlp.fc2.weight",
2114
+ "shape": [
2115
+ 2560,
2116
+ 10240
2117
+ ],
2118
+ "dtype": "float16",
2119
+ "format": "f32-to-bf16",
2120
+ "nbytes": 52428800,
2121
+ "byteOffset": 0
2122
+ }
2123
+ ],
2124
+ "md5sum": "b5ab7ebe94255ed18aa03fcae28f918d"
2125
+ },
2126
+ {
2127
+ "dataPath": "params_shard_59.bin",
2128
+ "format": "raw-shard",
2129
+ "nbytes": 26327040,
2130
+ "records": [
2131
+ {
2132
+ "name": "transformer.h.4.mixer.out_proj.weight",
2133
+ "shape": [
2134
+ 2560,
2135
+ 2560
2136
+ ],
2137
+ "dtype": "float16",
2138
+ "format": "f32-to-bf16",
2139
+ "nbytes": 13107200,
2140
+ "byteOffset": 0
2141
+ },
2142
+ {
2143
+ "name": "transformer.h.4.mixer.Wqkv.bias",
2144
+ "shape": [
2145
+ 7680
2146
+ ],
2147
+ "dtype": "float16",
2148
+ "format": "f32-to-bf16",
2149
+ "nbytes": 15360,
2150
+ "byteOffset": 13107200
2151
+ },
2152
+ {
2153
+ "name": "transformer.h.5.ln.bias",
2154
+ "shape": [
2155
+ 2560
2156
+ ],
2157
+ "dtype": "float16",
2158
+ "format": "f32-to-bf16",
2159
+ "nbytes": 5120,
2160
+ "byteOffset": 13122560
2161
+ },
2162
+ {
2163
+ "name": "transformer.h.5.ln.weight",
2164
+ "shape": [
2165
+ 2560
2166
+ ],
2167
+ "dtype": "float16",
2168
+ "format": "f32-to-bf16",
2169
+ "nbytes": 5120,
2170
+ "byteOffset": 13127680
2171
+ },
2172
+ {
2173
+ "name": "transformer.h.5.mlp.fc1.bias",
2174
+ "shape": [
2175
+ 10240
2176
+ ],
2177
+ "dtype": "float16",
2178
+ "format": "f32-to-bf16",
2179
+ "nbytes": 20480,
2180
+ "byteOffset": 13132800
2181
+ },
2182
+ {
2183
+ "name": "transformer.h.5.mlp.fc2.bias",
2184
+ "shape": [
2185
+ 2560
2186
+ ],
2187
+ "dtype": "float16",
2188
+ "format": "f32-to-bf16",
2189
+ "nbytes": 5120,
2190
+ "byteOffset": 13153280
2191
+ },
2192
+ {
2193
+ "name": "transformer.h.5.mixer.out_proj.bias",
2194
+ "shape": [
2195
+ 2560
2196
+ ],
2197
+ "dtype": "float16",
2198
+ "format": "f32-to-bf16",
2199
+ "nbytes": 5120,
2200
+ "byteOffset": 13158400
2201
+ },
2202
+ {
2203
+ "name": "transformer.h.5.mixer.out_proj.weight",
2204
+ "shape": [
2205
+ 2560,
2206
+ 2560
2207
+ ],
2208
+ "dtype": "float16",
2209
+ "format": "f32-to-bf16",
2210
+ "nbytes": 13107200,
2211
+ "byteOffset": 13163520
2212
+ },
2213
+ {
2214
+ "name": "transformer.h.5.mixer.Wqkv.bias",
2215
+ "shape": [
2216
+ 7680
2217
+ ],
2218
+ "dtype": "float16",
2219
+ "format": "f32-to-bf16",
2220
+ "nbytes": 15360,
2221
+ "byteOffset": 26270720
2222
+ },
2223
+ {
2224
+ "name": "transformer.h.6.ln.bias",
2225
+ "shape": [
2226
+ 2560
2227
+ ],
2228
+ "dtype": "float16",
2229
+ "format": "f32-to-bf16",
2230
+ "nbytes": 5120,
2231
+ "byteOffset": 26286080
2232
+ },
2233
+ {
2234
+ "name": "transformer.h.6.ln.weight",
2235
+ "shape": [
2236
+ 2560
2237
+ ],
2238
+ "dtype": "float16",
2239
+ "format": "f32-to-bf16",
2240
+ "nbytes": 5120,
2241
+ "byteOffset": 26291200
2242
+ },
2243
+ {
2244
+ "name": "transformer.h.6.mlp.fc1.bias",
2245
+ "shape": [
2246
+ 10240
2247
+ ],
2248
+ "dtype": "float16",
2249
+ "format": "f32-to-bf16",
2250
+ "nbytes": 20480,
2251
+ "byteOffset": 26296320
2252
+ },
2253
+ {
2254
+ "name": "transformer.h.6.mlp.fc2.bias",
2255
+ "shape": [
2256
+ 2560
2257
+ ],
2258
+ "dtype": "float16",
2259
+ "format": "f32-to-bf16",
2260
+ "nbytes": 5120,
2261
+ "byteOffset": 26316800
2262
+ },
2263
+ {
2264
+ "name": "transformer.h.6.mixer.out_proj.bias",
2265
+ "shape": [
2266
+ 2560
2267
+ ],
2268
+ "dtype": "float16",
2269
+ "format": "f32-to-bf16",
2270
+ "nbytes": 5120,
2271
+ "byteOffset": 26321920
2272
+ }
2273
+ ],
2274
+ "md5sum": "4ff6ed9e8c6244542ed1e0412969f707"
2275
+ },
2276
+ {
2277
+ "dataPath": "params_shard_60.bin",
2278
+ "format": "raw-shard",
2279
+ "nbytes": 39321600,
2280
+ "records": [
2281
+ {
2282
+ "name": "transformer.h.6.mixer.Wqkv.weight",
2283
+ "shape": [
2284
+ 7680,
2285
+ 2560
2286
+ ],
2287
+ "dtype": "float16",
2288
+ "format": "f32-to-bf16",
2289
+ "nbytes": 39321600,
2290
+ "byteOffset": 0
2291
+ }
2292
+ ],
2293
+ "md5sum": "aeceb9bec9d4f99791bab976eda24623"
2294
+ },
2295
+ {
2296
+ "dataPath": "params_shard_61.bin",
2297
+ "format": "raw-shard",
2298
+ "nbytes": 52428800,
2299
+ "records": [
2300
+ {
2301
+ "name": "transformer.h.7.mlp.fc1.weight",
2302
+ "shape": [
2303
+ 10240,
2304
+ 2560
2305
+ ],
2306
+ "dtype": "float16",
2307
+ "format": "f32-to-bf16",
2308
+ "nbytes": 52428800,
2309
+ "byteOffset": 0
2310
+ }
2311
+ ],
2312
+ "md5sum": "e28a069a898f94610d4f52f03a70a132"
2313
+ },
2314
+ {
2315
+ "dataPath": "params_shard_62.bin",
2316
+ "format": "raw-shard",
2317
+ "nbytes": 52428800,
2318
+ "records": [
2319
+ {
2320
+ "name": "transformer.h.7.mlp.fc2.weight",
2321
+ "shape": [
2322
+ 2560,
2323
+ 10240
2324
+ ],
2325
+ "dtype": "float16",
2326
+ "format": "f32-to-bf16",
2327
+ "nbytes": 52428800,
2328
+ "byteOffset": 0
2329
+ }
2330
+ ],
2331
+ "md5sum": "eaeef94a51cf1d83f8d5132defd7af77"
2332
+ },
2333
+ {
2334
+ "dataPath": "params_shard_63.bin",
2335
+ "format": "raw-shard",
2336
+ "nbytes": 39321600,
2337
+ "records": [
2338
+ {
2339
+ "name": "transformer.h.7.mixer.Wqkv.weight",
2340
+ "shape": [
2341
+ 7680,
2342
+ 2560
2343
+ ],
2344
+ "dtype": "float16",
2345
+ "format": "f32-to-bf16",
2346
+ "nbytes": 39321600,
2347
+ "byteOffset": 0
2348
+ }
2349
+ ],
2350
+ "md5sum": "b11cc2d4cb9dfdf49fc21a6d5ebfeaef"
2351
+ },
2352
+ {
2353
+ "dataPath": "params_shard_64.bin",
2354
+ "format": "raw-shard",
2355
+ "nbytes": 52428800,
2356
+ "records": [
2357
+ {
2358
+ "name": "transformer.h.8.mlp.fc1.weight",
2359
+ "shape": [
2360
+ 10240,
2361
+ 2560
2362
+ ],
2363
+ "dtype": "float16",
2364
+ "format": "f32-to-bf16",
2365
+ "nbytes": 52428800,
2366
+ "byteOffset": 0
2367
+ }
2368
+ ],
2369
+ "md5sum": "a511be3be0bf4534f77641d65119eafa"
2370
+ },
2371
+ {
2372
+ "dataPath": "params_shard_65.bin",
2373
+ "format": "raw-shard",
2374
+ "nbytes": 52428800,
2375
+ "records": [
2376
+ {
2377
+ "name": "transformer.h.8.mlp.fc2.weight",
2378
+ "shape": [
2379
+ 2560,
2380
+ 10240
2381
+ ],
2382
+ "dtype": "float16",
2383
+ "format": "f32-to-bf16",
2384
+ "nbytes": 52428800,
2385
+ "byteOffset": 0
2386
+ }
2387
+ ],
2388
+ "md5sum": "44343e3e55fe12e42a0fe3ce85be4edd"
2389
+ },
2390
+ {
2391
+ "dataPath": "params_shard_66.bin",
2392
+ "format": "raw-shard",
2393
+ "nbytes": 26327040,
2394
+ "records": [
2395
+ {
2396
+ "name": "transformer.h.6.mixer.out_proj.weight",
2397
+ "shape": [
2398
+ 2560,
2399
+ 2560
2400
+ ],
2401
+ "dtype": "float16",
2402
+ "format": "f32-to-bf16",
2403
+ "nbytes": 13107200,
2404
+ "byteOffset": 0
2405
+ },
2406
+ {
2407
+ "name": "transformer.h.6.mixer.Wqkv.bias",
2408
+ "shape": [
2409
+ 7680
2410
+ ],
2411
+ "dtype": "float16",
2412
+ "format": "f32-to-bf16",
2413
+ "nbytes": 15360,
2414
+ "byteOffset": 13107200
2415
+ },
2416
+ {
2417
+ "name": "transformer.h.7.ln.bias",
2418
+ "shape": [
2419
+ 2560
2420
+ ],
2421
+ "dtype": "float16",
2422
+ "format": "f32-to-bf16",
2423
+ "nbytes": 5120,
2424
+ "byteOffset": 13122560
2425
+ },
2426
+ {
2427
+ "name": "transformer.h.7.ln.weight",
2428
+ "shape": [
2429
+ 2560
2430
+ ],
2431
+ "dtype": "float16",
2432
+ "format": "f32-to-bf16",
2433
+ "nbytes": 5120,
2434
+ "byteOffset": 13127680
2435
+ },
2436
+ {
2437
+ "name": "transformer.h.7.mlp.fc1.bias",
2438
+ "shape": [
2439
+ 10240
2440
+ ],
2441
+ "dtype": "float16",
2442
+ "format": "f32-to-bf16",
2443
+ "nbytes": 20480,
2444
+ "byteOffset": 13132800
2445
+ },
2446
+ {
2447
+ "name": "transformer.h.7.mlp.fc2.bias",
2448
+ "shape": [
2449
+ 2560
2450
+ ],
2451
+ "dtype": "float16",
2452
+ "format": "f32-to-bf16",
2453
+ "nbytes": 5120,
2454
+ "byteOffset": 13153280
2455
+ },
2456
+ {
2457
+ "name": "transformer.h.7.mixer.out_proj.bias",
2458
+ "shape": [
2459
+ 2560
2460
+ ],
2461
+ "dtype": "float16",
2462
+ "format": "f32-to-bf16",
2463
+ "nbytes": 5120,
2464
+ "byteOffset": 13158400
2465
+ },
2466
+ {
2467
+ "name": "transformer.h.7.mixer.out_proj.weight",
2468
+ "shape": [
2469
+ 2560,
2470
+ 2560
2471
+ ],
2472
+ "dtype": "float16",
2473
+ "format": "f32-to-bf16",
2474
+ "nbytes": 13107200,
2475
+ "byteOffset": 13163520
2476
+ },
2477
+ {
2478
+ "name": "transformer.h.7.mixer.Wqkv.bias",
2479
+ "shape": [
2480
+ 7680
2481
+ ],
2482
+ "dtype": "float16",
2483
+ "format": "f32-to-bf16",
2484
+ "nbytes": 15360,
2485
+ "byteOffset": 26270720
2486
+ },
+ {
+ "name": "transformer.h.8.ln.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26286080
+ },
+ {
+ "name": "transformer.h.8.ln.weight",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26291200
+ },
+ {
+ "name": "transformer.h.8.mlp.fc1.bias",
+ "shape": [
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 20480,
+ "byteOffset": 26296320
+ },
+ {
+ "name": "transformer.h.8.mlp.fc2.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26316800
+ },
+ {
+ "name": "transformer.h.8.mixer.out_proj.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 26321920
+ }
+ ],
+ "md5sum": "00162d75939eb0330d894bb26f8c5685"
+ },
+ {
+ "dataPath": "params_shard_67.bin",
+ "format": "raw-shard",
+ "nbytes": 39321600,
+ "records": [
+ {
+ "name": "transformer.h.8.mixer.Wqkv.weight",
+ "shape": [
+ 7680,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 39321600,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "a8da9fa04ff9fbb632e404e55eb99acc"
+ },
+ {
+ "dataPath": "params_shard_68.bin",
+ "format": "raw-shard",
+ "nbytes": 52428800,
+ "records": [
+ {
+ "name": "transformer.h.9.mlp.fc1.weight",
+ "shape": [
+ 10240,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 52428800,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "12e1356b7f7960e0b2005397eff796cd"
+ },
+ {
+ "dataPath": "params_shard_69.bin",
+ "format": "raw-shard",
+ "nbytes": 52428800,
+ "records": [
+ {
+ "name": "transformer.h.9.mlp.fc2.weight",
+ "shape": [
+ 2560,
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 52428800,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "dd5efaa435d1a33f296c59529b249f7c"
+ },
+ {
+ "dataPath": "params_shard_70.bin",
+ "format": "raw-shard",
+ "nbytes": 39321600,
+ "records": [
+ {
+ "name": "transformer.h.9.mixer.Wqkv.weight",
+ "shape": [
+ 7680,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 39321600,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "4eecc2ff6bbef80cc85ff61963f44e9d"
+ },
+ {
+ "dataPath": "params_shard_71.bin",
+ "format": "raw-shard",
+ "nbytes": 26286080,
+ "records": [
+ {
+ "name": "transformer.h.8.mixer.out_proj.weight",
+ "shape": [
+ 2560,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 13107200,
+ "byteOffset": 0
+ },
+ {
+ "name": "transformer.h.8.mixer.Wqkv.bias",
+ "shape": [
+ 7680
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 15360,
+ "byteOffset": 13107200
+ },
+ {
+ "name": "transformer.h.9.ln.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13122560
+ },
+ {
+ "name": "transformer.h.9.ln.weight",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13127680
+ },
+ {
+ "name": "transformer.h.9.mlp.fc1.bias",
+ "shape": [
+ 10240
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 20480,
+ "byteOffset": 13132800
+ },
+ {
+ "name": "transformer.h.9.mlp.fc2.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13153280
+ },
+ {
+ "name": "transformer.h.9.mixer.out_proj.bias",
+ "shape": [
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 5120,
+ "byteOffset": 13158400
+ },
+ {
+ "name": "transformer.h.9.mixer.out_proj.weight",
+ "shape": [
+ 2560,
+ 2560
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 13107200,
+ "byteOffset": 13163520
+ },
+ {
+ "name": "transformer.h.9.mixer.Wqkv.bias",
+ "shape": [
+ 7680
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 15360,
+ "byteOffset": 26270720
+ }
+ ],
+ "md5sum": "7768f3a73f324517de18e1b8b7417da3"
+ }
+ ]
+ }
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8830c58c1b6c6ba321cab6145763ec5770d3d7f7bff3f6cd1bfdc5b912c8c17
+ size 262144000
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5797822d42c5eaa63f8f12cf564195a2564bd60d02e8baf8b58baa0f1152e059
+ size 52428800
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1792964bf1b53d9fa62d425cc5780454a35b911d24da03e1521d7f2eb3bb581e
+ size 39321600
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a6cf0768f6f99f97590ae516aad5b83bf0220ddd9235d598a8b12be4a8c9c60
+ size 52428800
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:30ce98981e19f20d288665f5a13b9344bf31964e903ea31b61b4a4cc3124d03a
+ size 52428800
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8439ad1bfddf8c2584311a9ab51dad1b4df46d45de56efb7c00c722dc2f2e15
+ size 39321600
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48a4c673f0698c4c890e51e8b42a56ff6a125c9ae78bae051f0e20d3a3aba41d
+ size 52428800
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:577640665b2fb458e88fcfdfef2920641f4ac6a99f50a7bbc17d864e4fda26bc
+ size 52428800
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f85d2be49f453f3f8ce98b42748266470bb1a4fb36017558e0fcbeddc5f9a603
+ size 26327040
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e08106383d4ee773fd0624208a985aed52778c213d11d6f5b75bafd20a9b2220
+ size 39321600
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:051d2ccf157dd13b97462776c67603739a434df017cf0fe2fdb251b0e9816107
+ size 52428800
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0ca84986a0af21e0bbab301c10d7e31b8b8a1ea7e560ac57e0ba78a072e71194
+ size 52428800
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e83e9e878272f2041790a5baa7a925dd94609b2b2940901ae6dd2c7b33327740
+ size 52428800
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:44bf6e6e353bff7979471553050d6248466153f0cea4305fab5a73537e9281a5
+ size 39321600
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfede3dfb22f419cba734087d4072cb1e63eedb97fd979d8eb051d3444979917
+ size 262144000
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a378b6e1de0e85563eadd7e33812ac7a16f0432bc50ada982b35368fc9232d01
+ size 52428800
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f7dd5000049441cbf1b8c6540f1868313952070bad6507fcf671709efbe2af86
+ size 52428800
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f793bf4193c5beec0dcfe558de27f97acc625214fcfd6e79464b1ba7b74c2e7
+ size 26327040
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a295a4c0543dda8ec711f7d13ea374a09b28d27fa45bdaab4bbce875a6a91b5
+ size 39321600
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:046611dea9ecd4b90f4d6b3683907f03aeed310465bda3b80b244ad9be078abf
+ size 52428800
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7a30c256412edf3ad41965d4e93ff4ae407c17fc9d600f3c0107a473dba3425
+ size 52428800
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65a1064d4dec434b3bca15eaee5effba6299e0d1f4f3543c74941bb94004f1be
+ size 39321600
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:87a4256e7b5fa9a66f3d5b2406602a75298fd9bca53c6ff0f3ec3a903f86babc
+ size 52428800
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da08adb882824c1661fab21c9c91f306a0ce07ffdd0247e3674ad75bbf0caf5d
+ size 39321600
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae5c9351d4e5fa0db74e54927d747cc20346f439bb8284980a4d225419d34688
+ size 52428800
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e26cec49ddf81796d7e7cb83ed1bb4882a34e8f5212c62b1c30ff28b0c4f341
+ size 26327040
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:844ba67732397a65e190b713a16b2f9baeb12f606109f761227ffe7d8d07c632
+ size 39321600
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:906c45fc771287012b81ba51c32050dbb21d55c1df8fd175ab095898e4127309
+ size 52428800
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af7c704ba13d9d49f85137aeb5c0bdf075cd1da97b61c22f6bc052d8010b6e9e
+ size 52428800
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6869e27a0862120c1aadc089e118c2e31a3b33486a6063939823ac05120d707
+ size 39321600
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bd5445d28f8c06d3a8ccbf1213572bd5bf087b206df7bd8f0d22296badf7948f
+ size 52428800
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6e039a0b5e150963f131df8ee41bd0f1674ff69551b2860fd35584755d54f086
+ size 52428800
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a1339bafea307a72848cd097550e1fb07ccb49e960baa73a32cf607ffc7eb7a
+ size 26327040
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ffc3cace534009a19c55c00d1b884d24526ae4d778af281b9bfa640b80401ab
+ size 39321600
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e0f219e4b2a00824063ff1d5a9048b4918c9a95683a22933f79ad1347e90f08e
+ size 52428800
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ac54ce6612e225802bd185cc2118228fc643f068ab0652a21f231f031a4eddfa
+ size 52428800
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f7443a70241dd0d3d1975358538d05f1c9dd58dee49c6759822e634c830cb17
+ size 52428800
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5b47add6bbfe056d593d24912bc45988b6af2a0e3ef90816cfde3e35619455a6
+ size 39321600
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c40b565044bf48898f672aba36cd2ac5b30fb13d8c7ecdeebee6f80ac1ae86b6
+ size 52428800
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:624681d060d7d8e6cf0ca7f502c69585dc393d6a381f5f6e3ad6bf6353b977b5
+ size 52428800
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ad127c5ac7651e0405a8b2e0e4c9927d592fc898784920f54a088ea4649a3b9
+ size 26327040
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e92ac55ad4a438e18e089f405a43015f1a2660adefa13d39d581a193a3bb7c9b
+ size 39321600
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6afa7a343de99d1cee27c5568dd8651fc35a509edb3037754a4f33035f63a70b
+ size 52428800
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a70860c4a9c1b670c2d351aee6f60a2e3158a27d68cd13f35de7a9cea3c23f3
+ size 52428800
params_shard_49.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe70de35f014b5d3602cfdd3c8b3495e9048c3c56b51e3fa55b7562bee0835a9
+ size 39321600
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8dda8d1b3d12b44a84ae0a23b61c7a5a4a54f6443184ee4487878f50b37f9ac6
+ size 52428800