TobDeBer committed on
Commit
bd7bbd3
1 Parent(s): 3e3648a

descriptions

Files changed (3)
  1. zephyr_f32_int2.txt +650 -0
  2. zephyr_f32_int3.txt +326 -0
  3. zephyr_int8_int3.txt +362 -0
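The per-tensor sizes in the quantization log below can be sanity-checked from the k-quant block layouts: ggml packs 256 weights per super-block, at 110 bytes for q3_K, 144 for q4_K, and 176 for q5_K (struct sizes from llama.cpp's ggml-quants code at around this build; treat the exact byte counts as an assumption if you are on a different version). A minimal sketch:

```python
# Sanity-check quantized tensor sizes against the llama.cpp k-quant
# super-block layouts (256 weights per block).
QK_K = 256
BLOCK_BYTES = {"q3_K": 110, "q4_K": 144, "q5_K": 176}  # bytes per 256-weight block

def quantized_mib(n_weights: int, qtype: str) -> float:
    """Size in MiB of a tensor with n_weights elements at the given k-quant type."""
    n_blocks = n_weights // QK_K
    return n_blocks * BLOCK_BYTES[qtype] / (1024 ** 2)

# token_embd.weight: 4096 x 32000 at q3_K -> matches the 53.71 MiB in the log
print(round(quantized_mib(4096 * 32000, "q3_K"), 2))   # 53.71
# ffn_down.weight: 14336 x 4096 at q5_K -> matches 38.50 MiB
print(round(quantized_mib(14336 * 4096, "q5_K"), 2))   # 38.5
# ffn_down.weight at q4_K -> matches 31.50 MiB
print(round(quantized_mib(14336 * 4096, "q4_K"), 2))   # 31.5
```

The f32 baselines check out the same way: 4096 x 32000 weights at 4 bytes each is exactly 500 MiB, so q3_K's 53.71 MiB works out to about 3.44 bits per weight.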
zephyr_f32_int2.txt ADDED
@@ -0,0 +1,650 @@
+ (dream) tb@IBM-PF38WZKF:~/funstreams/AI$ ./llama.cpp/quantize zephyr_f32.gguf Q3_K
+ main: build = 1798 (128de35)
+ main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
+ main: quantizing 'zephyr_f32.gguf' to 'ggml-model-Q3_K.gguf' as Q3_K
+ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from zephyr_f32.gguf (version GGUF V3 (latest))
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ llama_model_loader: - kv 0: general.architecture str = llama
+ llama_model_loader: - kv 1: general.name str = .
+ llama_model_loader: - kv 2: llama.context_length u32 = 32768
+ llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
+ llama_model_loader: - kv 4: llama.block_count u32 = 32
+ llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
+ llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
+ llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
+ llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
+ llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
+ llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
+ llama_model_loader: - kv 11: general.file_type u32 = 0
+ llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["▁ t", "i n", "e r", "▁ a", "h e...
+ llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
+ llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
+ llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
+ llama_model_loader: - type f32: 291 tensors
+ llama_model_quantize_internal: meta size = 1671648 bytes
+ [ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q3_K .. size = 500.00 MiB -> 53.71 MiB | hist:
+ [ 2/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 3/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q5_K .. size = 224.00 MiB -> 38.50 MiB | hist:
+ [ 4/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 5/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 6/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 7/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 8/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 9/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 10/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q5_K .. size = 16.00 MiB -> 2.75 MiB | hist:
+ [ 11/ 291] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 12/ 291] blk.1.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q5_K .. size = 224.00 MiB -> 38.50 MiB | hist:
+ [ 13/ 291] blk.1.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 14/ 291] blk.1.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 15/ 291] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 16/ 291] blk.1.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 17/ 291] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 18/ 291] blk.1.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 19/ 291] blk.1.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q5_K .. size = 16.00 MiB -> 2.75 MiB | hist:
+ [ 20/ 291] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 21/ 291] blk.2.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 22/ 291] blk.2.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 23/ 291] blk.2.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 24/ 291] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 25/ 291] blk.2.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 26/ 291] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 27/ 291] blk.2.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 28/ 291] blk.2.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 29/ 291] blk.3.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 30/ 291] blk.3.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 31/ 291] blk.3.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 32/ 291] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 33/ 291] blk.3.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 34/ 291] blk.3.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 35/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 36/ 291] blk.3.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 38/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 39/ 291] blk.4.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 40/ 291] blk.4.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 41/ 291] blk.4.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 42/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 43/ 291] blk.4.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 44/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 45/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 46/ 291] blk.4.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 47/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 48/ 291] blk.5.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 49/ 291] blk.5.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 50/ 291] blk.5.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 51/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 52/ 291] blk.5.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 53/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 54/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 55/ 291] blk.5.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 56/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 57/ 291] blk.6.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 58/ 291] blk.6.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 59/ 291] blk.6.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 60/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 61/ 291] blk.6.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 62/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 63/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 64/ 291] blk.6.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 65/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 66/ 291] blk.7.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 67/ 291] blk.7.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 68/ 291] blk.7.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 69/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 70/ 291] blk.7.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 71/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 72/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 73/ 291] blk.7.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 74/ 291] blk.8.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 75/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 76/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 77/ 291] blk.8.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 78/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 79/ 291] blk.10.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 80/ 291] blk.10.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 81/ 291] blk.10.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 82/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 83/ 291] blk.10.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 84/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 85/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 86/ 291] blk.10.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 87/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 88/ 291] blk.11.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 89/ 291] blk.11.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 90/ 291] blk.11.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 91/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 92/ 291] blk.11.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 93/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 94/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 95/ 291] blk.11.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 96/ 291] blk.12.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 97/ 291] blk.12.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 98/ 291] blk.12.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 99/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 100/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 101/ 291] blk.12.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 102/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 103/ 291] blk.8.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 104/ 291] blk.8.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 105/ 291] blk.8.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 106/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 107/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 108/ 291] blk.9.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 109/ 291] blk.9.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 110/ 291] blk.9.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 111/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 112/ 291] blk.9.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 113/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 114/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 115/ 291] blk.9.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 116/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 117/ 291] blk.12.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 119/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 120/ 291] blk.13.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 121/ 291] blk.13.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 122/ 291] blk.13.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 123/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 124/ 291] blk.13.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 125/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 126/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 127/ 291] blk.13.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 128/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 129/ 291] blk.14.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 130/ 291] blk.14.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 131/ 291] blk.14.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 132/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 133/ 291] blk.14.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 134/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 135/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 136/ 291] blk.14.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 137/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 138/ 291] blk.15.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 139/ 291] blk.15.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 140/ 291] blk.15.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 141/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 142/ 291] blk.15.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 143/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 144/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 145/ 291] blk.15.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 146/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 147/ 291] blk.16.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 148/ 291] blk.16.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 149/ 291] blk.16.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 150/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 151/ 291] blk.16.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 152/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 153/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 154/ 291] blk.16.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 155/ 291] blk.17.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 156/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 157/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 158/ 291] blk.17.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 159/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 160/ 291] blk.17.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 161/ 291] blk.17.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 162/ 291] blk.17.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 164/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 165/ 291] blk.18.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 166/ 291] blk.18.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 167/ 291] blk.18.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 168/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 169/ 291] blk.18.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 170/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 171/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 172/ 291] blk.18.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 173/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 174/ 291] blk.19.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 175/ 291] blk.19.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 176/ 291] blk.19.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 177/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 178/ 291] blk.19.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 179/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 180/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 181/ 291] blk.19.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 182/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 183/ 291] blk.20.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 184/ 291] blk.20.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 185/ 291] blk.20.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 186/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 187/ 291] blk.20.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 188/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 189/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 190/ 291] blk.20.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 191/ 291] blk.21.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 192/ 291] blk.21.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 193/ 291] blk.21.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 195/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 196/ 291] blk.21.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 197/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 198/ 291] blk.21.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 200/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 201/ 291] blk.22.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 202/ 291] blk.22.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 203/ 291] blk.22.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 204/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 205/ 291] blk.22.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 206/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 207/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 208/ 291] blk.22.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 209/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 210/ 291] blk.23.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 211/ 291] blk.23.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 212/ 291] blk.23.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 213/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 214/ 291] blk.23.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 215/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 216/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 217/ 291] blk.23.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 218/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 219/ 291] blk.24.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 220/ 291] blk.24.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 221/ 291] blk.24.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 222/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 223/ 291] blk.24.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 224/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 225/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 226/ 291] blk.24.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 227/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 228/ 291] blk.25.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 229/ 291] blk.25.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 230/ 291] blk.25.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 231/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 232/ 291] blk.25.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 233/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 234/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 235/ 291] blk.25.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 236/ 291] blk.26.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 237/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 238/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 239/ 291] blk.26.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 240/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 241/ 291] blk.26.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 242/ 291] blk.26.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 243/ 291] blk.26.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
274
+ [ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
275
+ [ 245/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
276
+ [ 246/ 291] blk.27.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
277
+ [ 247/ 291] blk.27.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
278
+ [ 248/ 291] blk.27.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
279
+ [ 249/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
280
+ [ 250/ 291] blk.27.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
281
+ [ 251/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
282
+ [ 252/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
283
+ [ 253/ 291] blk.27.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
284
[ 254/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 255/ 291] blk.28.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
[ 256/ 291] blk.28.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 257/ 291] blk.28.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 258/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 259/ 291] blk.28.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 260/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
[ 261/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 262/ 291] blk.28.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
[ 263/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 264/ 291] blk.29.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
[ 265/ 291] blk.29.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 266/ 291] blk.29.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 267/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 268/ 291] blk.29.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 269/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
[ 270/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 271/ 291] blk.29.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
[ 272/ 291] blk.30.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 273/ 291] blk.30.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 274/ 291] blk.30.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
[ 276/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 277/ 291] blk.30.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
[ 278/ 291] output.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q6_K .. size = 500.00 MiB -> 102.54 MiB | hist:
[ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 280/ 291] blk.30.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
[ 281/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 282/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 283/ 291] blk.31.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
[ 284/ 291] blk.31.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 285/ 291] blk.31.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 286/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 287/ 291] blk.31.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 288/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
[ 289/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 290/ 291] blk.31.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
[ 291/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
llama_model_quantize_internal: model size = 27625.02 MB
llama_model_quantize_internal: quant size = 3355.27 MB

main: quantize time = 368785.09 ms
main: total time = 368785.09 ms
(dream) tb@IBM-PF38WZKF:~/funstreams/AI$ vi zephyr_f32_int3.txt
(dream) tb@IBM-PF38WZKF:~/funstreams/AI$ ./llama.cpp/quantize zephyr_f32.gguf zephyr_Q2_K.gguf Q2_K >zephyr_f32_int2.txt
main: build = 1798 (128de35)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing 'zephyr_f32.gguf' to 'zephyr_Q2_K.gguf' as Q2_K
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from zephyr_f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 0
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["▁ t", "i n", "e r", "▁ a", "h e...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - type f32: 291 tensors
llama_model_quantize_internal: meta size = 1671648 bytes
[ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q2_K .. size = 500.00 MiB -> 41.02 MiB | hist:
[ 2/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 3/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 4/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 5/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 6/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 7/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 8/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 9/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 10/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 11/ 291] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 12/ 291] blk.1.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 13/ 291] blk.1.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 14/ 291] blk.1.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 15/ 291] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 16/ 291] blk.1.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 17/ 291] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 18/ 291] blk.1.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 19/ 291] blk.1.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 20/ 291] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 21/ 291] blk.2.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 22/ 291] blk.2.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 23/ 291] blk.2.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 24/ 291] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 25/ 291] blk.2.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 26/ 291] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 27/ 291] blk.2.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 28/ 291] blk.2.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 29/ 291] blk.3.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 30/ 291] blk.3.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 31/ 291] blk.3.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 32/ 291] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 33/ 291] blk.3.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 34/ 291] blk.3.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 35/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 36/ 291] blk.3.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 38/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 39/ 291] blk.4.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 40/ 291] blk.4.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 41/ 291] blk.4.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 42/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 43/ 291] blk.4.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 44/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 45/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 46/ 291] blk.4.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 47/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 48/ 291] blk.5.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 49/ 291] blk.5.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 50/ 291] blk.5.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 51/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 52/ 291] blk.5.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 53/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 54/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 55/ 291] blk.5.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 56/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 57/ 291] blk.6.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 58/ 291] blk.6.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 59/ 291] blk.6.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 60/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 61/ 291] blk.6.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 62/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 63/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 64/ 291] blk.6.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 65/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 66/ 291] blk.7.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 67/ 291] blk.7.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 68/ 291] blk.7.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 69/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 70/ 291] blk.7.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 71/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 72/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 73/ 291] blk.7.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 74/ 291] blk.8.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 75/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 76/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 77/ 291] blk.8.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 78/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 79/ 291] blk.10.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 80/ 291] blk.10.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 81/ 291] blk.10.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 82/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 83/ 291] blk.10.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 84/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 85/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 86/ 291] blk.10.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 87/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 88/ 291] blk.11.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 89/ 291] blk.11.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 90/ 291] blk.11.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 91/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 92/ 291] blk.11.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 93/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 94/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 95/ 291] blk.11.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 96/ 291] blk.12.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 97/ 291] blk.12.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 98/ 291] blk.12.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 99/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 100/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 101/ 291] blk.12.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 102/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 103/ 291] blk.8.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 104/ 291] blk.8.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 105/ 291] blk.8.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 106/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 107/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 108/ 291] blk.9.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 109/ 291] blk.9.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 110/ 291] blk.9.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 111/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 112/ 291] blk.9.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 113/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 114/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 115/ 291] blk.9.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 116/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 117/ 291] blk.12.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 119/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 120/ 291] blk.13.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 121/ 291] blk.13.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 122/ 291] blk.13.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 123/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 124/ 291] blk.13.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 125/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 126/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 127/ 291] blk.13.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 128/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 129/ 291] blk.14.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 130/ 291] blk.14.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 131/ 291] blk.14.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 132/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 133/ 291] blk.14.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 134/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 135/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 136/ 291] blk.14.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 137/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 138/ 291] blk.15.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 139/ 291] blk.15.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 140/ 291] blk.15.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 141/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 142/ 291] blk.15.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 143/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 144/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 145/ 291] blk.15.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 146/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 147/ 291] blk.16.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 148/ 291] blk.16.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 149/ 291] blk.16.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 150/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 151/ 291] blk.16.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 152/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 153/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 154/ 291] blk.16.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 155/ 291] blk.17.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 156/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 157/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 158/ 291] blk.17.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 159/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 160/ 291] blk.17.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 161/ 291] blk.17.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 162/ 291] blk.17.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 164/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 165/ 291] blk.18.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 166/ 291] blk.18.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 167/ 291] blk.18.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 168/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 169/ 291] blk.18.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 170/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 171/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 172/ 291] blk.18.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 173/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 174/ 291] blk.19.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 175/ 291] blk.19.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 176/ 291] blk.19.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 177/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 178/ 291] blk.19.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 179/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 180/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
[ 181/ 291] blk.19.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
[ 182/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 183/ 291] blk.20.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 184/ 291] blk.20.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 185/ 291] blk.20.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
[ 186/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 187/ 291] blk.20.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
[ 188/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
[ 189/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
547
+ [ 190/ 291] blk.20.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
548
+ [ 191/ 291] blk.21.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
549
+ [ 192/ 291] blk.21.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
550
+ [ 193/ 291] blk.21.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
551
+ [ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
552
+ [ 195/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
553
+ [ 196/ 291] blk.21.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
554
+ [ 197/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
555
+ [ 198/ 291] blk.21.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
556
+ [ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
557
+ [ 200/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
558
+ [ 201/ 291] blk.22.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
559
+ [ 202/ 291] blk.22.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
560
+ [ 203/ 291] blk.22.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
561
+ [ 204/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
562
+ [ 205/ 291] blk.22.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
563
+ [ 206/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
564
+ [ 207/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
565
+ [ 208/ 291] blk.22.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
566
+ [ 209/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
567
+ [ 210/ 291] blk.23.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
568
+ [ 211/ 291] blk.23.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
569
+ [ 212/ 291] blk.23.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
570
+ [ 213/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
571
+ [ 214/ 291] blk.23.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
572
+ [ 215/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
573
+ [ 216/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
574
+ [ 217/ 291] blk.23.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
575
+ [ 218/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
576
+ [ 219/ 291] blk.24.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
577
+ [ 220/ 291] blk.24.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
578
+ [ 221/ 291] blk.24.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
579
+ [ 222/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
580
+ [ 223/ 291] blk.24.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
581
+ [ 224/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
582
+ [ 225/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
583
+ [ 226/ 291] blk.24.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
584
+ [ 227/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
585
+ [ 228/ 291] blk.25.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
586
+ [ 229/ 291] blk.25.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
587
+ [ 230/ 291] blk.25.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
588
+ [ 231/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
589
+ [ 232/ 291] blk.25.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
590
+ [ 233/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
591
+ [ 234/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
592
+ [ 235/ 291] blk.25.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
593
+ [ 236/ 291] blk.26.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
594
+ [ 237/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
595
+ [ 238/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
596
+ [ 239/ 291] blk.26.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
597
+ [ 240/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
598
+ [ 241/ 291] blk.26.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
599
+ [ 242/ 291] blk.26.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
600
+ [ 243/ 291] blk.26.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
601
+ [ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
602
+ [ 245/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
603
+ [ 246/ 291] blk.27.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
604
+ [ 247/ 291] blk.27.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
605
+ [ 248/ 291] blk.27.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
606
+ [ 249/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
607
+ [ 250/ 291] blk.27.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
608
+ [ 251/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
609
+ [ 252/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
610
+ [ 253/ 291] blk.27.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
611
+ [ 254/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
612
+ [ 255/ 291] blk.28.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
613
+ [ 256/ 291] blk.28.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
614
+ [ 257/ 291] blk.28.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
615
+ [ 258/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
616
+ [ 259/ 291] blk.28.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
617
+ [ 260/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
618
+ [ 261/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
619
+ [ 262/ 291] blk.28.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
620
+ [ 263/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
621
+ [ 264/ 291] blk.29.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
622
+ [ 265/ 291] blk.29.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
623
+ [ 266/ 291] blk.29.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
624
+ [ 267/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
625
+ [ 268/ 291] blk.29.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
626
+ [ 269/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
627
+ [ 270/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
628
+ [ 271/ 291] blk.29.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
629
+ [ 272/ 291] blk.30.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
630
+ [ 273/ 291] blk.30.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
631
+ [ 274/ 291] blk.30.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
632
+ [ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
633
+ [ 276/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
634
+ [ 277/ 291] blk.30.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
635
+ [ 278/ 291] output.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q6_K .. size = 500.00 MiB -> 102.54 MiB | hist:
636
+ [ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
637
+ [ 280/ 291] blk.30.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
638
+ [ 281/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
639
+ [ 282/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
640
+ [ 283/ 291] blk.31.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
641
+ [ 284/ 291] blk.31.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
642
+ [ 285/ 291] blk.31.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
643
+ [ 286/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
644
+ [ 287/ 291] blk.31.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q2_K .. size = 16.00 MiB -> 1.31 MiB | hist:
645
+ [ 288/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
646
+ [ 289/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q2_K .. size = 64.00 MiB -> 5.25 MiB | hist:
647
+ [ 290/ 291] blk.31.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
648
+ [ 291/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
649
+ llama_model_quantize_internal: model size = 27625.02 MB
650
+ llama_model_quantize_internal: quant size = 2939.57 MB
zephyr_f32_int3.txt ADDED
@@ -0,0 +1,326 @@
+ (dream) tb@IBM-PF38WZKF:~/funstreams/AI$ ./llama.cpp/quantize zephyr_f32.gguf Q3_K
+ main: build = 1798 (128de35)
+ main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
+ main: quantizing 'zephyr_f32.gguf' to 'ggml-model-Q3_K.gguf' as Q3_K
+ llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from zephyr_f32.gguf (version GGUF V3 (latest))
+ llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ llama_model_loader: - kv 0: general.architecture str = llama
+ llama_model_loader: - kv 1: general.name str = .
+ llama_model_loader: - kv 2: llama.context_length u32 = 32768
+ llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
+ llama_model_loader: - kv 4: llama.block_count u32 = 32
+ llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
+ llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
+ llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
+ llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
+ llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
+ llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
+ llama_model_loader: - kv 11: general.file_type u32 = 0
+ llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
+ llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
+ llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
+ llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
+ llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["▁ t", "i n", "e r", "▁ a", "h e...
+ llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
+ llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
+ llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
+ llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
+ llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
+ llama_model_loader: - type f32: 291 tensors
+ llama_model_quantize_internal: meta size = 1671648 bytes
+ [ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q3_K .. size = 500.00 MiB -> 53.71 MiB | hist:
+ [ 2/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 3/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q5_K .. size = 224.00 MiB -> 38.50 MiB | hist:
+ [ 4/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 5/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 6/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 7/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 8/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 9/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 10/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q5_K .. size = 16.00 MiB -> 2.75 MiB | hist:
+ [ 11/ 291] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 12/ 291] blk.1.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q5_K .. size = 224.00 MiB -> 38.50 MiB | hist:
+ [ 13/ 291] blk.1.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 14/ 291] blk.1.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 15/ 291] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 16/ 291] blk.1.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 17/ 291] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 18/ 291] blk.1.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 19/ 291] blk.1.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q5_K .. size = 16.00 MiB -> 2.75 MiB | hist:
+ [ 20/ 291] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 21/ 291] blk.2.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 22/ 291] blk.2.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 23/ 291] blk.2.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 24/ 291] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 25/ 291] blk.2.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 26/ 291] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 27/ 291] blk.2.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 28/ 291] blk.2.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 29/ 291] blk.3.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 30/ 291] blk.3.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 31/ 291] blk.3.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 32/ 291] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 33/ 291] blk.3.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 34/ 291] blk.3.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 35/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 36/ 291] blk.3.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 38/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 39/ 291] blk.4.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 40/ 291] blk.4.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 41/ 291] blk.4.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 42/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 43/ 291] blk.4.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 44/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 45/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 46/ 291] blk.4.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 47/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 48/ 291] blk.5.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 49/ 291] blk.5.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 50/ 291] blk.5.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 51/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 52/ 291] blk.5.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 53/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 54/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 55/ 291] blk.5.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 56/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 57/ 291] blk.6.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 58/ 291] blk.6.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 59/ 291] blk.6.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 60/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 61/ 291] blk.6.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 62/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 63/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 64/ 291] blk.6.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 65/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 66/ 291] blk.7.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 67/ 291] blk.7.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 68/ 291] blk.7.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 69/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 70/ 291] blk.7.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 71/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 72/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 73/ 291] blk.7.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 74/ 291] blk.8.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 75/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 76/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 77/ 291] blk.8.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 78/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 79/ 291] blk.10.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 80/ 291] blk.10.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 81/ 291] blk.10.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 82/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 83/ 291] blk.10.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 84/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 85/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 86/ 291] blk.10.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 87/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 88/ 291] blk.11.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 89/ 291] blk.11.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 90/ 291] blk.11.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 91/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 92/ 291] blk.11.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 93/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 94/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 95/ 291] blk.11.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 96/ 291] blk.12.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 97/ 291] blk.12.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 98/ 291] blk.12.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 99/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 100/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 101/ 291] blk.12.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 102/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 103/ 291] blk.8.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 104/ 291] blk.8.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 105/ 291] blk.8.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 106/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 107/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 108/ 291] blk.9.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 109/ 291] blk.9.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 110/ 291] blk.9.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 111/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 112/ 291] blk.9.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 113/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 114/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 115/ 291] blk.9.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 116/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 117/ 291] blk.12.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 119/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 120/ 291] blk.13.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 121/ 291] blk.13.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 122/ 291] blk.13.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 123/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 124/ 291] blk.13.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
+ [ 125/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
+ [ 126/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
+ [ 127/ 291] blk.13.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
+ [ 128/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
+ [ 129/ 291] blk.14.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
+ [ 130/ 291] blk.14.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
+ [ 131/ 291] blk.14.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
162
+ [ 132/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
163
+ [ 133/ 291] blk.14.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
164
+ [ 134/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
165
+ [ 135/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
166
+ [ 136/ 291] blk.14.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
167
+ [ 137/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
168
+ [ 138/ 291] blk.15.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
169
+ [ 139/ 291] blk.15.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
170
+ [ 140/ 291] blk.15.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
171
+ [ 141/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
172
+ [ 142/ 291] blk.15.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
173
+ [ 143/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
174
+ [ 144/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
175
+ [ 145/ 291] blk.15.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
176
+ [ 146/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
177
+ [ 147/ 291] blk.16.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
178
+ [ 148/ 291] blk.16.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
179
+ [ 149/ 291] blk.16.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
180
+ [ 150/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
181
+ [ 151/ 291] blk.16.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
182
+ [ 152/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
183
+ [ 153/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
184
+ [ 154/ 291] blk.16.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
185
+ [ 155/ 291] blk.17.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
186
+ [ 156/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
187
+ [ 157/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
188
+ [ 158/ 291] blk.17.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
189
+ [ 159/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
190
+ [ 160/ 291] blk.17.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
191
+ [ 161/ 291] blk.17.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
192
+ [ 162/ 291] blk.17.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
193
+ [ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
194
+ [ 164/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
195
+ [ 165/ 291] blk.18.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
196
+ [ 166/ 291] blk.18.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
197
+ [ 167/ 291] blk.18.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
198
+ [ 168/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
199
+ [ 169/ 291] blk.18.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
200
+ [ 170/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
201
+ [ 171/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
202
+ [ 172/ 291] blk.18.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
203
+ [ 173/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
204
+ [ 174/ 291] blk.19.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
205
+ [ 175/ 291] blk.19.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
206
+ [ 176/ 291] blk.19.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
207
+ [ 177/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
208
+ [ 178/ 291] blk.19.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
209
+ [ 179/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
210
+ [ 180/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
211
+ [ 181/ 291] blk.19.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
212
+ [ 182/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
213
+ [ 183/ 291] blk.20.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
214
+ [ 184/ 291] blk.20.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
215
+ [ 185/ 291] blk.20.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
216
+ [ 186/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
217
+ [ 187/ 291] blk.20.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
218
+ [ 188/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
219
+ [ 189/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
220
+ [ 190/ 291] blk.20.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
221
+ [ 191/ 291] blk.21.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
222
+ [ 192/ 291] blk.21.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
223
+ [ 193/ 291] blk.21.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
224
+ [ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
225
+ [ 195/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
226
+ [ 196/ 291] blk.21.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
227
+ [ 197/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
228
+ [ 198/ 291] blk.21.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
229
+ [ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
230
+ [ 200/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
231
+ [ 201/ 291] blk.22.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
232
+ [ 202/ 291] blk.22.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
233
+ [ 203/ 291] blk.22.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
234
+ [ 204/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
235
+ [ 205/ 291] blk.22.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
236
+ [ 206/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
237
+ [ 207/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
238
+ [ 208/ 291] blk.22.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
239
+ [ 209/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
240
+ [ 210/ 291] blk.23.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
241
+ [ 211/ 291] blk.23.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
242
+ [ 212/ 291] blk.23.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
243
+ [ 213/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
244
+ [ 214/ 291] blk.23.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
245
+ [ 215/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
246
+ [ 216/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
247
+ [ 217/ 291] blk.23.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
248
+ [ 218/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
249
+ [ 219/ 291] blk.24.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
250
+ [ 220/ 291] blk.24.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
251
+ [ 221/ 291] blk.24.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
252
+ [ 222/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
253
+ [ 223/ 291] blk.24.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
254
+ [ 224/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
255
+ [ 225/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
256
+ [ 226/ 291] blk.24.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
257
+ [ 227/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
258
+ [ 228/ 291] blk.25.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
259
+ [ 229/ 291] blk.25.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
260
+ [ 230/ 291] blk.25.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
261
+ [ 231/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
262
+ [ 232/ 291] blk.25.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
263
+ [ 233/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
264
+ [ 234/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
265
+ [ 235/ 291] blk.25.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
266
+ [ 236/ 291] blk.26.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
267
+ [ 237/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
268
+ [ 238/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
269
+ [ 239/ 291] blk.26.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
270
+ [ 240/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
271
+ [ 241/ 291] blk.26.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
272
+ [ 242/ 291] blk.26.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
273
+ [ 243/ 291] blk.26.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
274
+ [ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
275
+ [ 245/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
276
+ [ 246/ 291] blk.27.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
277
+ [ 247/ 291] blk.27.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
278
+ [ 248/ 291] blk.27.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
279
+ [ 249/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
280
+ [ 250/ 291] blk.27.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
281
+ [ 251/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
282
+ [ 252/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
283
+ [ 253/ 291] blk.27.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
284
+ [ 254/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
285
+ [ 255/ 291] blk.28.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
286
+ [ 256/ 291] blk.28.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
287
+ [ 257/ 291] blk.28.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
288
+ [ 258/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
289
+ [ 259/ 291] blk.28.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
290
+ [ 260/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
291
+ [ 261/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
292
+ [ 262/ 291] blk.28.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
293
+ [ 263/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
294
+ [ 264/ 291] blk.29.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
295
+ [ 265/ 291] blk.29.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
296
+ [ 266/ 291] blk.29.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
297
+ [ 267/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
298
+ [ 268/ 291] blk.29.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
299
+ [ 269/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
300
+ [ 270/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
301
+ [ 271/ 291] blk.29.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
302
+ [ 272/ 291] blk.30.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
303
+ [ 273/ 291] blk.30.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
304
+ [ 274/ 291] blk.30.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
305
+ [ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
306
+ [ 276/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
307
+ [ 277/ 291] blk.30.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
308
+ [ 278/ 291] output.weight - [ 4096, 32000, 1, 1], type = f32, quantizing to q6_K .. size = 500.00 MiB -> 102.54 MiB | hist:
309
+ [ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
310
+ [ 280/ 291] blk.30.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
311
+ [ 281/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
312
+ [ 282/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
313
+ [ 283/ 291] blk.31.ffn_down.weight - [14336, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 224.00 MiB -> 31.50 MiB | hist:
314
+ [ 284/ 291] blk.31.ffn_gate.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
315
+ [ 285/ 291] blk.31.ffn_up.weight - [ 4096, 14336, 1, 1], type = f32, quantizing to q3_K .. size = 224.00 MiB -> 24.06 MiB | hist:
316
+ [ 286/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
317
+ [ 287/ 291] blk.31.attn_k.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q3_K .. size = 16.00 MiB -> 1.72 MiB | hist:
318
+ [ 288/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q4_K .. size = 64.00 MiB -> 9.00 MiB | hist:
319
+ [ 289/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = f32, quantizing to q3_K .. size = 64.00 MiB -> 6.88 MiB | hist:
320
+ [ 290/ 291] blk.31.attn_v.weight - [ 4096, 1024, 1, 1], type = f32, quantizing to q4_K .. size = 16.00 MiB -> 2.25 MiB | hist:
321
+ [ 291/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
322
+ llama_model_quantize_internal: model size = 27625.02 MB
323
+ llama_model_quantize_internal: quant size = 3355.27 MB
324
+
325
+ main: quantize time = 368785.09 ms
326
+ main: total time = 368785.09 ms
zephyr_int8_int3.txt ADDED
@@ -0,0 +1,362 @@
(dream) tb@IBM-PF38WZKF:~/funstreams/AI$ ./llama.cpp/quantize zephyr_int8.gguf zephyr_Q3_K_M.gguf Q3_K_M
main: build = 1798 (128de35)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing 'zephyr_int8.gguf' to 'zephyr_Q3_K_M.gguf' as Q3_K_M
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from zephyr_int8.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["▁ t", "i n", "e r", "▁ a", "h e...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llama_model_quantize_internal: meta size = 1671648 bytes
[ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = q8_0, llama_model_quantize: failed to quantize: requantizing from type q8_0 is disabled
main: failed to quantize model from 'zephyr_int8.gguf'
(dream) tb@IBM-PF38WZKF:~/funstreams/AI$ ./llama.cpp/quantize --allow-requantize zephyr_int8.gguf zephyr_Q3_K_M.gguf Q3_
K_M
main: build = 1798 (128de35)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: quantizing 'zephyr_int8.gguf' to 'zephyr_Q3_K_M.gguf' as Q3_K_M
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from zephyr_int8.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["▁ t", "i n", "e r", "▁ a", "h e...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.chat_template str = {% for message in messages %}\n{% if m...
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llama_model_quantize_internal: meta size = 1671648 bytes
[ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = q8_0, quantizing to q3_K .. size = 132.81 MiB -> 53.71 MiB | hist:
[ 2/ 291] blk.0.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 3/ 291] blk.0.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q5_K .. size = 59.50 MiB -> 38.50 MiB | hist:
[ 4/ 291] blk.0.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 5/ 291] blk.0.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 6/ 291] blk.0.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 7/ 291] blk.0.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
[ 8/ 291] blk.0.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
[ 9/ 291] blk.0.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
[ 10/ 291] blk.0.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q5_K .. size = 4.25 MiB -> 2.75 MiB | hist:
[ 11/ 291] blk.1.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 12/ 291] blk.1.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q5_K .. size = 59.50 MiB -> 38.50 MiB | hist:
[ 13/ 291] blk.1.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 14/ 291] blk.1.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 15/ 291] blk.1.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 16/ 291] blk.1.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
[ 17/ 291] blk.1.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
[ 18/ 291] blk.1.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
[ 19/ 291] blk.1.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q5_K .. size = 4.25 MiB -> 2.75 MiB | hist:
[ 20/ 291] blk.2.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 21/ 291] blk.2.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
[ 22/ 291] blk.2.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 23/ 291] blk.2.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 24/ 291] blk.2.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 25/ 291] blk.2.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
[ 26/ 291] blk.2.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
[ 27/ 291] blk.2.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
[ 28/ 291] blk.2.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
[ 29/ 291] blk.3.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 30/ 291] blk.3.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 31/ 291] blk.3.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
[ 32/ 291] blk.3.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
[ 33/ 291] blk.3.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
[ 34/ 291] blk.3.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
[ 35/ 291] blk.3.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 36/ 291] blk.3.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
[ 37/ 291] blk.3.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 38/ 291] blk.4.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 39/ 291] blk.4.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
[ 40/ 291] blk.4.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 41/ 291] blk.4.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
[ 42/ 291] blk.4.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 43/ 291] blk.4.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
[ 44/ 291] blk.4.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
[ 45/ 291] blk.4.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
[ 46/ 291] blk.4.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
[ 47/ 291] blk.5.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
[ 48/ 291] blk.5.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
[ 49/ 291] blk.5.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
115
+ [ 50/ 291] blk.5.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
116
+ [ 51/ 291] blk.5.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
117
+ [ 52/ 291] blk.5.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
118
+ [ 53/ 291] blk.5.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
119
+ [ 54/ 291] blk.5.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
120
+ [ 55/ 291] blk.5.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
121
+ [ 56/ 291] blk.6.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
122
+ [ 57/ 291] blk.6.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
123
+ [ 58/ 291] blk.6.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
124
+ [ 59/ 291] blk.6.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
125
+ [ 60/ 291] blk.6.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
126
+ [ 61/ 291] blk.6.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
127
+ [ 62/ 291] blk.6.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
128
+ [ 63/ 291] blk.6.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
129
+ [ 64/ 291] blk.6.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
130
+ [ 65/ 291] blk.7.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
131
+ [ 66/ 291] blk.7.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
132
+ [ 67/ 291] blk.7.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
133
+ [ 68/ 291] blk.7.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
134
+ [ 69/ 291] blk.7.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
135
+ [ 70/ 291] blk.7.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
136
+ [ 71/ 291] blk.7.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
137
+ [ 72/ 291] blk.7.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
138
+ [ 73/ 291] blk.7.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
139
+ [ 74/ 291] blk.8.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
140
+ [ 75/ 291] blk.8.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
141
+ [ 76/ 291] blk.8.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
142
+ [ 77/ 291] blk.8.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
143
+ [ 78/ 291] blk.10.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
144
+ [ 79/ 291] blk.10.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
145
+ [ 80/ 291] blk.10.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
146
+ [ 81/ 291] blk.10.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
147
+ [ 82/ 291] blk.10.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
148
+ [ 83/ 291] blk.10.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
149
+ [ 84/ 291] blk.10.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
150
+ [ 85/ 291] blk.10.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
151
+ [ 86/ 291] blk.10.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
152
+ [ 87/ 291] blk.11.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
153
+ [ 88/ 291] blk.11.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
154
+ [ 89/ 291] blk.11.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
155
+ [ 90/ 291] blk.11.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
156
+ [ 91/ 291] blk.11.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
157
+ [ 92/ 291] blk.11.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
158
+ [ 93/ 291] blk.11.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
159
+ [ 94/ 291] blk.11.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
160
+ [ 95/ 291] blk.11.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
161
+ [ 96/ 291] blk.12.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
162
+ [ 97/ 291] blk.12.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
163
+ [ 98/ 291] blk.12.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
164
+ [ 99/ 291] blk.12.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
165
+ [ 100/ 291] blk.12.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
166
+ [ 101/ 291] blk.12.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
167
+ [ 102/ 291] blk.8.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
168
+ [ 103/ 291] blk.8.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
169
+ [ 104/ 291] blk.8.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
170
+ [ 105/ 291] blk.8.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
171
+ [ 106/ 291] blk.8.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
172
+ [ 107/ 291] blk.9.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
173
+ [ 108/ 291] blk.9.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
174
+ [ 109/ 291] blk.9.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
175
+ [ 110/ 291] blk.9.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
176
+ [ 111/ 291] blk.9.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
177
+ [ 112/ 291] blk.9.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
178
+ [ 113/ 291] blk.9.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
179
+ [ 114/ 291] blk.9.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
180
+ [ 115/ 291] blk.9.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
181
+ [ 116/ 291] blk.12.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
182
+ [ 117/ 291] blk.12.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
183
+ [ 118/ 291] blk.12.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
184
+ [ 119/ 291] blk.13.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
185
+ [ 120/ 291] blk.13.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
186
+ [ 121/ 291] blk.13.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
187
+ [ 122/ 291] blk.13.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
188
+ [ 123/ 291] blk.13.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
189
+ [ 124/ 291] blk.13.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
190
+ [ 125/ 291] blk.13.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
191
+ [ 126/ 291] blk.13.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
192
+ [ 127/ 291] blk.13.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
193
+ [ 128/ 291] blk.14.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
194
+ [ 129/ 291] blk.14.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
195
+ [ 130/ 291] blk.14.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
196
+ [ 131/ 291] blk.14.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
197
+ [ 132/ 291] blk.14.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
198
+ [ 133/ 291] blk.14.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
199
+ [ 134/ 291] blk.14.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
200
+ [ 135/ 291] blk.14.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
201
+ [ 136/ 291] blk.14.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
202
+ [ 137/ 291] blk.15.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
203
+ [ 138/ 291] blk.15.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
204
+ [ 139/ 291] blk.15.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
205
+ [ 140/ 291] blk.15.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
206
+ [ 141/ 291] blk.15.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
207
+ [ 142/ 291] blk.15.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
208
+ [ 143/ 291] blk.15.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
209
+ [ 144/ 291] blk.15.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
210
+ [ 145/ 291] blk.15.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
211
+ [ 146/ 291] blk.16.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
212
+ [ 147/ 291] blk.16.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
213
+ [ 148/ 291] blk.16.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
214
+ [ 149/ 291] blk.16.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
215
+ [ 150/ 291] blk.16.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
216
+ [ 151/ 291] blk.16.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
217
+ [ 152/ 291] blk.16.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
218
+ [ 153/ 291] blk.16.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
219
+ [ 154/ 291] blk.16.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
220
+ [ 155/ 291] blk.17.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
221
+ [ 156/ 291] blk.17.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
222
+ [ 157/ 291] blk.17.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
223
+ [ 158/ 291] blk.17.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
224
+ [ 159/ 291] blk.17.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
225
+ [ 160/ 291] blk.17.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
226
+ [ 161/ 291] blk.17.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
227
+ [ 162/ 291] blk.17.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
228
+ [ 163/ 291] blk.17.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
229
+ [ 164/ 291] blk.18.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
230
+ [ 165/ 291] blk.18.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
231
+ [ 166/ 291] blk.18.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
232
+ [ 167/ 291] blk.18.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
233
+ [ 168/ 291] blk.18.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
234
+ [ 169/ 291] blk.18.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
235
+ [ 170/ 291] blk.18.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
236
+ [ 171/ 291] blk.18.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
237
+ [ 172/ 291] blk.18.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
238
+ [ 173/ 291] blk.19.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
239
+ [ 174/ 291] blk.19.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
240
+ [ 175/ 291] blk.19.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
241
+ [ 176/ 291] blk.19.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
242
+ [ 177/ 291] blk.19.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
243
+ [ 178/ 291] blk.19.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
244
+ [ 179/ 291] blk.19.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
245
+ [ 180/ 291] blk.19.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
246
+ [ 181/ 291] blk.19.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
247
+ [ 182/ 291] blk.20.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
248
+ [ 183/ 291] blk.20.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
249
+ [ 184/ 291] blk.20.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
250
+ [ 185/ 291] blk.20.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
251
+ [ 186/ 291] blk.20.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
252
+ [ 187/ 291] blk.20.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
253
+ [ 188/ 291] blk.20.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
254
+ [ 189/ 291] blk.20.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
255
+ [ 190/ 291] blk.20.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
256
+ [ 191/ 291] blk.21.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
257
+ [ 192/ 291] blk.21.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
258
+ [ 193/ 291] blk.21.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
259
+ [ 194/ 291] blk.21.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
260
+ [ 195/ 291] blk.21.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
261
+ [ 196/ 291] blk.21.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
262
+ [ 197/ 291] blk.21.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
263
+ [ 198/ 291] blk.21.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
264
+ [ 199/ 291] blk.21.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
265
+ [ 200/ 291] blk.22.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
266
+ [ 201/ 291] blk.22.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
267
+ [ 202/ 291] blk.22.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
268
+ [ 203/ 291] blk.22.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
269
+ [ 204/ 291] blk.22.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
270
+ [ 205/ 291] blk.22.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
271
+ [ 206/ 291] blk.22.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
272
+ [ 207/ 291] blk.22.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
273
+ [ 208/ 291] blk.22.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
274
+ [ 209/ 291] blk.23.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
275
+ [ 210/ 291] blk.23.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
276
+ [ 211/ 291] blk.23.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
277
+ [ 212/ 291] blk.23.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
278
+ [ 213/ 291] blk.23.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
279
+ [ 214/ 291] blk.23.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
280
+ [ 215/ 291] blk.23.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
281
+ [ 216/ 291] blk.23.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
282
+ [ 217/ 291] blk.23.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
283
+ [ 218/ 291] blk.24.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
284
+ [ 219/ 291] blk.24.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
285
+ [ 220/ 291] blk.24.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
286
+ [ 221/ 291] blk.24.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
287
+ [ 222/ 291] blk.24.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
288
+ [ 223/ 291] blk.24.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
289
+ [ 224/ 291] blk.24.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
290
+ [ 225/ 291] blk.24.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
291
+ [ 226/ 291] blk.24.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
292
+ [ 227/ 291] blk.25.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
293
+ [ 228/ 291] blk.25.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
294
+ [ 229/ 291] blk.25.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
295
+ [ 230/ 291] blk.25.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
296
+ [ 231/ 291] blk.25.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
297
+ [ 232/ 291] blk.25.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
298
+ [ 233/ 291] blk.25.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
299
+ [ 234/ 291] blk.25.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
300
+ [ 235/ 291] blk.25.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
301
+ [ 236/ 291] blk.26.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
302
+ [ 237/ 291] blk.26.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
303
+ [ 238/ 291] blk.26.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
304
+ [ 239/ 291] blk.26.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
305
+ [ 240/ 291] blk.26.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
306
+ [ 241/ 291] blk.26.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
307
+ [ 242/ 291] blk.26.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
308
+ [ 243/ 291] blk.26.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
309
+ [ 244/ 291] blk.26.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
310
+ [ 245/ 291] blk.27.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
311
+ [ 246/ 291] blk.27.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
312
+ [ 247/ 291] blk.27.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
313
+ [ 248/ 291] blk.27.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
314
+ [ 249/ 291] blk.27.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
315
+ [ 250/ 291] blk.27.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
316
+ [ 251/ 291] blk.27.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
317
+ [ 252/ 291] blk.27.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
318
+ [ 253/ 291] blk.27.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
319
+ [ 254/ 291] blk.28.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
320
+ [ 255/ 291] blk.28.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
321
+ [ 256/ 291] blk.28.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
322
+ [ 257/ 291] blk.28.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
323
+ [ 258/ 291] blk.28.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
324
+ [ 259/ 291] blk.28.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
325
+ [ 260/ 291] blk.28.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
326
+ [ 261/ 291] blk.28.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
327
+ [ 262/ 291] blk.28.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
328
+ [ 263/ 291] blk.29.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
329
+ [ 264/ 291] blk.29.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
330
+ [ 265/ 291] blk.29.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
331
+ [ 266/ 291] blk.29.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
332
+ [ 267/ 291] blk.29.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
333
+ [ 268/ 291] blk.29.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
334
+ [ 269/ 291] blk.29.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
335
+ [ 270/ 291] blk.29.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
336
+ [ 271/ 291] blk.29.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
337
+ [ 272/ 291] blk.30.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
338
+ [ 273/ 291] blk.30.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
339
+ [ 274/ 291] blk.30.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
340
+ [ 275/ 291] blk.30.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
341
+ [ 276/ 291] blk.30.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
342
+ [ 277/ 291] blk.30.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
343
+ [ 278/ 291] output.weight - [ 4096, 32000, 1, 1], type = q8_0, quantizing to q6_K .. size = 132.81 MiB -> 102.54 MiB | hist:
344
+ [ 279/ 291] blk.30.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
345
+ [ 280/ 291] blk.30.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
346
+ [ 281/ 291] blk.30.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
347
+ [ 282/ 291] blk.31.attn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
348
+ [ 283/ 291] blk.31.ffn_down.weight - [14336, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 59.50 MiB -> 31.50 MiB | hist:
349
+ [ 284/ 291] blk.31.ffn_gate.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
350
+ [ 285/ 291] blk.31.ffn_up.weight - [ 4096, 14336, 1, 1], type = q8_0, quantizing to q3_K .. size = 59.50 MiB -> 24.06 MiB | hist:
351
+ [ 286/ 291] blk.31.ffn_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
352
+ [ 287/ 291] blk.31.attn_k.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q3_K .. size = 4.25 MiB -> 1.72 MiB | hist:
353
+ [ 288/ 291] blk.31.attn_output.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q4_K .. size = 17.00 MiB -> 9.00 MiB | hist:
354
+ [ 289/ 291] blk.31.attn_q.weight - [ 4096, 4096, 1, 1], type = q8_0, quantizing to q3_K .. size = 17.00 MiB -> 6.88 MiB | hist:
355
+ [ 290/ 291] blk.31.attn_v.weight - [ 4096, 1024, 1, 1], type = q8_0, quantizing to q4_K .. size = 4.25 MiB -> 2.25 MiB | hist:
356
+ [ 291/ 291] output_norm.weight - [ 4096, 1, 1, 1], type = f32, size = 0.016 MB
357
+ llama_model_quantize_internal: model size = 7338.64 MB
358
+ llama_model_quantize_internal: quant size = 3355.27 MB
359
+
360
+ main: quantize time = 165764.53 ms
361
+ main: total time = 165764.53 ms
362
+
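Note (not part of the tool's output): the per-tensor sizes in this log follow directly from the fixed bytes-per-block of the ggml quant formats. The snippet below is a small sanity check of that arithmetic; the helper name `tensor_mib` and the bits-per-weight table are my own, derived from the block layouts in ggml (q8_0: 34 B per 32 weights, q3_K: 110 B, q4_K: 144 B, q5_K: 176 B, q6_K: 210 B per 256 weights).

```python
# Bits per weight for each quant type = block bytes * 8 / weights per block.
BPW = {"q8_0": 8.5, "q3_K": 3.4375, "q4_K": 4.5, "q5_K": 5.5, "q6_K": 6.5625}

def tensor_mib(rows, cols, qtype):
    """Size in MiB of a (rows x cols) weight tensor stored as qtype."""
    return rows * cols * BPW[qtype] / 8 / 2**20

# ffn weights are 4096 x 14336:
print(round(tensor_mib(4096, 14336, "q8_0"), 2))  # 59.5   (source size in the log)
print(round(tensor_mib(4096, 14336, "q3_K"), 2))  # 24.06  (ffn_gate / ffn_up)
print(round(tensor_mib(4096, 14336, "q4_K"), 2))  # 31.5   (ffn_down)
# output.weight is 4096 x 32000, quantized to q6_K:
print(round(tensor_mib(4096, 32000, "q6_K"), 2))  # 102.54
```

This also explains why Q3_K is a mixed quant: attn_v, attn_output, ffn_down, and output.weight are kept at higher precision (q4_K/q5_K/q6_K) while the rest drops to q3_K.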