s1w committed
Commit 73ad330
1 Parent(s): f21ea04

Update README.md

Files changed (1):
  1. README.md +18 -1

README.md CHANGED
@@ -2,4 +2,21 @@
  license: apache-2.0
  ---
  ### For the GGML conversion script, see:
- <https://github.com/LinkSoul-AI/Chinese-Llama-2-7b/tree/main/ggml>
+ <https://github.com/LinkSoul-AI/Chinese-Llama-2-7b/tree/main/ggml>
+
+ ## Definitions of the quantization configurations:
+ Reposted from: <https://www.reddit.com/r/LocalLLaMA/comments/139yt87/notable_differences_between_q4_2_and_q5_1/>. A worked sketch of this arithmetic follows the list.
+
+ * q4_0 = 32 numbers per chunk, 4 bits per weight, 1 scale value stored as a 32-bit float (5 bits per value on average); each weight is given by the common scale * quantized value.
+
+ * q4_1 = 32 numbers per chunk, 4 bits per weight, 1 scale value and 1 bias value stored as 32-bit floats (6 bits per value on average); each weight is given by the common scale * quantized value + common bias.
+
+ * q4_2 = same as q4_0, but with 16 numbers per chunk, 4 bits per weight, and 1 scale value stored as a 16-bit float; same size as q4_0 but better, because the chunks are smaller.
+
+ * q4_3 = already dead, but analogous: q4_1 with 16 numbers per chunk, 4 bits per weight, a 16-bit scale value, and a 16-bit bias value; same size as q4_1 but better, because the chunks are smaller.
+
+ * q5_0 = 32 numbers per chunk, 5 bits per weight, 1 scale value stored as a 16-bit float; size is 5.5 bits per weight.
+
+ * q5_1 = 32 numbers per chunk, 5 bits per weight, 1 scale value and 1 bias value stored as 16-bit floats; size is 6 bits per weight.
+
+ * q8_0 = same as q4_0, except 8 bits per weight and 1 scale value stored as a 32-bit float, making a total of 9 bits per weight.
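
To make the scale/bias arithmetic above concrete, here is a minimal numpy sketch of a q4_0-style and a q4_1-style quantize/dequantize round trip. This is an illustration only, not ggml's implementation: the function names are mine, and the rounding, clamping, and scale choices are simplified relative to the actual C kernels (which also pack the nibbles into bytes).

```python
import numpy as np

QK = 32  # q4_0 / q4_1 chunk size, per the list above

def q4_0_roundtrip(x: np.ndarray) -> np.ndarray:
    """q4_0 idea: one shared scale per chunk; weight = scale * quantized value.

    Storage: 32 weights * 4 bits + one 32-bit scale = 160 bits per chunk,
    i.e. 160 / 32 = 5 bits per weight on average.
    """
    amax = float(np.abs(x).max())
    d = amax / 7.0 if amax > 0 else 1.0         # shared scale (simplified choice)
    q = np.clip(np.round(x / d), -8, 7)         # 4-bit signed values in [-8, 7]
    return d * q                                # dequantize: scale * quantized value

def q4_1_roundtrip(x: np.ndarray) -> np.ndarray:
    """q4_1 idea: shared scale plus shared bias; weight = scale * q + bias.

    Storage: 128 bits of weights + 32-bit scale + 32-bit bias = 192 bits,
    i.e. 192 / 32 = 6 bits per weight on average.
    """
    lo, hi = float(x.min()), float(x.max())
    d = (hi - lo) / 15.0 if hi > lo else 1.0    # shared scale
    q = np.clip(np.round((x - lo) / d), 0, 15)  # 4-bit unsigned values in [0, 15]
    return d * q + lo                           # dequantize with the shared bias

chunk = np.random.randn(QK).astype(np.float32)
err0 = np.abs(chunk - q4_0_roundtrip(chunk)).max()
err1 = np.abs(chunk - q4_1_roundtrip(chunk)).max()
print(f"q4_0 max error: {err0:.4f}")  # q4_1 is usually tighter: the bias
print(f"q4_1 max error: {err1:.4f}")  # recenters the chunk before rounding
```

The same per-chunk accounting reproduces the other sizes in the list, e.g. q5_0 is (32 * 5 + 16) / 32 = 5.5 bits per weight and q8_0 is (32 * 8 + 32) / 32 = 9 bits per weight.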