TheBloke committed
Commit 958b2ec
1 Parent(s): 5f968e7

Update README.md

Files changed (1)
  1. README.md +6 -10
README.md CHANGED
@@ -40,21 +40,17 @@ To use the increased context with KoboldCpp and (when supported) llama.cpp, simp
 <!-- compatibility_ggml start -->
 ## Compatibility

-These GGMLs will work with any llama.cpp-compatible GGML client that supports k-quants.
+These GGMLs will work with any GGML client.

 However the increased context length won't work without specific support. See the note in the introduction for details on using increased context.

-## Explanation of the new k-quant methods
+## k-quants not possible with this model

-The new methods available are:
-* GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
-* GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
-* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
-* GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
-* GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
-* GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference to the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
+Because this model uses a vocab size of 32001, it is not possible to create the new k-quant format model files for it.

-Refer to the Provided Files table below to see what files use which methods, and how.
+For more information, please see:
+- https://github.com/ggerganov/llama.cpp/issues/1919
+
 <!-- compatibility_ggml end -->

 ## Provided files
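
The bpw figures in the removed k-quant list follow from the stated block layouts. A minimal sketch of the arithmetic for GGML_TYPE_Q4_K (the fp16 super-block scale and min, 16 bits each, are an assumption drawn from the llama.cpp k-quants layout, not something stated in this README):

```python
# bpw sanity check for GGML_TYPE_Q4_K, using the figures from the removed list:
# 8 blocks of 32 weights per super-block, 4-bit quants, 6-bit scales and mins.

WEIGHTS = 8 * 32                  # 256 weights per super-block

quant_bits = WEIGHTS * 4          # 4-bit quantized weights            -> 1024 bits
block_bits = 8 * (6 + 6)          # 6-bit scale + 6-bit min per block  ->   96 bits
super_bits = 16 + 16              # assumed fp16 super-block scale/min ->   32 bits

bpw = (quant_bits + block_bits + super_bits) / WEIGHTS
print(bpw)  # 4.5 -> matches the 4.5 bpw quoted for GGML_TYPE_Q4_K
```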
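
As for why a 32001-entry vocab blocks k-quants: every k-quant type packs weights into 256-weight super-blocks (the Q8_K entry calls out the 256 block size), so a quantized tensor dimension must divide evenly into super-blocks. A small illustration of that constraint; reading it as the failure behind the linked issue is my interpretation, not something this commit states:

```python
# k-quant super-blocks hold 256 weights, so a quantized dimension must be a
# multiple of 256. A 32001-token vocab (e.g. the embedding and output rows)
# is not, which I take to be the constraint behind llama.cpp issue 1919.

QK_K = 256  # weights per k-quant super-block

for vocab_size in (32000, 32001):
    blocks, rem = divmod(vocab_size, QK_K)
    verdict = "k-quants possible" if rem == 0 else "k-quants not possible"
    print(f"vocab {vocab_size}: {blocks} super-blocks, {rem} left over -> {verdict}")

# vocab 32000: 125 super-blocks, 0 left over -> k-quants possible
# vocab 32001: 125 super-blocks, 1 left over -> k-quants not possible
```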