maddes8cht committed
Commit 69e302d
1 Parent(s): aadb36a

"Update README.md"

Files changed (1)
  1. README.md +6 -4
README.md CHANGED
@@ -40,19 +40,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available. How to choose the best for you:
+There is a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.
-Falcon 7B models cannot be quantized to K-quants.
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 
 # K-quants
 
-K-quants are based on the idea that the quantization of certain parts affects the quality in different ways. If you quantize certain parts more and others less, you get a more powerful model with the same file size, or a smaller file size and lower memory load with comparable performance.
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you should find it really hard to find a quality difference to the original model - ask your model two times the same question and you may encounter bigger quality differences.
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model two times the same question and you may encounter bigger quality differences.
 
 
 
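
For context on the variants the README section above discusses: quantized GGUF files like these are typically produced with the `quantize` tool that ships with llama.cpp. The sketch below is only an illustration, not part of this commit; the binary path and the source file name (`falcon-7b-f16.gguf`) are assumptions, and the tool's basic usage (`quantize <input.gguf> <output.gguf> <type>`) is taken from llama.cpp.

```python
# Minimal sketch: produce one legacy quant and one K-quant variant with
# llama.cpp's quantize tool. All paths/file names here are assumptions.
import subprocess
from pathlib import Path

QUANTIZE_BIN = Path("./llama.cpp/quantize")   # assumed location of a locally built quantize binary
SOURCE_GGUF = Path("falcon-7b-f16.gguf")      # assumed unquantized (f16) GGUF source file

# One legacy quant (Q4_0) and one K-quant (Q6_K), as discussed in the README section.
for quant_type in ["Q4_0", "Q6_K"]:
    target = SOURCE_GGUF.with_name(f"falcon-7b-{quant_type.lower()}.gguf")
    # llama.cpp usage: quantize <input.gguf> <output.gguf> <type>
    subprocess.run([str(QUANTIZE_BIN), str(SOURCE_GGUF), str(target), quant_type], check=True)
    print(f"wrote {target}")
```

In line with the README's advice, one would normally prefer the K-quant output (e.g. Q6_K) where the model architecture supports it, and fall back to a legacy type such as Q4_0 or Q5_1 otherwise.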