maddes8cht committed fd42b84 (1 parent: 5dc61a9)
"Update README.md"
README.md CHANGED
@@ -35,19 +35,21 @@ The core project making use of the ggml library is the [llama.cpp](https://githu
 
 # Quantization variants
 
-There is a bunch of quantized files available.
+There is a bunch of quantized files available to cater to your specific needs. Here's how to choose the best option for you:
 
 # Legacy quants
 
 Q4_0, Q4_1, Q5_0, Q5_1 and Q8 are `legacy` quantization types.
 Nevertheless, they are fully supported, as there are several circumstances that cause certain model not to be compatible with the modern K-quants.
-
+## Note:
+Now there's a new option to use K-quants even for previously 'incompatible' models, although this involves some fallback solution that makes them not *real* K-quants. More details can be found in affected model descriptions.
+(This mainly refers to Falcon 7b and Starcoder models)
 
 # K-quants
 
-K-quants are
+K-quants are designed with the idea that different levels of quantization in specific parts of the model can optimize performance, file size, and memory load.
 So, if possible, use K-quants.
-With a Q6_K you
+With a Q6_K, you'll likely find it challenging to discern a quality difference from the original model - ask your model two times the same question and you may encounter bigger quality differences.
 
 
 
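The README text in this commit says that a higher-bit quant such as Q6_K is hard to tell apart from the original model. The sketch below is a rough illustration of why more bits per weight help. It is a toy block-wise round-to-nearest quantizer, not the actual ggml Q4_0 or Q6_K format: the block size of 32, the symmetric scale, the Gaussian stand-in weights, and all function names are assumptions made for this example.

```python
import random


def quantize_block(values, bits):
    """Toy symmetric round-to-nearest quantization of one block (one shared scale)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4 bits, 31 for 6 bits
    amax = max(abs(v) for v in values)  # largest magnitude in the block
    scale = amax / qmax if amax > 0 else 1.0
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quants, scale


def dequantize_block(quants, scale):
    """Map the integer codes back to approximate floating-point weights."""
    return [q * scale for q in quants]


def rms_error(values, bits, block_size=32):
    """Root-mean-square round-trip error over all blocks."""
    total = 0.0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        quants, scale = quantize_block(block, bits)
        restored = dequantize_block(quants, scale)
        total += sum((a - b) ** 2 for a, b in zip(block, restored))
    return (total / len(values)) ** 0.5


random.seed(0)
# Hypothetical stand-in for a weight tensor: 4096 Gaussian samples.
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

errors = {bits: rms_error(weights, bits) for bits in (4, 5, 6, 8)}
for bits, err in errors.items():
    print(f"{bits}-bit blocks: RMS error {err:.4f}")
```

In this toy setup the round-trip error shrinks with every extra bit, which matches the trade-off the README describes: lower-bit files are smaller but lose more detail. The real K-quants go further by mixing quantization levels across different parts of the model, which this sketch does not attempt to reproduce.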