Commit f97bb8a · Parent: b3a4785
Update README.md
README.md
CHANGED
@@ -33,16 +33,6 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 
 
 
-## Explanation of the new k-quant methods
-<details>
-<summary>Click to see details</summary>
-
-The new methods available are:
-* GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
-
-</details>
-<!-- compatibility_ggml end -->
-
 ## Provided files
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
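As a sanity check on the section this commit removes, the quoted 4.5 bpw figure for GGML_TYPE_Q4_K can be reproduced from the stated layout. This is a sketch: the diff states only the 8-blocks-of-32-weights structure and the 6-bit scales/mins; the fp16 super-block scale and min are an assumption taken from llama.cpp's `block_q4_K` layout, not from the text above.

```python
# Arithmetic check of the 4.5 bpw claim for GGML_TYPE_Q4_K.
# Stated in the removed README text: 8 blocks of 32 weights per
# super-block, 4-bit values, 6-bit per-block scales and mins.
# Assumed (from llama.cpp's block_q4_K): one fp16 scale and one
# fp16 min per super-block on top of the per-block metadata.
BLOCKS = 8
WEIGHTS_PER_BLOCK = 32
weights = BLOCKS * WEIGHTS_PER_BLOCK        # 256 weights per super-block

quant_bits = weights * 4                    # 4-bit quantized values
scale_bits = BLOCKS * (6 + 6)               # 6-bit scale + 6-bit min per block
super_bits = 2 * 16                         # assumed fp16 super-block scale/min

bpw = (quant_bits + scale_bits + super_bits) / weights
print(bpw)  # -> 4.5
```

Under these assumptions a super-block costs 1024 + 96 + 32 = 1152 bits for 256 weights, which matches the 4.5 bpw stated in the removed section.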