Update README.md
README.md (changed)
@@ -14,5 +14,5 @@ The quality of the llamafiles generated from these freshly converted GGUFs were
 
 These three were most interesting because:
 - q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
-- q4-k-m: the widely accepted standard as "good enough", but in this case it does not fit on a 4090
+- q4-k-m: the widely accepted standard as "good enough" and general favorite for most models, but in this case it does not fit on a 4090
 - q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"
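The bullet list in this hunk is about whether each k-quant fits in a 4090's 24 GB of VRAM. A minimal back-of-the-envelope sketch of that check is below; the bits-per-weight values are approximate averages for llama.cpp k-quants, and the 46.7B parameter count is a hypothetical example (the README does not name the model), so treat the numbers as illustrative only.

```python
# Rough estimate of whether a quantized GGUF's weights fit in 24 GB of VRAM.
# Bits-per-weight figures are approximate llama.cpp k-quant averages.

BITS_PER_WEIGHT = {
    "Q3_K_M": 3.91,  # approximate
    "Q4_K_M": 4.85,  # approximate
    "Q5_K_M": 5.69,  # approximate
}

def model_size_gb(n_params: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (no KV cache)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

def fits_on_4090(n_params: float, quant: str, vram_gb: float = 24.0) -> bool:
    # Real inference also needs VRAM for the KV cache and activations,
    # so this is an optimistic upper bound on what fits.
    return model_size_gb(n_params, quant) < vram_gb

if __name__ == "__main__":
    n = 46.7e9  # hypothetical parameter count, not from the README
    for q in BITS_PER_WEIGHT:
        print(q, round(model_size_gb(n, q), 1), "GB", fits_on_4090(n, q))
```

With these assumed numbers, Q3_K_M lands just under 24 GB while Q4_K_M and Q5_K_M do not, which mirrors the ordering the bullets describe.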