Update README.md
README.md
CHANGED
@@ -13,6 +13,6 @@ These are here for refence, comparison, and any future work.
 The quality of the llamafiles generated from these freshly converted GGUFs were noticeably better than those generated from the other GGUFs on HF.

 These three were most interesting because:
-q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
-q4-k-m: the widely accepted standard as "good enough", but in this case it does not fit on a 4090
-q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"
+- q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
+- q4-k-m: the widely accepted standard as "good enough", but in this case it does not fit on a 4090
+- q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"