Update README.md
README.md (changed)
@@ -14,5 +14,5 @@ The quality of the llamafiles generated from these freshly converted GGUFs were
 
 These three were most interesting because:
 - q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
-- q4-k-m: the widely accepted standard as "good enough", but in this case it does not fit on a 4090
+- q4-k-m: the widely accepted standard as "good enough" and general favorite for most models, but in this case it does not fit on a 4090
 - q5-k-m: my favorite for smaller models, larger - provides a reference for "what if you have more than just a bit that won't fit on the gpu"
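The bullet list in this hunk is about whether each k-quant fits in a 4090's 24 GB of VRAM. A minimal back-of-the-envelope sketch of that check is below; the bits-per-weight values are approximate averages for llama.cpp k-quants, and the 46.7B parameter count is a hypothetical example (the README does not name the model), so treat the numbers as illustrative only.

```python
# Rough estimate of whether a quantized GGUF's weights fit in 24 GB of VRAM.
# Bits-per-weight figures are approximate llama.cpp k-quant averages.

BITS_PER_WEIGHT = {
    "Q3_K_M": 3.91,  # approximate
    "Q4_K_M": 4.85,  # approximate
    "Q5_K_M": 5.69,  # approximate
}

def model_size_gb(n_params: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (no KV cache)."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

def fits_on_4090(n_params: float, quant: str, vram_gb: float = 24.0) -> bool:
    # Real inference also needs VRAM for the KV cache and activations,
    # so this is an optimistic upper bound on what fits.
    return model_size_gb(n_params, quant) < vram_gb

if __name__ == "__main__":
    n = 46.7e9  # hypothetical parameter count, not from the README
    for q in BITS_PER_WEIGHT:
        print(q, round(model_size_gb(n, q), 1), "GB", fits_on_4090(n, q))
```

With these assumed numbers, Q3_K_M lands just under 24 GB while Q4_K_M and Q5_K_M do not, which mirrors the ordering the bullets describe.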