gobean committed
Commit b271dc3
1 Parent(s): b2c06e8

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ These are here for reference, comparison, and any future work.
 
 The quality of the llamafiles generated from these freshly converted GGUFs were noticeably better than those generated from the other GGUFs on HF.
 
-These three were most interesting because:
+quant file notes:
 - q3-k-m: can fit entirely on a 4090 (24GB VRAM), very fast inference
 - q4-0: for some reason, this is better quality than q4-k-m.
 - q4-k-m: the widely accepted standard as "good enough" and general favorite for most models, but in this case it does not fit on a 4090
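The "fits on a 4090 / does not fit" claims in the quant notes can be sanity-checked with a back-of-envelope estimate of quantized weight size (weights only, ignoring KV cache and runtime overhead). The bits-per-weight figures below are approximate llama.cpp values, and the 40B parameter count is a purely hypothetical model size for illustration; neither is taken from this repo:

```python
# Rough GGUF weight size: n_params * bits_per_weight / 8 bytes.
# bpw values are approximate llama.cpp k-quant averages (assumptions).
BPW = {
    "q3-k-m": 3.91,  # ~Q3_K_M
    "q4-0":   4.55,  # ~Q4_0
    "q4-k-m": 4.85,  # ~Q4_K_M
}

def est_size_gb(n_params: float, bpw: float) -> float:
    """Estimated quantized-weights size in GB (1 GB = 1e9 bytes)."""
    return n_params * bpw / 8 / 1e9

n_params = 40e9  # hypothetical parameter count, for illustration only
for name, bpw in BPW.items():
    print(f"{name}: ~{est_size_gb(n_params, bpw):.1f} GB")
```

For a model in this size range, the q3-k-m weights land comfortably under 24GB while q4-k-m sits at the edge, where KV cache and overhead push it off the card; that matches the pattern described in the notes.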