Update README.md
README.md

```diff
@@ -114,7 +114,8 @@ speedups for llama.cpp's simplest quants: Q8_0 and Q4_0.
 This model is very large. Even at Q2 quantization, it's still well-over
 twice as large the highest tier NVIDIA gaming GPUs. llamafile supports
 splitting models over multiple GPUs (for NVIDIA only currently) if you
-have such a system.
+have such a system. The best way to get one, if you don't, is to pay a
+few bucks an hour to rent a 4x RTX 4090 rig off vast.ai.
 
 Mac Studio is a good option for running this model. An M2 Ultra desktop
 from Apple is affordable and has 128GB of unified RAM+VRAM. If you have
```