Update README.md
README.md

```diff
@@ -114,7 +114,8 @@ speedups for llama.cpp's simplest quants: Q8_0 and Q4_0.
 This model is very large. Even at Q2 quantization, it's still well-over
 twice as large the highest tier NVIDIA gaming GPUs. llamafile supports
 splitting models over multiple GPUs (for NVIDIA only currently) if you
-have such a system.
+have such a system. The best way to get one, if you don't, is to pay a
+few bucks an hour to rent a 4x RTX 4090 rig off vast.ai.
 
 Mac Studio is a good option for running this model. An M2 Ultra desktop
 from Apple is affordable and has 128GB of unified RAM+VRAM. If you have
```