Update README.md
README.md CHANGED
@@ -88,8 +88,9 @@ $8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4_0.llamafile
 at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
 
 Just want to try it? You can go on vast.ai and rent a system with 4x RTX
-4090's for a few bucks an hour. That'll run these 70b llamafiles.
-could build your own, but the
+4090's for a few bucks an hour. That'll run these 70b llamafiles. Be
+sure to pass the `-ngl 9999` flag. Or you could build your own, but the
+graphics cards alone will cost $10k+.
 
 AMD Threadripper Pro 7995WX ($10k) does a good job too at 5.9 tok/sec
 eval with Q4_0 (49 tok/sec prompt). With F16 weights the prompt eval