Update README.md
README.md CHANGED
@@ -88,8 +88,9 @@ $8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4_0.llamafile
 at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
 
 Just want to try it? You can go on vast.ai and rent a system with 4x RTX
-4090's for a few bucks an hour. That'll run these 70b llamafiles.
-could build your own, but the
+4090's for a few bucks an hour. That'll run these 70b llamafiles. Be
+sure to pass the `-ngl 9999` flag. Or you could build your own, but the
+graphics cards alone will cost $10k+.
 
 AMD Threadripper Pro 7995WX ($10k) does a good job too at 5.9 tok/sec
 eval with Q4_0 (49 tok/sec prompt). With F16 weights the prompt eval