jartine committed on
Commit 923896f
1 Parent(s): 34528ad

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -88,8 +88,9 @@ $8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4\_0.llamafile
  at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
 
  Just want to try it? You can go on vast.ai and rent a system with 4x RTX
- 4090's for a few bucks an hour. That'll run these 70b llamafiles. Or you
- could build your own, but the graphics cards alone will cost $10k+.
+ 4090's for a few bucks an hour. That'll run these 70b llamafiles. Be
+ sure to pass the `-ngl 9999` flag. Or you could build your own, but the
+ graphics cards alone will cost $10k+.
 
  AMD Threadripper Pro 7995WX ($10k) does a good job too at 5.9 tok/sec
  eval with Q4\_0 (49 tok/sec prompt). With F16 weights the prompt eval
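
For readers scripting the rented-GPU setup, the change above amounts to adding `-ngl 9999` to the llamafile command line. As far as I understand it, `-ngl` sets how many model layers are offloaded to the GPU, and a large value such as 9999 requests all of them. The sketch below only illustrates that invocation: the helper name `llamafile_command` and the local file path are hypothetical, and it assumes the llamafile has already been downloaded and marked executable.

```python
import os
import subprocess


def llamafile_command(path, gpu_layers=9999, extra_args=()):
    """Build the argv for launching a llamafile with GPU offload requested.

    `path` is a hypothetical local path to a downloaded llamafile.
    `-ngl` and the value 9999 are the flag and value named in the README change.
    """
    return [path, "-ngl", str(gpu_layers), *extra_args]


if __name__ == "__main__":
    # Assumed filename, based on the model named in the README.
    path = "./Meta-Llama-3-70B-Instruct.Q4_0.llamafile"
    cmd = llamafile_command(path)
    print(" ".join(cmd))
    # Only launch if the file is actually present on this machine.
    if os.path.exists(path):
        subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting issues if further arguments are appended through `extra_args`.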