jartine committed
Commit c5447f0
1 Parent(s): 3183783

Update README.md

Files changed (1)
  1. README.md +14 -15
README.md CHANGED
@@ -70,26 +70,25 @@ Note: BF16 is currently only supported on CPU.
  ## Hardware Choices

  Any Macbook with 32GB should be able to run
- Meta-Llama-3-70B-Instruct.Q2\_K.llamafile which I uploaded a few minutes
- ago. It's smart enough to solve math riddles, but at this level of
- quantization you should expect hallucinations.
+ Meta-Llama-3-70B-Instruct.Q2\_K.llamafile. It's smart enough to solve
+ math riddles, but at this level of quantization you should expect
+ hallucinations.

  If you want to run Q4\_0 you'll probably be able to squeeze it on a
  $3,999.00 Macbook Pro M3 Max w/ 48GB of RAM.

  If you want to run Q5\_K\_M or Q8\_0 the best choice is probably Mac
- Studio. I have an Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM.
- It cost me $8000 with the monitor. If I run
- Meta-Llama-3-70B-Instruct.Q4\_0.llamafile then I get 14 tok/sec (prompt
- eval is 82 tok/sec) thanks to the Metal GPU.
-
- You could alternatively go on vast.ai and rent a system with 4x RTX
- 4090's for a few bucks an hour. That'll run 70b. Or you could build your
- own, but the graphics cards alone will cost $10k+.
-
- AMD Threadripper Pro 7995WX ($10k) does a good job too. I get 5.9
- tok/sec eval with Q4\_0 and 49 tok/sec prompt. If I use F16 weights then
- prompt eval goes 65 tok/sec.
+ Studio. An Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM (costs
+ $8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4\_0.llamafile
+ at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
+
+ Just want to try it? You can go on vast.ai and rent a system with 4x RTX
+ 4090's for a few bucks an hour. That'll run these 70b llamafiles. Or you
+ could build your own, but the graphics cards alone will cost $10k+.
+
+ AMD Threadripper Pro 7995WX ($10k) does a good job too at 5.9 tok/sec
+ eval with Q4\_0 (49 tok/sec prompt). With F16 weights the prompt eval
+ goes 65 tok/sec.

  ---
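
A llamafile is a self-contained executable, so launching one of these builds needs no install step. A minimal sketch, assuming the standard llamafile CLI, which accepts llama.cpp-style flags such as -ngl (GPU layer offload) and -p (prompt); exact flag spellings are worth checking against `--help`:

```sh
# Make the downloaded llamafile executable, then run it.
chmod +x Meta-Llama-3-70B-Instruct.Q2_K.llamafile

# A large -ngl value asks for every layer to be offloaded to the GPU
# (Metal on a Mac; CUDA on the 4x RTX 4090 box, where llama.cpp-style
# backends should split the layers across the cards).
./Meta-Llama-3-70B-Instruct.Q2_K.llamafile -ngl 999 \
    -p 'A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?'
```

Without -ngl it typically runs on CPU, which is the Threadripper scenario above.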
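The RAM tiers above follow from simple weight-size arithmetic: file size ≈ parameter count × bits per weight / 8. A rough sketch (the bits-per-weight figures are assumed, approximate llama.cpp values; real files run somewhat larger because some tensors stay at higher precision):

```sh
# Approximate GGUF weight size for a 70e9-parameter model, in GB:
#   size ≈ 70 * bpw / 8
# Assumed bpw: Q2_K ~2.6, Q4_0 4.5, Q5_K_M ~5.5, Q8_0 8.5.
for q in "Q2_K 2.6" "Q4_0 4.5" "Q5_K_M 5.5" "Q8_0 8.5"; do
  set -- $q
  printf '%-7s ~%s GB\n' "$1" "$(echo "scale=1; 70 * $2 / 8" | bc)"
done
```

That works out to roughly 23 GB for Q2_K (fits the 32GB Macbook), 39 GB for Q4_0 (squeezes into 48GB), 48 GB for Q5_K_M, and 74 GB for Q8_0 (hence the 128GB Mac Studio).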