jartine committed on
Commit 24bcc3f · verified · 1 Parent(s): 35800b1

Update README.md

Files changed (1)
  1. README.md +8 -13
README.md CHANGED
@@ -79,19 +79,14 @@ Note: BF16 is currently only supported on CPU.
 
 ## Hardware Choices (LLaMA3 70B Specific)
 
- Any Macbook with a Metal GPU and 32GB of RAM should in theory be able to
- run Meta-Llama-3-70B-Instruct.Q2\_K.llamafile reasonably well, provided
- you close all your browser tabs. At this lowliest of quantization
- levels, llama3 is still smart enough to solve math riddles, but you
- should expect more hallucinations than usual.
-
- If you want to run Q4\_0 you'll probably be able to squeeze it on a
- $3,999.00 Macbook Pro M3 Max w/ 48GB of RAM.
-
- If you want to run Q5\_K\_M or or Q8\_0 the best choice is probably Mac
- Studio. An Apple M2 Ultra w/ 24-core CPU, 60-core GPU, 128GB RAM (costs
- $8000 with the monitor) runs Meta-Llama-3-70B-Instruct.Q4\_0.llamafile
- at 14 tok/sec (prompt eval is 82 tok/sec) thanks to the Metal GPU.
+ Don't bother if you're using a Macbook M1 with 32GB of RAM. The Q2\_K
+ weights might work slowly if you run in CPU mode (pass `-ngl 0`) but
+ you're not going to have a good experience.
+
+ Mac Studio is recommended. An Apple M2 Ultra w/ 24-core CPU, 60-core
+ GPU, 128GB RAM (costs $8000 with the monitor) runs
+ Meta-Llama-3-70B-Instruct.Q4\_0.llamafile at 14 tok/sec (prompt eval is
+ 82 tok/sec) thanks to the Metal GPU.
 
 Just want to try it? You can go on vast.ai and rent a system with 4x RTX
 4090's for a few bucks an hour. That'll run these 70b llamafiles. Be