etemiz committed on
Commit 0fe9680
1 Parent(s): 9baf902

Update README.md

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -1,10 +1,11 @@
  ---
  license: llama3.1
  ---
- Llama 3.1 405B Quants
- - IQ1_S: 86.8 GB
- - IQ1_M: 95.1 GB
- - IQ2_XXS: 109.0 GB
+ Llama 3.1 405B Quants and the llama.cpp versions used for quantization
+ - IQ1_S: 86.8 GB b3459
+ - IQ1_M: 95.1 GB b3459
+ - IQ2_XXS: 109.0 GB b3459
+ - IQ3_XXS: 157.7 GB b3484

  Quantization from BF16 here:
  https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
@@ -12,9 +13,6 @@ https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
  which is converted from Llama 3.1 405B:
  https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

-
- llama.cpp version b3459. There is ongoing work in llama.cpp to support this model. If you use context = 8192, there are some reports that say this model works fine. If not, you can also try changing the frequency base as described in: https://www.reddit.com/r/LocalLLaMA/comments/1ectacp/until_the_rope_scaling_is_fixed_in_gguf_for/
-
  imatrix file https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat

  Lmk if you need bigger quants.
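
The note removed in this commit suggested keeping context at 8192, or overriding the RoPE frequency base, while llama.cpp support for Llama 3.1 was still in progress. As a rough illustration only (not part of this repo), here is a minimal sketch of that workaround using the llama-cpp-python bindings; the model filename and the frequency-base value are placeholders, and the `n_ctx` / `rope_freq_base` parameter names are assumptions based on that library.

```python
# Minimal sketch (assumption): load one of these quants with llama-cpp-python,
# limiting context to 8192 as suggested in the removed README note.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-405B-Instruct-IQ1_S.gguf",  # hypothetical local file name
    n_ctx=8192,  # keep context at 8192 per the reports cited in the removed note
    # rope_freq_base=500000,  # optional RoPE frequency-base override (see the Reddit thread)
)

# Simple completion to check the model loads and generates.
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```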