etemiz committed on
Commit 0fe9680
1 Parent(s): 9baf902

Update README.md

Files changed (1)
  1. README.md +5 -7
README.md CHANGED
@@ -1,10 +1,11 @@
  ---
  license: llama3.1
  ---
- Llama 3.1 405B Quants
- - IQ1_S: 86.8 GB
- - IQ1_M: 95.1 GB
- - IQ2_XXS: 109.0 GB
+ Llama 3.1 405B Quants and the llama.cpp versions used for quantization
+ - IQ1_S: 86.8 GB b3459
+ - IQ1_M: 95.1 GB b3459
+ - IQ2_XXS: 109.0 GB b3459
+ - IQ3_XXS: 157.7 GB b3484

  Quantization from BF16 here:
  https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
@@ -12,9 +13,6 @@ https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/
  which is converted from Llama 3.1 405B:
  https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

-
- llama.cpp version b3459. There is ongoing work in llama.cpp to support this model. If you use context = 8192, there are some reports that say this model works fine. If not, you can also try changing the frequency base as described in: https://www.reddit.com/r/LocalLLaMA/comments/1ectacp/until_the_rope_scaling_is_fixed_in_gguf_for/
-
  imatrix file https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat

  Lmk if you need bigger quants.
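
The note removed in this commit suggested keeping context at 8192, or overriding the RoPE frequency base, while llama.cpp support for Llama 3.1 was still in progress. As a rough illustration only (not part of this repo), here is a minimal sketch of that workaround using the llama-cpp-python bindings; the model filename and the frequency-base value are placeholders, and the `n_ctx` / `rope_freq_base` parameter names are assumptions based on that library.

```python
# Minimal sketch (assumption): load one of these quants with llama-cpp-python,
# limiting context to 8192 as suggested in the removed README note.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-405B-Instruct-IQ1_S.gguf",  # hypothetical local file name
    n_ctx=8192,  # keep context at 8192 per the reports cited in the removed note
    # rope_freq_base=500000,  # optional RoPE frequency-base override (see the Reddit thread)
)

# Simple completion to check the model loads and generates.
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```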