---
license: llama3.1
---

Llama 3.1 405B quants and the llama.cpp versions used for quantization:

  • IQ1_S: 86.8 GB (b3459)
  • IQ1_M: 95.1 GB (b3459)
  • IQ2_XXS: 109.0 GB (b3459)
  • IQ3_XXS: 157.7 GB (b3484)
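The file sizes above translate into a rough effective bits-per-weight figure for each quant. A minimal sketch, assuming "GB" means 10^9 bytes and a parameter count of 405 billion (neither is stated explicitly in this card):

```python
# Rough effective bits-per-weight for each quant.
# Assumptions (not stated in this README): GB = 10^9 bytes,
# and the model has 405e9 parameters.
PARAMS = 405e9

sizes_gb = {
    "IQ1_S": 86.8,
    "IQ1_M": 95.1,
    "IQ2_XXS": 109.0,
    "IQ3_XXS": 157.7,
}

for name, gb in sizes_gb.items():
    # bytes -> bits, divided by parameter count
    bpw = gb * 1e9 * 8 / PARAMS
    print(f"{name}: ~{bpw:.2f} bits/weight")
```

The numbers come out slightly above each quant type's nominal bit width because GGUF files also carry scales, metadata, and some tensors kept at higher precision.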

Quantized from the BF16 GGUF here: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/

which was converted from Llama 3.1 405B Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
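Given the BF16 GGUF and the imatrix file above, one of these quants can in principle be reproduced with llama.cpp's `llama-quantize` tool at the matching build (e.g. b3459). A sketch, with hypothetical local file names:

```shell
# Sketch: quantize the BF16 GGUF to IQ1_S using the importance matrix.
# Input/output file names are hypothetical placeholders; the imatrix
# file name matches the one linked above.
./llama-quantize \
    --imatrix 405imatrix.dat \
    meta-405b-instruct-bf16.gguf \
    meta-405b-instruct-IQ1_S.gguf \
    IQ1_S
```

The imatrix guides which weights get the most precision, which matters most at the very low bit widths (IQ1/IQ2) listed here.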

Let me know if you need bigger quants.