---
license: llama3.1
---
Llama 3.1 405B quants, with the llama.cpp version used for each quantization:
- IQ1_S: 86.8 GB  b3459
- IQ1_M: 95.1 GB  b3459
- IQ2_XXS: 109.0 GB  b3459
- IQ3_XXS: 157.7 GB  b3484
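For a rough sense of how aggressive these quants are, the sizes above translate into bits per weight. A minimal sketch, assuming ~405e9 parameters and decimal gigabytes (actual bpw varies slightly because some tensors, such as embeddings and the output layer, are typically kept at higher precision):

```python
# Rough bits-per-weight estimate for the quants listed above.
# Assumes ~405e9 parameters and decimal GB; real values differ a
# little because not all tensors use the same quantization type.
PARAMS = 405e9

quants = {
    "IQ1_S": 86.8,    # GB
    "IQ1_M": 95.1,
    "IQ2_XXS": 109.0,
    "IQ3_XXS": 157.7,
}

for name, size_gb in quants.items():
    bpw = size_gb * 1e9 * 8 / PARAMS
    print(f"{name}: ~{bpw:.2f} bits per weight")
```

So IQ1_S works out to roughly 1.7 bpw and IQ3_XXS to roughly 3.1 bpw.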

Quantized from the BF16 GGUF here:
https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/

which was converted from Llama 3.1 405B Instruct:
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

imatrix file: https://huggingface.co/nisten/meta-405b-instruct-cpu-optimized-gguf/blob/main/405imatrix.dat
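Quants like these can be produced with llama.cpp's `llama-quantize` tool, passing the imatrix above. A sketch, assuming llama.cpp has been built at the listed tag (e.g. b3459); the input/output GGUF file names here are hypothetical, not the exact ones used:

```shell
# Quantize the BF16 GGUF down to IQ1_S, guided by the imatrix.
# Usage: llama-quantize [--imatrix FILE] INPUT.gguf OUTPUT.gguf TYPE
./llama-quantize --imatrix 405imatrix.dat \
    Meta-Llama-3.1-405B-Instruct-BF16.gguf \
    Meta-Llama-3.1-405B-Instruct-IQ1_S.gguf \
    IQ1_S
```

The `--imatrix` file matters most at the 1–2 bpw types (IQ1_S/IQ1_M/IQ2_XXS), where importance weighting substantially reduces quality loss.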

Let me know if you need bigger quants.