Update README.md
README.md CHANGED

@@ -12,7 +12,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
 1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
 2. BF16: ~811 GB
 3. Q8_0: ~406 GB
-4. Q2-
+4. Q2-Q8 (custom quant I wrote): ~165 GB
 
 ## Use Aria2 for parallelized downloads, links will download 9x faster
 
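The speedup the README heading promises comes from aria2 opening many parallel connections and fetching file segments concurrently. A minimal sketch of such an invocation, assuming a placeholder URL and filename (the actual shard links come from this repository's file listing):

```shell
# Fetch one GGUF shard with aria2:
#   -x 16  up to 16 connections to the server (aria2's maximum)
#   -s 16  split the file into 16 segments downloaded in parallel
#   -c     resume a partially downloaded file instead of restarting
# The URL and output name below are placeholders, not real links.
aria2c -x 16 -s 16 -c \
  -o model-shard-placeholder.gguf \
  "https://example.com/path/to/model-shard-placeholder.gguf"
```

For a multi-shard quant, the same flags can be combined with `-i urls.txt` (a file of one URL per line) and `-j` to control how many shards download at once.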