Update README.md
README.md CHANGED

@@ -9,7 +9,7 @@ This repository contains CPU-optimized GGUF quantizations of the Meta-Llama-3.1-
 
 ## Available Quantizations
 
-1.
+1. Q4_0_4_8 (CPU FMA-Optimized): ~246 GB
 2. BF16: ~820 GB
 3. Q8_0: ~410 GB
 4. more coming...
@@ -100,4 +100,4 @@ The use of this model is subject to the [Llama 3.1 Community License](https://gi
 
 Special thanks to the Meta AI team for creating and releasing the Llama 3.1 model series.
 
-## Enjoy; more quants and perplexity benchmarks coming
+## Enjoy; more quants and perplexity benchmarks coming.
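As a sanity check on the sizes in the diff, each file size implies an effective bits-per-weight figure. A rough back-of-the-envelope sketch, assuming 405e9 weights and decimal gigabytes (block scales, metadata, and any tensors kept at higher precision are all folded into the resulting number):

```python
def bits_per_weight(size_gb: float, params_b: float = 405) -> float:
    """Effective bits per weight implied by an on-disk GGUF size.

    size_gb * 8 bits-per-byte / params_b billion weights; the 1e9 in
    "GB" cancels the 1e9 in "billions", so no unit conversion is needed.
    """
    return size_gb * 8 / params_b

print(round(bits_per_weight(820), 2))  # BF16     (~820 GB) -> 16.2
print(round(bits_per_weight(410), 2))  # Q8_0     (~410 GB) -> 8.1
print(round(bits_per_weight(246), 2))  # Q4_0_4_8 (~246 GB) -> 4.86
```

The Q4_0_4_8 figure landing slightly above Q4_0's nominal 4.5 bits per weight (32 weights packed into 18 bytes, i.e. scale overhead included) is consistent with the listed ~246 GB.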