Converted and quantized directly to GGUF with llama.cpp (release tag: b2843) from the 'Meta-Llama-3' repo published by Meta on Hugging Face.

This repo also includes the original Llama 3 model files cloned from the Meta HF repo (https://huggingface.co/meta-llama/Meta-Llama-3-8B).

If you have issues downloading the models from Meta or converting models for llama.cpp, feel free to download this one!

Perplexity table on LLaMA 3 70B

Lower perplexity is better. (Credit: dranger003)

| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. FP16 |
|---|---|---|---|
| IQ1_S | 14.29 | 9.8655 +/- 0.0625 | 248.51% |
| IQ1_M | 15.60 | 8.5193 +/- 0.0530 | 201.94% |
| IQ2_XXS | 17.79 | 6.6705 +/- 0.0405 | 135.64% |
| IQ2_XS | 19.69 | 5.7486 +/- 0.0345 | 103.07% |
| IQ2_S | 20.71 | 5.5215 +/- 0.0318 | 95.05% |
| Q2_K_S | 22.79 | 5.4334 +/- 0.0325 | 91.94% |
| IQ2_M | 22.46 | 4.8959 +/- 0.0276 | 72.35% |
| Q2_K | 24.56 | 4.7763 +/- 0.0274 | 68.73% |
| IQ3_XXS | 25.58 | 3.9671 +/- 0.0211 | 40.14% |
| IQ3_XS | 27.29 | 3.7210 +/- 0.0191 | 31.45% |
| Q3_K_S | 28.79 | 3.6502 +/- 0.0192 | 28.95% |
| IQ3_S | 28.79 | 3.4698 +/- 0.0174 | 22.57% |
| IQ3_M | 29.74 | 3.4402 +/- 0.0171 | 21.53% |
| Q3_K_M | 31.91 | 3.3617 +/- 0.0172 | 18.75% |
| Q3_K_L | 34.59 | 3.3016 +/- 0.0168 | 16.63% |
| IQ4_XS | 35.30 | 3.0310 +/- 0.0149 | 7.07% |
| IQ4_NL | 37.30 | 3.0261 +/- 0.0149 | 6.90% |
| Q4_K_S | 37.58 | 3.0050 +/- 0.0148 | 6.15% |
| Q4_K_M | 39.60 | 2.9674 +/- 0.0146 | 4.83% |
| Q5_K_S | 45.32 | 2.8843 +/- 0.0141 | 1.89% |
| Q5_K_M | 46.52 | 2.8656 +/- 0.0139 | 1.23% |
| Q6_K | 53.91 | 2.8441 +/- 0.0138 | 0.47% |
| Q8_0 | 69.83 | 2.8316 +/- 0.0138 | 0.03% |
| F16 | 131.43 | 2.8308 +/- 0.0138 | 0.00% |
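
The Delta column is not defined in the table. It appears to be the relative increase in perplexity over the F16 baseline, in percent. The sketch below works on that assumption and recomputes the value for a few rows copied from the table. The function name `delta_vs_fp16` and the `PPL` dictionary are illustrative only and are not part of llama.cpp.

```python
# Perplexity values copied from the table above (wiki.test, Llama 3 70B).
PPL = {
    "F16": 2.8308,
    "Q8_0": 2.8316,
    "Q4_K_M": 2.9674,
    "IQ1_S": 9.8655,
}


def delta_vs_fp16(ppl: float, ppl_fp16: float) -> float:
    """Relative perplexity increase over the FP16 baseline, in percent.

    Assumed formula: (ppl / ppl_fp16 - 1) * 100.
    """
    return (ppl / ppl_fp16 - 1.0) * 100.0


if __name__ == "__main__":
    baseline = PPL["F16"]
    for name, ppl in PPL.items():
        # Print each quantization with its recomputed delta, two decimals.
        print(f"{name:8s} {delta_vs_fp16(ppl, baseline):7.2f}%")
```

Under this assumption, the recomputed values for these rows agree with the table to two decimal places. This suggests the column was derived from the mean perplexities alone, without the +/- uncertainty.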

Where to send questions or comments about the model

Instructions on how to provide feedback or comments on the model can be found in the model README. For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go here.

License

See the License file for Meta Llama 3 here and Acceptable Use Policy here

Format: GGUF
Model size: 8.03B params
Architecture: llama
Quantizations: 3-bit, 4-bit, 5-bit, 16-bit
