|
--- |
|
license: llama3 |
|
tags: |
|
- llama |
|
- llama-3 |
|
- meta |
|
- facebook |
|
- gguf |
|
--- |
|
Converted and quantized directly to GGUF with `llama.cpp` (release tag: b2843) from Meta's `Meta-Llama-3` repository on Hugging Face.
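As a rough sketch of that pipeline (the script and binary names below match release b2843 and may differ in newer `llama.cpp` releases; local paths and output filenames are placeholders):

```bash
# Convert the original Hugging Face checkpoint to an F16 GGUF.
python convert-hf-to-gguf.py ./Meta-Llama-3-70B-Instruct \
    --outtype f16 --outfile llama-3-70b-instruct-f16.gguf

# Quantize the F16 GGUF down to a smaller type, e.g. Q4_K_M.
./quantize llama-3-70b-instruct-f16.gguf \
    llama-3-70b-instruct-Q4_K_M.gguf Q4_K_M
```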
|
|
|
The original Llama 3 model files, cloned from the Meta HF repo (https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), are included as well.
|
|
|
If you have trouble downloading the models from Meta or converting them for `llama.cpp`, feel free to download this one!
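For example, a single quantized file can be fetched with the `huggingface-cli` tool (the repo id and filename below are placeholders; substitute this repository's id and the quantization you want):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> \
    llama-3-70b-instruct-Q4_K_M.gguf --local-dir .
```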
|
|
|
### How to use `gguf-split` / model sharding

Demo: https://github.com/ggerganov/llama.cpp/discussions/6404
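A minimal merge sketch, assuming shard names like those produced by `gguf-split` (see the demo above for the authoritative usage):

```bash
# Merge a sharded model back into a single GGUF file.
./gguf-split --merge \
    llama-3-70b-instruct-Q8_0-00001-of-00002.gguf \
    llama-3-70b-instruct-Q8_0-merged.gguf
```

Merging is optional: `llama.cpp` can also load a sharded model directly when pointed at the first shard.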
|
|
|
## Perplexity table for Llama 3 70B
|
|
|
Lower perplexity is better. The delta column is the relative increase in perplexity over the F16 baseline. A sketch for reproducing these numbers follows the table. (Credit: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514).)
|
|
|
| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. F16 |
|
|--------------|------------|------------------------|-------------| |
|
| IQ1_S | 14.29 | 9.8655 +/- 0.0625 | 248.51% | |
|
| IQ1_M | 15.60 | 8.5193 +/- 0.0530 | 201.94% | |
|
| IQ2_XXS | 17.79 | 6.6705 +/- 0.0405 | 135.64% | |
|
| IQ2_XS | 19.69 | 5.7486 +/- 0.0345 | 103.07% | |
|
| IQ2_S | 20.71 | 5.5215 +/- 0.0318 | 95.05% | |
|
| Q2_K_S | 22.79 | 5.4334 +/- 0.0325 | 91.94% | |
|
| IQ2_M | 22.46 | 4.8959 +/- 0.0276 | 72.35% | |
|
| Q2_K | 24.56 | 4.7763 +/- 0.0274 | 68.73% | |
|
| IQ3_XXS | 25.58 | 3.9671 +/- 0.0211 | 40.14% | |
|
| IQ3_XS | 27.29 | 3.7210 +/- 0.0191 | 31.45% | |
|
| Q3_K_S | 28.79 | 3.6502 +/- 0.0192 | 28.95% | |
|
| IQ3_S | 28.79 | 3.4698 +/- 0.0174 | 22.57% | |
|
| IQ3_M | 29.74 | 3.4402 +/- 0.0171 | 21.53% | |
|
| Q3_K_M | 31.91 | 3.3617 +/- 0.0172 | 18.75% | |
|
| Q3_K_L | 34.59 | 3.3016 +/- 0.0168 | 16.63% | |
|
| IQ4_XS | 35.30 | 3.0310 +/- 0.0149 | 7.07% | |
|
| IQ4_NL | 37.30 | 3.0261 +/- 0.0149 | 6.90% | |
|
| Q4_K_S | 37.58 | 3.0050 +/- 0.0148 | 6.15% | |
|
| Q4_K_M | 39.60 | 2.9674 +/- 0.0146 | 4.83% | |
|
| Q5_K_S | 45.32 | 2.8843 +/- 0.0141 | 1.89% | |
|
| Q5_K_M | 46.52 | 2.8656 +/- 0.0139 | 1.23% | |
|
| Q6_K | 53.91 | 2.8441 +/- 0.0138 | 0.47% | |
|
| Q8_0 | 69.83 | 2.8316 +/- 0.0138 | 0.03% | |
|
| F16 | 131.43 | 2.8308 +/- 0.0138 | 0.00% | |
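As referenced above, a rough sketch of how such numbers are typically produced with the `perplexity` tool from `llama.cpp` (binary name as of release b2843; the wikitext-2 path and context size are assumptions):

```bash
# Evaluate perplexity on the wikitext-2 test set.
./perplexity -m llama-3-70b-instruct-Q4_K_M.gguf \
    -f wikitext-2-raw/wiki.test.raw -c 2048
```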
|
|
|
**Where to send questions or comments about the model:** instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3 in applications, see [llama-recipes](https://github.com/meta-llama/llama-recipes).
|
|
|
## License |
|
|
|
See the license file for Meta Llama 3 [here](https://llama.meta.com/llama3/license/) and the Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/).
|
|