|
--- |
|
license: llama3 |
|
tags: |
|
- llama |
|
- llama-3 |
|
- meta |
|
- facebook |
|
- gguf |
|
--- |
|
Converted and quantized directly to GGUF with `llama.cpp` (release tag: b2843) from Meta's `Meta-Llama-3` repository on Hugging Face.
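As a rough sketch of that pipeline (the script and binary names below match release b2843 and may differ in newer `llama.cpp` releases; local paths and output filenames are placeholders):

```bash
# Convert the original Hugging Face checkpoint to an F16 GGUF.
python convert-hf-to-gguf.py ./Meta-Llama-3-70B-Instruct \
    --outtype f16 --outfile llama-3-70b-instruct-f16.gguf

# Quantize the F16 GGUF down to a smaller type, e.g. Q4_K_M.
./quantize llama-3-70b-instruct-f16.gguf \
    llama-3-70b-instruct-Q4_K_M.gguf Q4_K_M
```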
|
|
|
The original Llama 3 model files, cloned from the Meta HF repo (https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct), are included as well.
|
|
|
If you have trouble downloading the models from Meta or converting them for `llama.cpp`, feel free to download this one!
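For example, a single quantized file can be fetched with the `huggingface-cli` tool (the repo id and filename below are placeholders; substitute this repository's id and the quantization you want):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download <repo-id> \
    llama-3-70b-instruct-Q4_K_M.gguf --local-dir .
```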
|
|
|
### How to use `gguf-split` / model sharding

Demo: https://github.com/ggerganov/llama.cpp/discussions/6404
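A minimal merge sketch, assuming shard names like those produced by `gguf-split` (see the demo above for the authoritative usage):

```bash
# Merge a sharded model back into a single GGUF file.
./gguf-split --merge \
    llama-3-70b-instruct-Q8_0-00001-of-00002.gguf \
    llama-3-70b-instruct-Q8_0-merged.gguf
```

Merging is optional: `llama.cpp` can also load a sharded model directly when pointed at the first shard.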
|
|
|
## Perplexity table for Llama 3 70B
|
|
|
Lower perplexity is better. The delta column is the relative increase in perplexity over the F16 baseline. A sketch for reproducing these numbers follows the table. (Credit: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514).)
|
|
|
| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. F16 |
|
|--------------|------------|------------------------|-------------| |
|
| IQ1_S | 14.29 | 9.8655 +/- 0.0625 | 248.51% | |
|
| IQ1_M | 15.60 | 8.5193 +/- 0.0530 | 201.94% | |
|
| IQ2_XXS | 17.79 | 6.6705 +/- 0.0405 | 135.64% | |
|
| IQ2_XS | 19.69 | 5.7486 +/- 0.0345 | 103.07% | |
|
| IQ2_S | 20.71 | 5.5215 +/- 0.0318 | 95.05% | |
|
| Q2_K_S | 22.79 | 5.4334 +/- 0.0325 | 91.94% | |
|
| IQ2_M | 22.46 | 4.8959 +/- 0.0276 | 72.35% | |
|
| Q2_K | 24.56 | 4.7763 +/- 0.0274 | 68.73% | |
|
| IQ3_XXS | 25.58 | 3.9671 +/- 0.0211 | 40.14% | |
|
| IQ3_XS | 27.29 | 3.7210 +/- 0.0191 | 31.45% | |
|
| Q3_K_S | 28.79 | 3.6502 +/- 0.0192 | 28.95% | |
|
| IQ3_S | 28.79 | 3.4698 +/- 0.0174 | 22.57% | |
|
| IQ3_M | 29.74 | 3.4402 +/- 0.0171 | 21.53% | |
|
| Q3_K_M | 31.91 | 3.3617 +/- 0.0172 | 18.75% | |
|
| Q3_K_L | 34.59 | 3.3016 +/- 0.0168 | 16.63% | |
|
| IQ4_XS | 35.30 | 3.0310 +/- 0.0149 | 7.07% | |
|
| IQ4_NL | 37.30 | 3.0261 +/- 0.0149 | 6.90% | |
|
| Q4_K_S | 37.58 | 3.0050 +/- 0.0148 | 6.15% | |
|
| Q4_K_M | 39.60 | 2.9674 +/- 0.0146 | 4.83% | |
|
| Q5_K_S | 45.32 | 2.8843 +/- 0.0141 | 1.89% | |
|
| Q5_K_M | 46.52 | 2.8656 +/- 0.0139 | 1.23% | |
|
| Q6_K | 53.91 | 2.8441 +/- 0.0138 | 0.47% | |
|
| Q8_0 | 69.83 | 2.8316 +/- 0.0138 | 0.03% | |
|
| F16 | 131.43 | 2.8308 +/- 0.0138 | 0.00% | |
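As referenced above, a rough sketch of how such numbers are typically produced with the `perplexity` tool from `llama.cpp` (binary name as of release b2843; the wikitext-2 path and context size are assumptions):

```bash
# Evaluate perplexity on the wikitext-2 test set.
./perplexity -m llama-3-70b-instruct-Q4_K_M.gguf \
    -f wikitext-2-raw/wiki.test.raw -c 2048
```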
|
|
|
**Where to send questions or comments about the model:** instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3 in applications, see [llama-recipes](https://github.com/meta-llama/llama-recipes).
|
|
|
## License |
|
|
|
See the license file for Meta Llama 3 [here](https://llama.meta.com/llama3/license/) and the Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/).
|
|