|
--- |
|
license: llama3.1 |
|
--- |
|
|
|
Experimental .GGUF quants of https://huggingface.co/google/gemma-2-9b-it, made according to the llama.cpp (LCPP) PR https://github.com/ggerganov/llama.cpp/pull/8836

(based on b3529, and on b3565 for the newer ones).
|
|
|
These experimental quant strategies, which revisit Ikawrakow's work, show a slight decrease in perplexity,

including per BPW (from 10%+ for the lowest quants down to 0.x% for the highest ones).
|
This is significant enough to encourage you folks to test them and provide feedback where pertinent.
|
|
|
The iMatrix I use is based on Group Merged V3, enriched with a bit of French,

Serbian, and Croatian.
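
For reference, this is the general llama.cpp workflow for producing an imatrix-based quant; a minimal sketch, assuming the llama.cpp binaries are built and on the PATH, with placeholder file names (the calibration file stands in for the Group Merged V3 corpus described above):

```shell
# 1. Compute an importance matrix from a calibration corpus
./llama-imatrix -m gemma-2-9b-it-F16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize the F16 model using that imatrix
#    (last argument selects the quant type, e.g. IQ3_M)
./llama-quantize --imatrix imatrix.dat gemma-2-9b-it-F16.gguf gemma-2-9b-it-IQ3_M.gguf IQ3_M
```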
|
|
|
|
|
ARC and PPL-512 DATA (see the main post of the PR thread for the latest figures):
|
|
|
``` |
|
|
|
IQ3_XXS |
|
|
|
Master |
|
Size : 3.04 GiB (3.25 BPW) |
|
PPL 512 wikitext : 8.4985 +/- 0.05402 |
|
|
|
PR (so so) |
|
Size : 3.11 GiB (3.32 BPW) |
|
PPL 512 wikitext : 8.3274 +/- 0.05334 |
|
|
|
IQ3_XS |
|
|
|
Master |
|
Size : 3.27 GiB (3.50 BPW) |
|
PPL 512 wikitext : 8.2019 +/- 0.05167 |
|
|
|
PR (ok) |
|
Size : 3.24 GiB (3.47 BPW) |
|
PPL 512 wikitext : 8.1762 +/- 0.05176 |
|
|
|
IQ3_S |
|
|
|
Master |
|
Size : 3.42 GiB (3.66 BPW) |
|
PPL 512 wikitext : 7.9894 +/- 0.05020 |
|
|
|
PR (good) |
|
Size : 3.41 GiB (3.64 BPW) |
|
PPL 512 wikitext : 7.9067 +/- 0.05022 |
|
|
|
IQ3_M |
|
|
|
Master |
|
Size : 3.52 GiB (3.76 BPW) |
|
PPL 512 wikitext : 7.9263 +/- 0.04943 |
|
|
|
PR (good) |
|
Size : 3.49 GiB (3.73 BPW) |
|
PPL 512 wikitext : 7.8704 +/- 0.04951 |
|
|
|
IQ3_XL |
|
|
|
PR (good) |
|
Size : 3.71 GiB (3.97 BPW) |
|
PPL 512 wikitext : 7.7225 +/- 0.04946 |
|
|
|
IQ3_XXL |
|
|
|
PR (good; the benefit seems meager, but the token embeddings pushed from IQ3_S to IQ4_XS account for +0.05 BPW of it,

and that tensor runs in RAM rather than VRAM)
|
Size : 3.83 GiB (4.09 BPW) |
|
PPL 512 wikitext : 7.6720 +/- 0.04892 |
|
|
|
IQ3_XXL |
|
|
|
PR (good) |
|
Size : 3.97 GiB (4.24 BPW) |
|
PPL 512 wikitext : 7.5920 +/- 0.04839 |
|
|
|
IQ4_XS |
|
|
|
Master |
|
Size : 4.13 GiB (4.42 BPW) |
|
Arc-C 299 49.16387960 |
|
Arc-E 570 72.10526316 |
|
PPL 512 wikitext : 7.5226 +/- 0.04820 |
|
|
|
IQ4_XSR |
|
|
|
PR (good) |
|
Size : 4.16 GiB (4.45 BPW) |
|
Arc-C 299 |
|
Arc-E 570 |
|
PPL 512 wikitext : 7.5072 +/- 0.04814 |
|
|
|
FP16 |
|
|
|
MASTER : Gemma 2 9b It F16. |
|
Size : 14.96 GiB (16.00 BPW) |
|
Arc-C 299 49.49832776 |
|
Arc-E 570 73.85964912 |
|
PPL 512 wikitext : 7.3224 +/- 0.04674 |
|
|
|
``` |
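
The PPL-512 figures above come from llama.cpp's perplexity tool at a context size of 512; a sketch of such a run, assuming a wikitext test file and placeholder model names:

```shell
# Measure wikitext perplexity at context size 512 (-c 512),
# matching the "PPL 512 wikitext" rows above
./llama-perplexity -m gemma-2-9b-it-IQ3_M.gguf -f wiki.test.raw -c 512
```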