
Quantizations for lizpreciatior/lzlv_70b_fp16_hf in the EXL2 format

| Quant | Ctx @ 24 GB | wikitext | pippa1k | wtppnr-test | ARMS |
|---|---|---|---|---|---|
| h6_b2.5 | 8k | 6.20 | 5.92 | 7.91 | 3.76 |
| h6_b2.4 | 6k | 6.33 | 6.03 | 8.92 | 3.84 |
| h6_b2.4 (LoneStriker) | 8k | 5.87 | 6.40 | 8.50 | 4.12 |

Information

All quantizations were measured and quantized using their respective calibration files. The custom sets (herein "wtppnr") are a 1/3 / 1/3 / 1/3 mixture of wikitext2, pippa, and no_robots, with each portion sized to match the default exl2 measurement and quantization lengths at 4k.
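
As a rough illustration of how such a mixture can be assembled (the file names, the "text" column layout, and the row count below are placeholders for illustration, not the exact preprocessing used here):

```python
# Minimal sketch: combine three text sources into one calibration parquet in
# equal thirds. Input file names and the "text" column are assumptions.
import pandas as pd

TOTAL_ROWS = 4096            # match the row count used for the quantization below
PER_SOURCE = TOTAL_ROWS // 3

sources = ["wikitext2.parquet", "pippa.parquet", "no_robots.parquet"]  # placeholder files

parts = []
for path in sources:
    df = pd.read_parquet(path)              # each file is assumed to have a "text" column
    parts.append(df[["text"]].head(PER_SOURCE))

mixture = pd.concat(parts, ignore_index=True)
mixture.to_parquet("wtppnr_cal.parquet", index=False)
print(f"Wrote {len(mixture)} calibration rows to wtppnr_cal.parquet")
```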

The quantization was done with otherwise-default settings: 4096 rows and an NTK alpha of 2.5.
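
For reference, a run with those settings would look roughly like the sketch below. It assumes exllamav2's convert.py; the flag names and the mapping of "4096 rows" onto -r are my reading of that script and may differ between versions, so check convert.py -h before running.

```python
# Rough sketch of the measurement + quantization run, driven from Python.
# All paths are placeholders; flag names follow exllamav2's convert.py but
# may vary by version, so treat this as illustrative rather than exact.
import subprocess

subprocess.run([
    "python", "exllamav2/convert.py",
    "-i",  "models/lzlv_70b_fp16_hf",       # input FP16 model (placeholder path)
    "-o",  "work/lzlv_70b_exl2",            # working directory for the job
    "-cf", "out/lzlv_70b_h6_b2.5",          # compiled output with config/tokenizer
    "-c",  "wtppnr_cal.parquet",            # the mixed calibration parquet from above
    "-b",  "2.5",                           # target bits per weight (the "b2.5" quant)
    "-hb", "6",                             # head-layer bits (the "h6" in the quant names)
    "-r",  "4096",                          # calibration rows, per the settings above
    "-ra", "2.5",                           # NTK/RoPE alpha used for calibration
], check=True)
```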

The measurement files and parquets are provided in the main branch, so you can make your own quants at other bit depths without going through the 2-3 hours of measuring.
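
If you only want a different bit depth, the saved measurement can be passed back to convert.py so that just the quantization pass runs. The measurement.json filename below is an assumption, so check the actual file names in the main branch.

```python
# Reuse the provided measurement to quantize at another bpw without re-measuring.
# Paths and the measurement filename are placeholders.
import subprocess

subprocess.run([
    "python", "exllamav2/convert.py",
    "-i",  "models/lzlv_70b_fp16_hf",
    "-o",  "work/lzlv_70b_exl2_b3.0",
    "-cf", "out/lzlv_70b_h6_b3.0",
    "-c",  "wtppnr_cal.parquet",     # same calibration parquet
    "-m",  "measurement.json",       # skips the 2-3 hour measurement pass
    "-b",  "3.0",                    # pick whatever target bpw you like
    "-hb", "6",
], check=True)
```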

Analysis

Continuing my personal research from my ReMM 13B quant, I decided to try out the wtppnr dataset on a 70B to see how it generalizes, and oh boy was it painful.

To start, that took like 3 or 4 hours for the measure and quant combined on a 7900 XTX. Not doing that again.

Second, the difference was smaller than I'd hoped for. While my own chatlogs in ARMS showed that the model may work about 10% better for me personally, it fell off hard in the great Wikitext equalizer, which concerns me. Granted, I'm not sure what LoneStriker used to calibrate their quant, so that result could simply be some wikitext overfit.

The ARMS measure is promising at the least: other exl2 quants of various models that are clearly overfit on pippa consistently perform worse on ARMS, but this quant doesn't seem to suffer the same fate.

I would like to see it re-done with other calibration sets, but I also don't want to microwave my house or rent compute. Today was stormy, and this computer alone was enough to heat the entire floor without the main heat pump running.

Conclusions...?

Looks neat, but I probably have to actually talk to it for a bit to see how it does. It has the best ARMS and pippa benches of any non-L2/13B model I've benchmarked so far.

I also feel there's a general undertraining of 70B models. 13B has been trained to the point of no return, so it's easy to just test random model orgy merges until one happens to work well, but the same can't be said for 70B or even 34B. When 70B finally has its MythoMax moment, maybe I'll try making some more quantizations.
