Quantizations for Sao10K/Stheno-1.8-L2-13B in the EXL2 format
Breaking down the names:
- 4k means the quant was calibrated with a context length of 4096 rather than the default 2048
- h8 means the head (output) layer is kept at a bit depth of 8
- b8 means the model weights average 8.0 bits per weight
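The naming scheme above can be sketched as a small parser. This is purely illustrative: the function name and defaults are my own, based only on the conventions listed here (2048 calibration length when no `Nk` token is present).

```python
# Parse a quant name like "4k-h8-b8" into its parts, following the
# naming scheme described in this README. Illustrative sketch only.
def parse_quant_name(name: str) -> dict:
    parts = {"calibration_length": 2048,  # default when no "Nk" token given
             "head_bits": None,
             "weight_bits": None}
    for token in name.split("-"):
        if token.endswith("k") and token[:-1].isdigit():
            # e.g. "4k" -> calibrated at 4 * 1024 = 4096 context length
            parts["calibration_length"] = int(token[:-1]) * 1024
        elif token.startswith("h") and token[1:].isdigit():
            # e.g. "h8" -> head layer at 8 bits
            parts["head_bits"] = int(token[1:])
        elif token.startswith("b"):
            # e.g. "b8" -> 8.0 bits per weight on average
            parts["weight_bits"] = float(token[1:])
    return parts

print(parse_quant_name("4k-h8-b8"))
# -> {'calibration_length': 4096, 'head_bits': 8, 'weight_bits': 8.0}
```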
All quantizations were calibrated with wikitext-2 unless otherwise specified
MEM estimates were taken with an extremely long chat log in the oobabooga text-generation-webui on a 7900 XTX, using nvtop to monitor PyTorch VRAM usage only. Systems with many background processes may use more. NVIDIA systems with flash attention 2 will use less VRAM than estimated here.
The measurement files are provided in the main branch, so you can make your own quants at other bit depths without repeating the 2-3 hour measurement pass.
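Reusing a measurement file looks roughly like the following with exllamav2's `convert.py`. The paths and the 5.0-bit target are placeholders of my own; adjust them to your setup.

```shell
# Sketch: requantize at a new bit depth using the provided measurement file,
# skipping the measurement pass. Paths below are hypothetical examples.
python convert.py \
    -i /models/Stheno-1.8-L2-13B \
    -o /tmp/exl2-work \
    -cf /models/Stheno-1.8-L2-13B-b5-exl2 \
    -m measurement.json \
    -b 5.0
```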