
Quantizations for PygmalionAI/mythalion-13b in the EXL2 format

| Quant     | VRAM estimate | Notes                                     |
|-----------|---------------|-------------------------------------------|
| 4k_hb8_b8 | 18 GB         | Recommended!                              |
| 4k_hb6_b6 | 15 GB         |                                           |
| 4k_hb6_b5 | 13 GB         | Should fit on 12 GB cards with 2k context |

Breaking down the names (see the decoding sketch after this list):

  • 4k means the quant was calibrated at 4096-token context @ 82 rows (the maximum wikitext allows), as opposed to the default 2048-token context @ 100 rows.
  • hb8 means a head (output layer) bit depth of 8 bits.
  • b8 means the model weights average 8.0 bits each.
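
These components can be decoded mechanically. Here is a minimal sketch; the `<ctx>k_hb<head_bits>_b<weight_bits>` pattern is inferred from the names in this card, not part of any official tooling:

```python
# Minimal sketch: decoding the quant-name convention described above.
# The pattern "<ctx>k_hb<head_bits>_b<weight_bits>" is an inference from
# the names in this card, not an official format.
import re

def parse_quant_name(name: str) -> dict:
    m = re.fullmatch(r"(\d+)k_hb(\d+)_b(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized quant name: {name}")
    return {
        "calibration_context": int(m.group(1)) * 1024,  # e.g. 4k -> 4096 tokens
        "head_bits": int(m.group(2)),                   # output head bit depth
        "weight_bits": float(m.group(3)),               # average bits per weight
    }

print(parse_quant_name("4k_hb6_b5"))
# {'calibration_context': 4096, 'head_bits': 6, 'weight_bits': 5.0}
```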

All quantizations were calibrated with wikitext-2.

You can run a model calibrated at 2k with a 4k context or vice versa. The actual difference between 2k and 4k calibrations appears to be very small.

VRAM estimates were taken with an extremely long chat log in the oobabooga web UI on a 7900 XTX, using nvtop to monitor PyTorch usage only, and rounded up. Systems with lots of extra background processes may use more. NVIDIA systems with flash attention 2 will use less VRAM than estimated here.

The measurement files are provided in the main branch so you can make your own quants at other bit depths without repeating the 2-3 hours of measurement.
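
For example, a new quant at a different bit depth can be built by pointing exllamav2's convert.py at the provided measurement file, which lets the converter skip the measurement pass and go straight to quantization. This is a minimal sketch, assuming exllamav2's converter and its flags; all paths and the 4.0-bit target are placeholders to adjust for your setup:

```python
# Minimal sketch: reusing the provided measurement.json with exllamav2's
# convert.py. Paths and the 4.0-bit target are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "models/mythalion-13b",      # original fp16 model (assumed path)
        "-o", "work",                      # scratch directory for the converter
        "-cf", "mythalion-13b-4k_hb6_b4",  # output folder for the finished quant
        "-m", "measurement.json",          # measurement file from this repo
        "-b", "4.0",                       # target average bits per weight
        "-hb", "6",                        # head (output layer) bit depth
    ],
    check=True,
)
```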
