
Quantizations for PygmalionAI/mythalion-13b in the EXL2 format

| Quant     | VRAM estimate | Notes                                     |
|-----------|---------------|-------------------------------------------|
| 4k_hb8_b8 | 18 GB         | Recommended!                              |
| 4k_hb6_b6 | 15 GB         |                                           |
| 4k_hb6_b5 | 13 GB         | Should fit on 12 GB cards with 2k context |

Breaking down the names (see the decoding sketch after this list):

  • 4k means the quant was calibrated at 4096-token context @ 82 rows (the maximum wikitext allows), as opposed to the default 2048-token context @ 100 rows.
  • hb8 means a head (output layer) bit depth of 8 bits.
  • b8 means the model weights average 8.0 bits each.
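
These components can be decoded mechanically. Here is a minimal sketch; the `<ctx>k_hb<head_bits>_b<weight_bits>` pattern is inferred from the names in this card, not part of any official tooling:

```python
# Minimal sketch: decoding the quant-name convention described above.
# The pattern "<ctx>k_hb<head_bits>_b<weight_bits>" is an inference from
# the names in this card, not an official format.
import re

def parse_quant_name(name: str) -> dict:
    m = re.fullmatch(r"(\d+)k_hb(\d+)_b(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized quant name: {name}")
    return {
        "calibration_context": int(m.group(1)) * 1024,  # e.g. 4k -> 4096 tokens
        "head_bits": int(m.group(2)),                   # output head bit depth
        "weight_bits": float(m.group(3)),               # average bits per weight
    }

print(parse_quant_name("4k_hb6_b5"))
# {'calibration_context': 4096, 'head_bits': 6, 'weight_bits': 5.0}
```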

All quantizations were calibrated with wikitext-2.

You can run a model calibrated at 2k with a 4k context or vice versa. The actual difference between 2k and 4k calibrations appears to be very small.

VRAM estimates were taken with an extremely long chat log in the oobabooga web UI on a 7900 XTX, using nvtop to monitor PyTorch usage only, and rounded up. Systems with lots of extra background processes may use more. NVIDIA systems with flash attention 2 will use less VRAM than estimated here.

The measurement files are provided in the main branch so you can make your own quants at other bit depths without repeating the 2-3 hours of measurement.
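
For example, a new quant at a different bit depth can be built by pointing exllamav2's convert.py at the provided measurement file, which lets the converter skip the measurement pass and go straight to quantization. This is a minimal sketch, assuming exllamav2's converter and its flags; all paths and the 4.0-bit target are placeholders to adjust for your setup:

```python
# Minimal sketch: reusing the provided measurement.json with exllamav2's
# convert.py. Paths and the 4.0-bit target are hypothetical placeholders.
import subprocess

subprocess.run(
    [
        "python", "convert.py",
        "-i", "models/mythalion-13b",      # original fp16 model (assumed path)
        "-o", "work",                      # scratch directory for the converter
        "-cf", "mythalion-13b-4k_hb6_b4",  # output folder for the finished quant
        "-m", "measurement.json",          # measurement file from this repo
        "-b", "4.0",                       # target average bits per weight
        "-hb", "6",                        # head (output layer) bit depth
    ],
    check=True,
)
```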
