
Quantizations of Sao10K/Stheno-1.8-L2-13B in the EXL2 format.

| Quant    | Mem @ 4k | Mem @ 4k (8-bit cache) |
|----------|----------|------------------------|
| 4k_h8_b8 | 17.2 GB  | 15.7 GB                |
| 4k_h6_b5 | 12.6 GB  | 11.0 GB                |

Breaking down the names:

- `4k` means calibration was run at a context length of 4096, as opposed to the default 2048
- `h8` means the head (output) layer is quantized at 8 bits
- `b8` means an average of 8.0 bits per weight across the model
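As a rough sanity check on the average-bits figures, the weights alone occupy roughly `params × avg_bits / 8` bytes; this sketch (parameter count and bit depths taken from the names above, KV cache and activation overhead not included) shows why the b8 quant needs noticeably more VRAM than b5:

```python
def weight_size_gb(n_params: float, avg_bits: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * avg_bits / 8 / 1024**3

# ~13B parameters for an L2-13B model; cache/activations add several GB on top.
for bits in (8.0, 5.0):
    print(f"b{bits:g}: ~{weight_size_gb(13e9, bits):.1f} GiB of weights")
```

The gap between these figures and the measured totals in the table above is the context cache plus runtime overhead.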

All quantizations were calibrated with wikitext-2 unless otherwise specified.

Memory estimates were taken with an extremely long chat log in the oobabooga web UI on a 7900 XTX, using nvtop to monitor PyTorch usage only. Systems with many extra background processes may use more. NVIDIA systems with Flash Attention 2 will use less VRAM than estimated here.

The measurement files are provided in the main branch so you can make your own quants at other bit depths without repeating the 2-3 hours of measuring.
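A minimal sketch of reusing the measurement file, assuming ExLlamaV2's `convert.py` script and that `measurement.json` has been downloaded from this repo (all paths and the 6.0-bit target here are illustrative, not the exact commands used for these quants):

```shell
# -m reuses the provided measurement.json, skipping the 2-3 hour measuring pass;
# -l, -b and -hb mirror the naming scheme above (calibration length, average
# bits per weight, head bits). Adjust the input/output paths to your setup.
python convert.py \
    -i /models/Stheno-1.8-L2-13B \
    -o /tmp/exl2-work \
    -cf /models/Stheno-1.8-L2-13B-b6 \
    -m measurement.json \
    -l 4096 \
    -b 6.0 \
    -hb 8
```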
