Quantizations for Sao10K/Stheno-1.8-L2-13B in the EXL2 format
Breaking down the names:
- 4k means the quant was calibrated with a context length of 4096 rather than the default 2048
- h8 means the head (output) layer is kept at a bit depth of 8
- b8 means the model weights average 8.0 bits per weight
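The naming scheme above can be sketched as a small parser. This is purely illustrative: the function name and defaults are my own, based only on the conventions listed here (2048 calibration length when no `Nk` token is present).

```python
# Parse a quant name like "4k-h8-b8" into its parts, following the
# naming scheme described in this README. Illustrative sketch only.
def parse_quant_name(name: str) -> dict:
    parts = {"calibration_length": 2048,  # default when no "Nk" token given
             "head_bits": None,
             "weight_bits": None}
    for token in name.split("-"):
        if token.endswith("k") and token[:-1].isdigit():
            # e.g. "4k" -> calibrated at 4 * 1024 = 4096 context length
            parts["calibration_length"] = int(token[:-1]) * 1024
        elif token.startswith("h") and token[1:].isdigit():
            # e.g. "h8" -> head layer at 8 bits
            parts["head_bits"] = int(token[1:])
        elif token.startswith("b"):
            # e.g. "b8" -> 8.0 bits per weight on average
            parts["weight_bits"] = float(token[1:])
    return parts

print(parse_quant_name("4k-h8-b8"))
# -> {'calibration_length': 4096, 'head_bits': 8, 'weight_bits': 8.0}
```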
All quantizations were calibrated with wikitext-2 unless otherwise specified
MEM estimates were taken with an extremely long chat log in the oobabooga text-generation-webui on a 7900 XTX, using nvtop to monitor PyTorch VRAM usage only. Systems with many background processes may use more. NVIDIA systems with flash attention 2 will use less VRAM than estimated here.
The measurement files are provided in the main branch, so you can make your own quants at other bit depths without repeating the 2-3 hour measurement pass.
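Reusing a measurement file looks roughly like the following with exllamav2's `convert.py`. The paths and the 5.0-bit target are placeholders of my own; adjust them to your setup.

```shell
# Sketch: requantize at a new bit depth using the provided measurement file,
# skipping the measurement pass. Paths below are hypothetical examples.
python convert.py \
    -i /models/Stheno-1.8-L2-13B \
    -o /tmp/exl2-work \
    -cf /models/Stheno-1.8-L2-13B-b5-exl2 \
    -m measurement.json \
    -b 5.0
```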