CalliopeDS-v2-L2-13B-exl2

Branches:

main: measurement.json, calculated using 2048-token calibration rows from PIPPA
4.0bpw-h6: 4.0 decoder bits per weight, 6 bits for the output head (lm_head)
- ideal for 12 GB GPUs, or 16 GB GPUs with NTK-extended context or CFG
6.0bpw-h6: 6.0 decoder bits per weight, 6 bits for the output head (lm_head)
- ideal for 16 GB GPUs, or 24 GB GPUs with NTK-extended context or CFG
8bit-32g-h8: all tensors quantized to 8 bits with group size 32, 8 head bits
- experimental quant: exllamav2 was monkeypatched to quantize every tensor to 8-bit with group size 32
- similar in size to the older GPTQ 8-bit quants with no group size; a 24 GB GPU is recommended
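As a rough guide to why each branch maps to those GPU sizes, the weight storage can be estimated from the bits per weight. This is a minimal sketch, not how exllamav2 itself computes sizes. It assumes the standard Llama-2-13B shapes (hidden size 5120, MLP size 13824, 40 layers, vocabulary 32000). It also ignores the input embedding table, the KV cache, activations, and group-size overhead, which is significant for the 32g branch. `estimate_weight_gib` is a hypothetical helper.

```python
# Rough weight-storage estimate for an exl2 quant of a Llama-2-13B-class model.
# ASSUMPTIONS (not stated in this model card): standard Llama-2-13B shapes below;
# the input embedding table, KV cache, activations and per-group quantization
# overhead are all ignored, so real VRAM use will be higher.

HIDDEN = 5120          # hidden size
INTERMEDIATE = 13824   # MLP intermediate size
LAYERS = 40            # number of decoder layers
VOCAB = 32_000         # vocabulary size


def estimate_weight_gib(decoder_bpw: float, head_bits: float) -> float:
    """Approximate size of the quantized weights in GiB (hypothetical helper)."""
    attn_params = 4 * HIDDEN * HIDDEN            # q, k, v, o projections
    mlp_params = 3 * HIDDEN * INTERMEDIATE       # gate, up, down projections
    decoder_params = LAYERS * (attn_params + mlp_params)
    head_params = HIDDEN * VOCAB                 # lm_head (output projection)
    total_bits = decoder_params * decoder_bpw + head_params * head_bits
    return total_bits / 8 / 2**30


if __name__ == "__main__":
    for name, bpw, hb in [("4.0bpw-h6", 4.0, 6), ("6.0bpw-h6", 6.0, 6), ("8bit-32g-h8", 8.0, 8)]:
        print(f"{name}: ~{estimate_weight_gib(bpw, hb):.1f} GiB of weights")
```

Under these assumptions, 4.0bpw-h6 comes to roughly 6 GiB of weights and 6.0bpw-h6 to roughly 9 GiB, which leaves room for the KV cache on 12 GB and 16 GB cards respectively. The 8-bit branch lands near 12 GiB before group-size overhead, hence the 24 GB recommendation.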