Quantized versions for mere mortals?
#1
by
LeaveNhA
- opened
I started the AWQ, maybe then GPTQ. Not sure if my 4 A100s are enough, but I am interested in doing a GGUF (it's already here: abacusai/TheProfessor-155b-gguf)
exl2 quant here, https://huggingface.co/ek826/TheProfessor-155b-2.4bpw-exl2
Any chance of getting it under 48GB? Would love to fit it on dual 3090s
Sure, will do a 2.2 or 2.0 bpw and test on a dual 24GB VRAM setup
2.21 bpw exl2 quant, fits exactly in a dual-4090 setup with 4k context, runs at 17.96 tokens/sec
https://huggingface.co/ek826/TheProfessor-155b-2.21bpw-exl2
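For anyone wondering why 2.21 bpw was the target: a quick back-of-envelope sketch of weights-only size at a given bits-per-weight. This is an assumption-laden rule of thumb, not from the quant tooling itself; it ignores KV cache, activations, and per-layer overhead, which is why a quant near the VRAM limit still needs headroom for context.

```python
def quant_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB for a model quantized to bpw bits-per-weight."""
    # total bits = params * bpw; divide by 8 for bytes, 1e9 for GB
    return params_billion * 1e9 * bpw / 8 / 1e9

# Rough numbers for a 155B-parameter model at the bpw values discussed above
for bpw in (2.0, 2.21, 2.4):
    print(f"{bpw} bpw -> ~{quant_size_gb(155, bpw):.1f} GB of weights")
```

By this estimate, 2.4 bpw lands around 46.5 GB (too tight for 48 GB once context is added), while 2.21 bpw is roughly 42.8 GB, leaving room for a 4k context on 2x24 GB cards.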
Awesome thank you!!