Quantized versions for mere mortals?

#1
by LeaveNhA - opened

@TheBloke I know you are busy, but this would be very nice.

Thanks.

I started the AWQ, and maybe GPTQ after that. Not sure if my 4 A100s are enough, but I am interested in doing a GGUF (it's already here: abacusai/TheProfessor-155b-gguf).

exl2 quant here: https://huggingface.co/ek826/TheProfessor-155b-2.4bpw-exl2

Any way to get it under 48 GB? Would love to fit it into a dual-3090 setup.

Sure, will do a 2.2 or 2.0 bpw quant and test on a dual-24 GB VRAM setup.

2.21 bpw exl2 quant exactly fits a dual-4090 setup with 4k context, and runs at 17.96 tokens/sec:
https://huggingface.co/ek826/TheProfessor-155b-2.21bpw-exl2
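For anyone sizing other cards: the weight footprint of an exl2 quant is roughly params × bits-per-weight / 8. A minimal back-of-the-envelope sketch (the figures are ballpark estimates only — KV cache and activation overhead come on top of the weights, which is why 2.4 bpw doesn't fit in 48 GB but 2.21 bpw does):

```python
# Rough VRAM estimate for quantized weights: params * bpw / 8 bytes.
# This ignores KV cache, activations, and framework overhead, so the
# real requirement is somewhat higher than the number printed here.
def weight_size_gb(params_billion: float, bpw: float) -> float:
    """Approximate weight size in GB (10^9 bytes) for a quantized model."""
    return params_billion * bpw / 8

for bpw in (2.0, 2.21, 2.4):
    print(f"{bpw} bpw -> ~{weight_size_gb(155, bpw):.1f} GB of weights")
# 2.21 bpw -> ~42.8 GB of weights, leaving headroom for 4k context
# inside 2x24 GB, while 2.4 bpw (~46.5 GB) leaves almost none.
```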

Awesome thank you!!
