Request for H6_3.0BPW or H6_3.5BPW quants

#1
by Acerolaorion57 - opened

Hi! Could you please add 3.0BPW and 3.5BPW quants for owners of 8GB VRAM cards (RTX
5060/4060/3060 8GB)? Currently the minimum 4.0BPW is too tight to fit alongside KV cache. Thank you!
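
For a rough sense of the numbers, here is a back-of-the-envelope sketch of weight size at each bitrate. It assumes about 12 billion parameters (taken from the "12b" in the model name) and counts weights only; the KV cache, activations, and any layers stored at a different bit width come on top, so treat the results as lower bounds. `weight_gib` and `N_PARAMS` are illustrative names, not part of any library.

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight size in GiB for n_params stored at bpw bits per weight."""
    return n_params * bpw / 8 / 2**30  # bits -> bytes -> GiB

# Assumption: ~12e9 parameters, inferred from the "12b" in the model name.
N_PARAMS = 12e9

for bpw in (3.0, 3.5, 4.0):
    print(f"{bpw} bpw: ~{weight_gib(N_PARAMS, bpw):.2f} GiB of weights")
```

Under these assumptions the 4.0 bpw weights alone take roughly 5.6 GiB, which leaves little of an 8GB card for the KV cache, while 3.0 bpw comes to roughly 4.2 GiB.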

Hey there!

Thanks for the suggestion, makes sense for 8GB cards.

I’ll add both 3.0bpw and 3.5bpw quants.

Not sure about the exact upload time yet due to some internet disruptions, but I’ll get them up as soon as I can.

I’ll let you know once they’re live!

Redid the whole thing, and they are now live!

IlyaGusev_saiga-nemo-12b_EXL3
