Can you make a 2.4bpw quantization?

#1
by xldistance - opened

With max_position_embeddings set to 10000, the 2.65bpw quantization takes up more than 25 GB of video memory, so it runs very badly on a 4090 graphics card.

I can add a 2.4bpw quant. You may need to adjust max tokens if it still doesn't fit.
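
For a rough sense of how much a lower bpw saves, here is a minimal sketch of the weight-size arithmetic. The thread doesn't state the parameter count, so `ASSUMED_PARAMS` is a hypothetical placeholder, and the helper `weight_size_gib` is just an illustrative name. It only counts the quantized weights; the KV cache (which grows with max tokens) and other runtime overhead come on top of this.

```python
def weight_size_gib(num_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    # bits per weight -> total bits -> bytes -> GiB
    return num_params * bpw / 8 / 1024**3


# Hypothetical parameter count; the thread does not state the model size.
ASSUMED_PARAMS = 70e9

for bpw in (2.65, 2.4):
    size = weight_size_gib(ASSUMED_PARAMS, bpw)
    print(f"{bpw} bpw -> about {size:.1f} GiB of weights")
```

Whatever is left between that figure and the card's memory is what the context has to fit in, which is why lowering max tokens can help when the model doesn't load.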
