Exl2 quant request

#1
by Clevyby - opened

Looks interesting. I'd like to review this when I'm free in the future, so I'd like a 4 bpw quant of this in advance. I'm not sure of the exact bpw range since I haven't tested or used models in this range, so I might ask again.

Hi @Clevyby, thanks again for testing. Two quants are ready: https://huggingface.co/TeeZee/NEBULA-23.8B-v1.0-bpw4.0-h6-exl2 and TeeZee/NEBULA-23.8B-v1.0-bpw6.0-h8-exl2. Surprisingly, this model has more parameters than a 20B model and yet uses less VRAM, so I believe more bits can be squeezed into 24 GB (8 bpw?), 15 GB (5 bpw?), or 12 GB (3.75 bpw?) than common quants for 20B models manage.
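For context on these sizing guesses, here's a minimal back-of-envelope sketch, assuming the common rule of thumb that exl2-quantized weights take roughly param_count × bpw / 8 bytes; the KV cache and activations need headroom on top of this, which the sketch doesn't model:

```python
# Rough VRAM estimate for an exl2 quant: weights take roughly
# param_count * bpw / 8 bytes. KV cache and activations add
# overhead on top, so a card needs some headroom beyond this.

PARAMS = 23.8e9  # NEBULA-23.8B parameter count

def weight_vram_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate VRAM taken by the quantized weights alone, in GB."""
    return params * bpw / 8 / 1e9

for bpw in (3.75, 5.0, 8.0):
    print(f"{bpw} bpw -> ~{weight_vram_gb(bpw):.1f} GB for weights alone")
```

This prints roughly 11.2, 14.9, and 23.8 GB for the 12, 15, and 24 GB targets respectively, so the higher guesses look tight once the cache is added.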

Hello, thanks for making this! I'd like to ask: what is the context size of this model?

Owner

4096 tokens.

Clevyby changed discussion status to closed

@TeeZee Hello, I'd like a 4.45 bpw quant of this, please. It turns out a 4.5 bpw quant almost fits in free Colab at 8k context.

Clevyby changed discussion status to open
Clevyby changed discussion status to closed

@TeeZee So, the 4.45 bpw quant still didn't fit at 8k; a 4.34 bpw surely would. That one, please.
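As a rough check on these numbers, here is a sketch assuming free Colab provides a T4 with about 15 GB of usable VRAM (an assumption about the free tier) and the same weights ≈ param_count × bpw / 8 rule of thumb; whatever is left over has to hold the 8k-token KV cache plus activations, which are not modeled here:

```python
# Back-of-envelope check of the requested bpw values on a free Colab
# T4. The ~15 GB budget is an assumption about the free tier; weights
# are estimated as param_count * bpw / 8 bytes.

PARAMS = 23.8e9
T4_VRAM_GB = 15.0  # assumed free-tier VRAM budget

for bpw in (4.5, 4.45, 4.34):
    weights_gb = PARAMS * bpw / 8 / 1e9
    headroom_gb = T4_VRAM_GB - weights_gb
    print(f"{bpw} bpw: ~{weights_gb:.2f} GB weights, "
          f"~{headroom_gb:.2f} GB left for the 8k KV cache")
```

Dropping from 4.45 to 4.34 bpw frees only about 0.3 GB of weights, so whether it fits hinges on how much the 8k cache actually needs.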

Clevyby changed discussion status to open
Clevyby changed discussion status to closed
