Exl2 quant request

#1
by Clevyby - opened

Looks interesting. I'd like to review this when I'm free in the future, so I'd like a 4 bpw quant of this in advance. I'm not sure of the exact bpw range since I haven't tested or used models in this range, so I might ask again.

Hi @Clevyby, thanks again for testing. Two quants are ready: https://huggingface.co/TeeZee/NEBULA-23.8B-v1.0-bpw4.0-h6-exl2 and TeeZee/NEBULA-23.8B-v1.0-bpw6.0-h8-exl2. Surprisingly, this model has more parameters than a 20B model and yet uses less VRAM, so I believe more bits can be squeezed into 24 GB (8 bpw?), 15 GB (5 bpw?), or 12 GB (3.75 bpw?) than common quants for 20B models manage.
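For context on these sizing guesses, here's a minimal back-of-envelope sketch, assuming the common rule of thumb that exl2-quantized weights take roughly param_count × bpw / 8 bytes; the KV cache and activations need headroom on top of this, which the sketch doesn't model:

```python
# Rough VRAM estimate for an exl2 quant: weights take roughly
# param_count * bpw / 8 bytes. KV cache and activations add
# overhead on top, so a card needs some headroom beyond this.

PARAMS = 23.8e9  # NEBULA-23.8B parameter count

def weight_vram_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate VRAM taken by the quantized weights alone, in GB."""
    return params * bpw / 8 / 1e9

for bpw in (3.75, 5.0, 8.0):
    print(f"{bpw} bpw -> ~{weight_vram_gb(bpw):.1f} GB for weights alone")
```

This prints roughly 11.2, 14.9, and 23.8 GB for the 12, 15, and 24 GB targets respectively, so the higher guesses look tight once the cache is added.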

Hello, thanks for making this! I'd like to ask: what is the context size of this model?

Owner

4096 tokens.

Clevyby changed discussion status to closed

@TeeZee Hello, I'd like a 4.45 bpw quant of this, please. It turns out a 4.5 bpw quant almost fits in free Colab at 8k context.

Clevyby changed discussion status to open
Clevyby changed discussion status to closed

@TeeZee So, the 4.45 bpw quant still didn't fit at 8k; a 4.34 bpw surely would. That one, please.
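As a rough check on these numbers, here is a sketch assuming free Colab provides a T4 with about 15 GB of usable VRAM (an assumption about the free tier) and the same weights ≈ param_count × bpw / 8 rule of thumb; whatever is left over has to hold the 8k-token KV cache plus activations, which are not modeled here:

```python
# Back-of-envelope check of the requested bpw values on a free Colab
# T4. The ~15 GB budget is an assumption about the free tier; weights
# are estimated as param_count * bpw / 8 bytes.

PARAMS = 23.8e9
T4_VRAM_GB = 15.0  # assumed free-tier VRAM budget

for bpw in (4.5, 4.45, 4.34):
    weights_gb = PARAMS * bpw / 8 / 1e9
    headroom_gb = T4_VRAM_GB - weights_gb
    print(f"{bpw} bpw: ~{weights_gb:.2f} GB weights, "
          f"~{headroom_gb:.2f} GB left for the 8k KV cache")
```

Dropping from 4.45 to 4.34 bpw frees only about 0.3 GB of weights, so whether it fits hinges on how much the 8k cache actually needs.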

Clevyby changed discussion status to open
Clevyby changed discussion status to closed
