4-bit 32g vs 4-bit 128g?

#5
by nucleardiffusion

Sorry for the noob question, what are the pros and cons of each?

In my experience, and anecdotally based on what I've heard, groupsize 128 gives slightly better results (lower perplexity on common benchmarks) but uses slightly more VRAM. No groupsize / groupsize 32 (I don't know which this really is) uses a little less VRAM and scores a little higher (i.e. worse) on perplexity.
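For intuition on why group size affects VRAM at all: each group of weights carries its own quantization metadata (a scale, and typically a zero point), so smaller groups mean more metadata per weight. Here's a rough back-of-envelope sketch; the parameter count, the fp16 scale / 4-bit zero-point layout, and the "negligible overhead" treatment of no-groupsize are all assumptions for illustration, not measurements of any specific GPTQ implementation:

```python
# Rough estimate of quantized weight storage for a LLaMA-30B-class model.
# PARAMS is an assumed parameter count, not an exact figure.
PARAMS = 32.5e9   # assumed ~32.5B parameters
BITS = 4          # 4-bit quantized weights

def weight_gb(group_size):
    """Approximate weight storage in GB for a given group size.

    group_size=None models 'no groupsize': per-row metadata only,
    treated as negligible for this rough estimate.
    """
    base_bits = PARAMS * BITS
    if group_size is None:
        overhead_bits = 0
    else:
        # Assumed per-group metadata: one fp16 scale + one 4-bit zero point.
        overhead_bits = (PARAMS / group_size) * (16 + 4)
    return (base_bits + overhead_bits) / 8 / 1e9

for g in (None, 128, 32):
    print(f"groupsize {g}: ~{weight_gb(g):.2f} GB")
```

Under these assumptions, no groupsize comes out smallest, 128g adds a fraction of a GB, and 32g adds several times that, which matches the general pattern that smaller groups trade VRAM for quantization accuracy. It also shows that "no groupsize" and "groupsize 32" are genuinely different things, even if model card naming sometimes blurs them.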

Normally I'd go for 128 and accept a little more VRAM use, but it just so happens that any llama 30b model will fit entirely in 24GB VRAM with the full 2048 context at groupsize 32, while at groupsize 128 you have to sacrifice about 400 tokens of context to ensure it fits in 24GB.

Therefore, I currently run llama 30b no groupsize (or is it 32? I still don't know), full context, on a 3090 24GB.

I appreciate the reply, thank you
