Will it squeeze into 48?

by jackboot - opened

I think the model itself is 45.1 GB. Will it squeeze into 48 GB with the full 4096 context if you have flash attention? I'm not planning on RoPE-scaling it, but if it requires 64 or 56 for at least full ctx, I'll grab the smaller one.

With FP8 KV cache and 8192 context size, it fit on an A6000, using 47 GB of the 48 GB.
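For a rough sense of why the fit is this tight, here is a back-of-the-envelope sketch of the KV cache size. It assumes a Llama-2-70B-style shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128); those numbers are not stated in this thread, so swap in the values from the model's config if they differ. The function name `kv_cache_bytes` is made up for illustration.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem):
    """Approximate KV cache size: one key and one value vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

GIB = 1024 ** 3

# ASSUMPTION: Llama-2-70B-style architecture; not confirmed for this model.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 80, 8, 128

fp16_4k = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx_len=4096, bytes_per_elem=2)
fp8_8k = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx_len=8192, bytes_per_elem=1)

print(f"fp16 cache @ 4096 ctx: {fp16_4k / GIB:.2f} GiB")
print(f"fp8 cache  @ 8192 ctx: {fp8_8k / GIB:.2f} GiB")
```

Under these assumptions both setups come out to the same cache size, since halving the bytes per element offsets doubling the context. The cache sits on top of the weights plus activation and framework overhead, which this sketch does not model.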

Thanks. Hope the overhead doesn't get me on 2x24, but I'm gonna queue it up to download overnight.

It's a very tight fit. I have to use a GPU split of 21.75,23 to load with fp16 cache and the default 4k context. Nothing else is running on either 3090; I'm running headless, with no desktop or apps at all other than ooba. The model seems reasonably good compared to other 70Bs, even at 4.65bpw.

I just loaded it in tabby. Auto split lets you squeeze in more than textgen does; I'm not sure why. I haven't tried 8k, but for 4k I have a 98/94 split. Crazily enough, I think I can fit more context. Initial impression of the model is good.
