VRAM requirements

#2
by practical-dreamer - opened

Pancho, is there any way these 120B models would run on the dual 3090/4090 setups?

Even at 3-bit there's just no way, right?

Would llama.cpp with partial GPU VRAM offload even be an option, or would performance suck too hard for it to be usable?

Just wonderin

-Generic Username

3bpw will need about 48 GB if you run it with the 8-bit cache; that's just enough memory to work. Running it as GGUF is pretty slow, so I wouldn't recommend that.

Hi there! Long time!

As @mpasila said, it will work with 48 GB VRAM. It can run with a full fp16 cache at 3k context, or with the 8-bit cache at maybe 4096 context (I haven't done enough tests on 2x48 GB), but 3bpw was made with 48 GB VRAM in mind.
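As a rough sanity check on the numbers above, the weight footprint of a quantized model is roughly parameter count times bits-per-weight divided by 8; everything else (KV cache, activations, buffers) has to fit in whatever is left over. A minimal sketch, where the 2 GB overhead allowance is my own guessed figure, not something from this thread:

```python
def weight_vram_gb(n_params_b: float, bpw: float, overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate for quantized weights alone.

    n_params_b: parameter count in billions.
    bpw: bits per weight of the quantization.
    overhead_gb: guessed allowance for buffers/activations (assumption).
    """
    bytes_per_param = bpw / 8  # e.g. 3 bpw -> 0.375 bytes per weight
    return n_params_b * bytes_per_param + overhead_gb

# A 120B model at 3 bpw: 120 * 0.375 = 45 GB of weights, so a 48 GB
# setup leaves only a few GB for the KV cache -- hence the short
# contexts and 8-bit cache mentioned above.
print(round(weight_vram_gb(120, 3.0), 1))
```

This is only a first-order estimate; real loaders add per-layer overhead and the KV cache grows linearly with context length, which is why fp16 cache tops out at a shorter context than the 8-bit cache here.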

Panchovix changed discussion status to closed
