I don't know why it won't fit on an RTX 3090

#1
by DrNicefellow - opened

Because the vicuna-34B-GPTQ can fit on the card with the exllama loader, while this one requires more than 40 GB of VRAM. Many more emergent features?

You should limit the ctx len to shrink the pre-allocated KV cache used by exllama. The original 4K-ctx-len Yi-34B 4-bit GPTQ model can fit in 21 GB of VRAM.
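For scale: with Yi-34B's published config (60 layers, 8 KV heads of dimension 128), the fp16 KV cache costs about 2 × 60 × 8 × 128 × 2 bytes ≈ 240 KB per token, so roughly 1 GB at 4K context but around 46 GB at 200K. If the repo's config advertises the 200K window of the Yi-34B-200K variant and the loader pre-allocates for the full window, that alone would explain the 40 GB+ requirement. Below is a minimal sketch of capping the context length through the exllamav2 Python API; the model path and sequence length are placeholders, not values from this thread.

```python
# Minimal sketch, assuming the exllamav2 Python API.
# The model directory below is a hypothetical local path.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-GPTQ"
config.prepare()  # reads the model's config.json

# Cap the context window. The KV cache is pre-allocated for this many
# tokens, so reducing max_seq_len shrinks its VRAM cost proportionally.
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate per layer while loading
model.load_autosplit(cache)               # split across available GPU(s)
```

If you load through text-generation-webui instead, the same knob is the `max_seq_len` setting on the Model tab for the ExLlama loaders.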

Don't worry, it doesn't fit even in 48 GB of VRAM. :D

@Yhyu13 How can we limit the ctx len to reduce the memory cost?
