I don't know why it won't fit on an RTX 3090

#1
by DrNicefellow - opened

Because the vicuna-34B-GPTQ can fit on the card with the exllama loader, while this one requires more than 40 GB of VRAM. Many more emergent features?

You should limit the ctx len to shrink the pre-allocated KV cache used by exllama. The original 4K-ctx-len Yi-34B 4-bit GPTQ model can fit in 21 GB of VRAM.
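For scale: with Yi-34B's published config (60 layers, 8 KV heads of dimension 128), the fp16 KV cache costs about 2 × 60 × 8 × 128 × 2 bytes ≈ 240 KB per token, so roughly 1 GB at 4K context but around 46 GB at 200K. If the repo's config advertises the 200K window of the Yi-34B-200K variant and the loader pre-allocates for the full window, that alone would explain the 40 GB+ requirement. Below is a minimal sketch of capping the context length through the exllamav2 Python API; the model path and sequence length are placeholders, not values from this thread.

```python
# Minimal sketch, assuming the exllamav2 Python API.
# The model directory below is a hypothetical local path.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config

config = ExLlamaV2Config()
config.model_dir = "/models/Yi-34B-GPTQ"
config.prepare()  # reads the model's config.json

# Cap the context window. The KV cache is pre-allocated for this many
# tokens, so reducing max_seq_len shrinks its VRAM cost proportionally.
config.max_seq_len = 4096

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate per layer while loading
model.load_autosplit(cache)               # split across available GPU(s)
```

If you load through text-generation-webui instead, the same knob is the `max_seq_len` setting on the Model tab for the ExLlama loaders.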

Don't worry, it doesn't fit even in 48 GB of VRAM. :D

@Yhyu13 How can we limit the ctx len to reduce the memory cost?
