VRAM

#53
by DataSoul - opened

I totally agree this is a great model, but I'm wondering why it requires significantly more VRAM at runtime than other models with a similar parameter count. It's to the point where I can't use longer contexts on my setup. (I am using the Q4 GGUF version.)

Cohere For AI org

Hey @DataSoul

Thanks for the feedback. Could you tell me which models you found to use less VRAM in comparison?

This is because the model doesn't use grouped-query attention (GQA), right?
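To make the GQA point concrete, here is a rough sketch of how KV cache size scales with the number of key/value heads. The layer count, head count, head dimension, and context length below are illustrative assumptions, not this model's actual configuration, and the helper `kv_cache_bytes` is a hypothetical name. It also assumes an fp16 cache, which is common even when the weights are Q4-quantized.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Estimate KV cache size: one K and one V tensor per layer.

    All arguments are illustrative assumptions; check the model's config
    for real values. bytes_per_elem=2 corresponds to an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


GIB = 1024 ** 3

# Hypothetical model: 40 layers, 64 query heads, head_dim 128, 8192-token context.
# Without GQA, every query head has its own K/V head (n_kv_heads == 64).
mha = kv_cache_bytes(n_layers=40, n_kv_heads=64, head_dim=128, context_len=8192)

# With GQA, groups of query heads share K/V heads (here, 8 K/V heads).
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, context_len=8192)

print(f"no GQA: {mha / GIB:.2f} GiB")  # no GQA: 10.00 GiB
print(f"GQA:    {gqa / GIB:.2f} GiB")  # GQA:    1.25 GiB
```

Under these assumed numbers the cache is 8x larger without GQA, and it grows linearly with context length, which would match the symptom of running out of VRAM at longer contexts even though the quantized weights fit.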
