VRAM
#53 opened by DataSoul
I totally agree this is a great model, but I'm wondering why it requires significantly more VRAM at runtime than other models with a similar parameter count. It gets to the point where I can't use longer contexts on my setup. (I am using the Q4 GGUF version.)
Is this because the model doesn't use grouped-query attention (GQA)?
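For context on why GQA matters here: the weights themselves take about the same space at equal parameter count, but the KV cache grows with context length and with the number of K/V heads. A rough back-of-the-envelope sketch (the layer/head numbers below are hypothetical, not this model's actual config):

```python
# Rough KV-cache size estimate. Full multi-head attention (MHA) stores
# K and V for every attention head in every layer; grouped-query
# attention (GQA) shares K/V across groups of heads, shrinking the cache.
# All config numbers below are made up for illustration only.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values; fp16 -> 2 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

seq_len = 8192
# hypothetical 30B-class config: 48 layers, 64 heads, head_dim 128
mha = kv_cache_bytes(48, 64, 128, seq_len)  # no GQA: K/V for all 64 heads
gqa = kv_cache_bytes(48, 8, 128, seq_len)   # GQA with 8 K/V head groups

print(f"MHA KV cache: {mha / 2**30:.1f} GiB")  # 12.0 GiB
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB")  # 1.5 GiB
```

So at long contexts, a non-GQA model can need several times more VRAM for its KV cache than a GQA model of the same size, which would match what you're seeing even with Q4 weights.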