fit in 24gb?

by apol - opened

Any plans to release a 2b or 1b version that is functional and can fit into a 4090?
Congrats on the release and bravo!

A 4090 has 24GB of VRAM. Assuming you want to offload the whole model to the GPU, you can fit an IQ2_XS GGUF or a 2.24bpw EXL2 quant of this model into your 4090 with Llama 3's native 8K context.

If you only want to run full-precision models, a Llama 3 8B model (roughly 16 GB of weights at FP16) fits easily in your VRAM. No need to go all the way down to 2B.
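For a rough sense of why those numbers work out, here is a back-of-envelope VRAM estimate. This is a minimal sketch: the 70B parameter count for "this model" and the flat 2 GB allowance for KV cache and runtime overhead are assumptions for illustration, not figures from this thread.

```python
def estimate_vram_gib(n_params_billion: float, bits_per_weight: float, overhead_gib: float = 2.0) -> float:
    """Rough VRAM estimate: weight bytes at the given bit-width plus a flat
    allowance for KV cache, activations, and runtime overhead (assumed 2 GiB)."""
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 1024**3 + overhead_gib

# Illustrative numbers (70B is an assumption about the model discussed here):
print(f"70B @ 2.24 bpw (EXL2-style quant): ~{estimate_vram_gib(70, 2.24):.1f} GiB")
print(f"8B  @ 16 bpw  (FP16 full precision): ~{estimate_vram_gib(8, 16):.1f} GiB")
```

Both land under 24 GiB, which is why a heavily quantized 70B or a full-precision 8B can run entirely on a 4090; longer contexts grow the KV cache and eat into that headroom.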

thanks!

apol changed discussion status to closed
