Maybe smaller somehow?

#1 by QuantumState745837

Are there any ways to reduce a model's size, say from 120B down to 70B or even 34B? I've only got 24 GB of VRAM here. :(

Just use a native 70B model like miqu at 2.4 bpw. There's no way to fit a 120B model in 24 GB of VRAM. Here are the model sizes for another 120B: even at 2.4 bpw you're at 34 GB for a 120B, while a 2.4 bpw 70B will fit in 24 GB. You'll have to drop your context length, though, from the 32K these miqu-based models support down to something much smaller, like 2-8K, depending on what else is using your GPU (a desktop environment, for example). There's a rough size estimate sketched after the listings below.

120B sizes

$ du -sh /models2/models/miquella-120b-*
34G     /models2/models/miquella-120b-2.4bpw-h6-exl2
42G     /models2/models/miquella-120b-3.0bpw-h6-exl2
49G     /models2/models/miquella-120b-3.5bpw-h6-exl2
56G     /models2/models/miquella-120b-4.0bpw-h6-exl2
63G     /models2/models/miquella-120b-4.5bpw-h6-exl2
83G     /models2/models/miquella-120b-6.0bpw-h6-exl2
110G    /models2/models/miquella-120b-8.0bpw-h8-exl2

70B sizes

$ du -sh /models2/models/miqu-1-70b-sf*exl2
20G     /models2/models/miqu-1-70b-sf-2.4bpw-h6-exl2
22G     /models2/models/miqu-1-70b-sf-2.65bpw-h6-exl2
25G     /models2/models/miqu-1-70b-sf-3.0bpw-h6-exl2
29G     /models2/models/miqu-1-70b-sf-3.5bpw-h6-exl2
33G     /models2/models/miqu-1-70b-sf-4.0bpw-h6-exl2
35G     /models2/models/miqu-1-70b-sf-4.25bpw-h6-exl2
41G     /models2/models/miqu-1-70b-sf-5.0bpw-h6-exl2
49G     /models2/models/miqu-1-70b-sf-6.0bpw-h6-exl2
65G     /models2/models/miqu-1-70b-sf-8.0bpw-h8-exl2
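
As a rough sanity check on those numbers (not from the thread): an EXL2 file is roughly params x bpw / 8 bytes, and the FP16 KV cache then eats into whatever VRAM is left, which is why the context has to come down. The sketch below is just that arithmetic; the nominal 70B/120B parameter counts and the Llama-2-70B-style cache shape (80 layers, GQA with 8 KV heads, head dim 128) are assumptions on my part, not anything measured from miqu itself.

def weight_gib(params_billion: float, bpw: float) -> float:
    # Quantized weights: ~params * bits_per_weight / 8 bytes.
    # Real EXL2 files carry a little extra (tokenizer, calibration data).
    return params_billion * 1e9 * bpw / 8 / 2**30

def kv_cache_gib(ctx_len: int, layers: int = 80, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    # FP16 K+V cache, assuming a Llama-2-70B-style architecture (assumed, not confirmed).
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V
    return ctx_len * per_token / 2**30

if __name__ == "__main__":
    print(f"120B @ 2.4 bpw weights: {weight_gib(120, 2.4):.1f} GiB")  # ~34 GiB, matches the du output
    print(f" 70B @ 2.4 bpw weights: {weight_gib(70, 2.4):.1f} GiB")   # ~20 GiB
    for ctx in (4096, 8192, 32768):
        total = weight_gib(70, 2.4) + kv_cache_gib(ctx)
        print(f" 70B @ 2.4 bpw + {ctx:>5} ctx: {total:.1f} GiB")

Under those assumptions, the 2.4 bpw 70B plus a 32K cache already lands near 30 GiB, while 4-8K of context keeps it around 21-22 GiB, which is why 2-8K is about the ceiling on a 24 GB card that's also driving a desktop.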
