Maybe smaller somehow?
#1 opened by QuantumState745837
Are there any ways to reduce a model's size, like from 120B down to 70B or maybe even 34B? I've only got 24 GB of VRAM here. :(
Just use a native 70B model like miqu at 2.4bpw. There's no way to fit a 120B model in 24 GB of VRAM. Here are the model sizes for another 120B: even at 2.4bpw, a 120B model is already 34 GB. A 2.4bpw 70B, though, will fit in 24 GB of VRAM. You will have to drop your context length for these miqu-based models from 32K down to something much smaller, around 2-8K depending on what else is running on your GPU (like a desktop). There's a rough sizing sketch after the listings below.
120B sizes
$ du -sh /models2/models/miquella-120b-*
34G /models2/models/miquella-120b-2.4bpw-h6-exl2
42G /models2/models/miquella-120b-3.0bpw-h6-exl2
49G /models2/models/miquella-120b-3.5bpw-h6-exl2
56G /models2/models/miquella-120b-4.0bpw-h6-exl2
63G /models2/models/miquella-120b-4.5bpw-h6-exl2
83G /models2/models/miquella-120b-6.0bpw-h6-exl2
110G /models2/models/miquella-120b-8.0bpw-h8-exl2
70B sizes
$ du -sh /models2/models/miqu-1-70b-sf*exl2
20G /models2/models/miqu-1-70b-sf-2.4bpw-h6-exl2
22G /models2/models/miqu-1-70b-sf-2.65bpw-h6-exl2
25G /models2/models/miqu-1-70b-sf-3.0bpw-h6-exl2
29G /models2/models/miqu-1-70b-sf-3.5bpw-h6-exl2
33G /models2/models/miqu-1-70b-sf-4.0bpw-h6-exl2
35G /models2/models/miqu-1-70b-sf-4.25bpw-h6-exl2
41G /models2/models/miqu-1-70b-sf-5.0bpw-h6-exl2
49G /models2/models/miqu-1-70b-sf-6.0bpw-h6-exl2
65G /models2/models/miqu-1-70b-sf-8.0bpw-h8-exl2
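As a rough sanity check, the weight footprint is about parameters × bits-per-weight / 8, and the FP16 KV cache grows linearly with context length. Here is a minimal Python sketch of that arithmetic, assuming Llama-2-70B-style shapes (80 layers, 8 KV heads, head dim 128) for the 70B and roughly 140 layers for a 120B frankenmerge; those shape numbers are assumptions for illustration, not exact values for miqu or miquella.

# Rough VRAM estimate for an exl2 quant: weights + KV cache (ignores runtime overhead).
# Shapes below are assumptions (Llama-2-70B-like 70B, ~140-layer 120B merge).

def weight_gb(n_params_b, bpw):
    """Approximate weight size in GB: parameters * bits-per-weight / 8."""
    return n_params_b * 1e9 * bpw / 8 / 1e9

def kv_cache_gb(context_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """FP16 KV cache: 2 (K and V) * layers * kv_heads * head_dim * context * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1e9

print(f"70B  @ 2.4bpw weights: {weight_gb(70, 2.4):.1f} GB")    # ~21 GB
print(f"120B @ 2.4bpw weights: {weight_gb(120, 2.4):.1f} GB")   # ~36 GB, won't fit in 24 GB
print(f"KV cache @ 32K ctx:    {kv_cache_gb(32768, 80, 8, 128):.1f} GB")  # ~10.7 GB
print(f"KV cache @ 8K ctx:     {kv_cache_gb(8192, 80, 8, 128):.1f} GB")   # ~2.7 GB

With these ballpark numbers, a 2.4bpw 70B plus a 32K cache clearly blows past 24 GB, while dropping to 8K or less leaves just enough headroom, which is why the 2-8K range above depends on what else the GPU is doing.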