Any way to speed up generation on a Windows 11 PC, using a single 24GB card (4090), with Text-Generation-WebUI

#2 by clevnumb - opened

Primary goal: SPEED; secondary: MORE CONTEXT.

Right now I'm at 4096 context with the 8-bit cache, and alpha and compress are both set to 1 (should I raise those?)

...and I'm getting very slow output of less than half a token/sec (0.24 and 0.35 on the last two tests... unusable, really).

Is anything available to speed up this generation even if I must stay at this context? Any other way? Thank you.

This model won't fit into 24 GB of VRAM, so you are swapping to system RAM. There's no way to speed things up other than running a smaller model. Try a 2.4 bpw 70B model, or a Mixtral 8x7B model at less than 4.0 bpw.

