Any way to speed up generation on a Windows 11 PC, using a single 24GB card (4090), with Text-Generation-WebUI

#2 by clevnumb - opened

Primary goal: SPEED; secondary: MORE CONTEXT.

Right now I'm at 4096 context with the 8-bit cache, and alpha and compress are both set to 1 (should I raise those?)

...and I'm getting very slow output of less than half a token/sec (0.24 and 0.35 on the last two tests... unusable, really).

Is anything available to speed up this generation even if I must stay at this context? Any other way? Thank you.

This model won't fit into 24 GB of VRAM, so you are swapping to system RAM. There's no way to speed things up other than running a smaller model. Try a 2.4 bpw 70B model, or a Mixtral 8x7B model at less than 4.0 bpw.

