What quant should I use to run this on a single 24GB video card (PC with a 4090)?
#2, opened by clevnumb
Should I use the 8-bit or 4-bit cache? Do I need RoPE scaling at all, and if so, 2.5? I'm using Text-Generation-WebUI + SillyTavern on a Windows 11 PC.
Thank you.
Hi! If you used GGUF instead of EXL2, you could split the model between VRAM and RAM; check here
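
To give a rough idea of how such a split might be sized, here is a minimal sketch that estimates how many layers could go on the GPU, with the rest staying in system RAM. Everything in it is an assumption for illustration: the helper name `layers_that_fit`, the 40 GB file size, the 80-layer count, and the 2 GB reserve for the cache and other overhead are hypothetical, and it treats all layers as equal in size, which real GGUF files only approximate.

```python
def layers_that_fit(model_file_gb, n_layers, vram_gb, reserve_gb=2.0):
    """Estimate how many layers fit in VRAM (hypothetical helper).

    Assumes every layer takes an equal share of the model file and that
    reserve_gb of VRAM is kept free for the KV cache and other overhead.
    """
    if model_file_gb <= 0 or n_layers <= 0:
        raise ValueError("model_file_gb and n_layers must be positive")
    per_layer_gb = model_file_gb / n_layers       # rough size of one layer
    usable_gb = max(vram_gb - reserve_gb, 0.0)    # VRAM left for weights
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical numbers: a 40 GB GGUF file with 80 layers on a 24 GB card.
gpu_layers = layers_that_fit(model_file_gb=40.0, n_layers=80, vram_gb=24.0)
print(gpu_layers)  # layers to offload to the GPU; the rest stay in RAM
```

The result would be the kind of value you put into the GPU layers setting of the llama.cpp loader (`n-gpu-layers`). Treat it as a starting point and lower it if you run out of memory, since longer contexts need more room for the cache.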