What quant should I use to run this on a single 24GB video card (RTX 4090, PC)?

#2
by clevnumb

Should I use the 8-bit or the 4-bit cache? Should I use RoPE scaling at all, and if so, 2.5? I'm using Text-Generation-Webui + SillyTavern on a Windows 11 PC.

Thank you.
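A rough way to reason about the quant and cache questions is to add up the memory for the quantized weights and for the KV cache. The sketch below is only an estimate under stated assumptions: a 72B-parameter model (taken from the repo name linked in the reply) and a Qwen2-72B-style layout of 80 layers, 8 KV heads and head dim 128. Those architecture numbers are assumptions, not something stated in this thread, and the result ignores activations, quantization scales and runtime overhead.

```python
# Rough VRAM estimate for a 72B model on a 24 GB card.
# ASSUMPTIONS (not from the thread): 72e9 parameters; Qwen2-72B-style
# attention with 80 layers, 8 KV heads and head_dim 128.
# Real usage is higher: activations, CUDA context, quant scales, etc.

GIB = 1024 ** 3


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(context: int, n_layers: int = 80, n_kv_heads: int = 8,
                 head_dim: int = 128, cache_bits: int = 16) -> float:
    """Memory for the K and V tensors of every layer over the full context."""
    bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * cache_bits / 8
    return context * bytes_per_token / GIB


if __name__ == "__main__":
    for bpw in (2.25, 2.5, 3.0, 4.0):
        print(f"{bpw:>4} bpw weights: {weights_gib(72e9, bpw):5.1f} GiB")
    for bits in (16, 8, 4):
        size = kv_cache_gib(32768, cache_bits=bits)
        print(f"{bits:>2}-bit cache at 32k context: {size:4.1f} GiB")
```

Under these assumptions, 2.5 bpw weights alone come to about 21 GiB, and a 32k-token cache is about 10 GiB at 16 bits, 5 GiB at 8 bits and 2.5 GiB at 4 bits, so a fully-in-VRAM EXL2 setup on one 24GB card leaves very little room for context.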

Hi! If you use GGUF instead of EXL2, you can split the model between VRAM and system RAM. Check here:

https://huggingface.co/mradermacher/magnum-72b-v1-i1-GGUF
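When splitting a GGUF model, the usual knob is how many layers go to the GPU (for example `-ngl` / `--n-gpu-layers` in llama.cpp, or the n-gpu-layers setting of the llama.cpp loader in text-generation-webui). A minimal sketch for picking a starting value from a quant's file size, assuming all layers are roughly the same size and that `reserve_gib` (a hypothetical headroom figure) covers the KV cache, embeddings and driver overhead:

```python
# Rough starting value for the number of GPU layers when splitting a GGUF
# model between VRAM and system RAM.
# ASSUMPTIONS: layers are roughly equal in size; `reserve_gib` is a
# hypothetical headroom for KV cache, embeddings/output and CUDA overhead.


def gpu_layers_that_fit(file_gib: float, n_layers: int,
                        vram_gib: float = 24.0, reserve_gib: float = 4.0) -> int:
    """Estimate how many layers of a GGUF file fit in the given VRAM budget."""
    per_layer_gib = file_gib / n_layers
    budget_gib = max(vram_gib - reserve_gib, 0.0)
    # Never offload more layers than the model actually has.
    return min(n_layers, int(budget_gib // per_layer_gib))


if __name__ == "__main__":
    # Hypothetical example: a 40 GiB quant with 80 layers on a 24 GB card.
    print(gpu_layers_that_fit(40.0, 80))
```

For that hypothetical 40 GiB, 80-layer file this gives 40 layers on the GPU, with the rest running from system RAM. Treat it as a first guess and adjust up or down based on actual VRAM usage and speed.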
