Absurdly slow on a I7 - 3060 - 64GB

#11
by webslug - opened

I ran this on an RTX 3060 12 GB with 64 GB of DDR4 RAM.

It's incredibly slow and I was wondering if there were any settings I could adjust to remedy this?

webslug changed discussion title from Absurdly slow on a 3060 to Absurdly slow on a I7 - 3060 - 64GB

Ah, are you trying to run the full fp16 model? This is the unquantised repo; it's not really meant for basic inference. It'd take nearly 24 GB of VRAM to run this one.

With 12 GB of VRAM, I'd run one of these instead:
https://huggingface.co/Sao10K/Fimbulvetr-11B-v2-GGUF -- GGUF (at q5_k_m you can fully offload to the GPU, i.e. set GPU layers to max, with 6k+ context... or go q6/q8 and partially load some of it in RAM)

https://huggingface.co/LoneStriker/Fimbulvetr-11B-v2-5.0bpw-h6-exl2 -- exl2 (fastest speed, but GPU-only: the whole model has to fit in VRAM)
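
For a rough sense of why the fp16 weights don't fit on a 12 GB card while the quantised ones do, here's a back-of-the-envelope sketch. It only counts the weights (parameters x bits per weight / 8) and ignores KV cache and runtime buffers, so real usage will be higher. The 11 billion parameter count is an assumption read off the "11B" in the model name, the q5_k_m and q6_k bits-per-weight figures are approximate, and `weight_gb` is just a helper name made up for this example.

```python
# Rough weights-only memory estimate; ignores KV cache and runtime buffers.
# ASSUMPTION: ~11 billion parameters, taken from the "11B" in the model name.
PARAMS_BILLION = 11.0
VRAM_GB = 12.0  # RTX 3060 12 GB


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only size in (decimal) GB: params * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


# Bits per weight by format. fp16 is exact and 5.0 comes from the exl2 repo
# name; the q5_k_m and q6_k values are approximate averages (ASSUMPTION).
formats = {
    "fp16": 16.0,
    "q8_0": 8.5,
    "q6_k": 6.56,
    "q5_k_m": 5.5,
    "exl2 5.0bpw": 5.0,
}

for name, bpw in formats.items():
    size = weight_gb(PARAMS_BILLION, bpw)
    verdict = "fits" if size < VRAM_GB else "does not fit"
    print(f"{name:12s} ~{size:5.1f} GB of weights -> {verdict} in {VRAM_GB:.0f} GB (before context)")
```

Under those assumptions fp16 comes out at about 22 GB for the weights alone, which is where the "nearly 24 GB" figure comes from once context and overhead are added. Anything that lands close to 12 GB will still need some layers left in system RAM.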

I'd take a look at koboldcpp for GGUF (it's literally an .exe and easy to run), or TabbyAPI for exl2.

Thank you for your help!
