Extremely slow - any clue how to improve?

#4
by giovanith - opened

I have a Ryzen 7, 40 GB of RAM, and an RTX 4090 with 24 GB of VRAM. This model delivers no more than 0.6 tokens/s. (I'm using llama.cpp as the loader with 40 n-gpu-layers and 8 threads; the model is wizardcoder-python-34b-v1.0.Q4_K_M.)
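For reference, the settings correspond roughly to a llama-cpp-python call like the one below (the model path, prompt, and context size are just examples, not my exact setup):

```python
# Rough sketch of the current settings via llama-cpp-python
# (model path and n_ctx are illustrative, not the exact setup).
from llama_cpp import Llama

llm = Llama(
    model_path="./wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_gpu_layers=40,  # layers offloaded to the RTX 4090
    n_threads=8,      # CPU threads for whatever stays on the CPU
    n_ctx=4096,
)

out = llm("### Instruction:\nWrite a Python hello world.\n\n### Response:", max_tokens=64)
print(out["choices"][0]["text"])
```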
Any tips to improve this?
thanks - Giovani, Brazil

I think you should check Task Manager. Your video card is probably running out of memory and spilling into system RAM, which is why it's so slow.
Try this:
wizardcoder-python-34b-v1.0.Q3_K_M.gguf
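If you'd rather check programmatically than with Task Manager, something like this (assuming the nvidia-ml-py / pynvml bindings are installed) reports the same dedicated-memory numbers; the Q4_K_M file alone is roughly 20 GB, so there is very little headroom for the KV cache on a 24 GB card:

```python
# Quick VRAM check via NVML (pip install nvidia-ml-py).
# Run it while the model is loaded; if "used" is pinned near 24 GB, the driver
# is spilling the rest into system RAM, which would explain the 0.6 tokens/s.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0 = the RTX 4090
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {mem.used / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB")
pynvml.nvmlShutdown()
```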

First, you should probably try out ExLlama, as it is faster on GPU (and you have a good GPU). Also, if you want to stick with llama.cpp, you should install it with cuBLAS, which helps a lot.
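If you stay on llama.cpp, it's also worth confirming the build actually has CUDA enabled, since on a CPU-only build the n-gpu-layers setting effectively does nothing. A quick way to check with llama-cpp-python (model path below is just an example):

```python
# Load with verbose=True and read the startup log: a cuBLAS build prints
# "BLAS = 1" in the system info and an "offloaded N/M layers to GPU" line.
# (Model path is illustrative; -1 offloads every layer the backend will take.)
from llama_cpp import Llama

llm = Llama(
    model_path="./wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,
    verbose=True,
)
# If the log shows "BLAS = 0" or no offload line, reinstall the wheel with
# cuBLAS enabled, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```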
