Extremely slow - any clue how to improve?

#4
by giovanith - opened

I have a Ryzen 7, 40 GB of RAM, and an RTX 4090 with 24 GB of VRAM. This model delivers no more than 0.6 tokens/s. (I'm using llama.cpp as the loader with 40 n-gpu-layers and 8 threads; the model is wizardcoder-python-34b-v1.0.Q4_K_M.)
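For reference, the settings correspond roughly to a llama-cpp-python call like the one below (the model path, prompt, and context size are just examples, not my exact setup):

```python
# Rough sketch of the current settings via llama-cpp-python
# (model path and n_ctx are illustrative, not the exact setup).
from llama_cpp import Llama

llm = Llama(
    model_path="./wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_gpu_layers=40,  # layers offloaded to the RTX 4090
    n_threads=8,      # CPU threads for whatever stays on the CPU
    n_ctx=4096,
)

out = llm("### Instruction:\nWrite a Python hello world.\n\n### Response:", max_tokens=64)
print(out["choices"][0]["text"])
```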
Any tips to improve this?
thanks - Giovani, Brazil

I think you should check Task Manager. Your video card is probably running out of memory and spilling into system RAM, which is why it's so slow.
Try this:
wizardcoder-python-34b-v1.0.Q3_K_M.gguf
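If you'd rather check programmatically than with Task Manager, something like this (assuming the nvidia-ml-py / pynvml bindings are installed) reports the same dedicated-memory numbers; the Q4_K_M file alone is roughly 20 GB, so there is very little headroom for the KV cache on a 24 GB card:

```python
# Quick VRAM check via NVML (pip install nvidia-ml-py).
# Run it while the model is loaded; if "used" is pinned near 24 GB, the driver
# is spilling the rest into system RAM, which would explain the 0.6 tokens/s.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0 = the RTX 4090
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM used: {mem.used / 1e9:.1f} GB of {mem.total / 1e9:.1f} GB")
pynvml.nvmlShutdown()
```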

First, you should probably try out ExLlama, as it is faster on GPU (and you have a good GPU). Also, if you want to stick with llama.cpp, you should install it with cuBLAS, which helps a lot.
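If you stay on llama.cpp, it's also worth confirming the build actually has CUDA enabled, since on a CPU-only build the n-gpu-layers setting effectively does nothing. A quick way to check with llama-cpp-python (model path below is just an example):

```python
# Load with verbose=True and read the startup log: a cuBLAS build prints
# "BLAS = 1" in the system info and an "offloaded N/M layers to GPU" line.
# (Model path is illustrative; -1 offloads every layer the backend will take.)
from llama_cpp import Llama

llm = Llama(
    model_path="./wizardcoder-python-34b-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,
    verbose=True,
)
# If the log shows "BLAS = 0" or no offload line, reinstall the wheel with
# cuBLAS enabled, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
```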
