(0.9 tokens/s) text generation

#3
by 0xrk - opened

I'm running codellama-7b.Q4_K_M.gguf on CPU with text-generation-webui, without a GPU. Is 0.9 tokens/s normal for this configuration?
Config:
i5-13400
DDR5 16GB

You could get better speed by setting n_threads to the number of physical cores you have (I believe 10, based on a Google search for your CPU).
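
A minimal sketch of that idea, assuming you load the model through llama-cpp-python rather than the web UI (the `Llama` call is illustrative and not executed here). Note that `os.cpu_count()` returns logical cores, and the halving heuristic below is rough: on a hybrid CPU like the i5-13400 (6 P-cores + 4 E-cores, 16 threads) it gives 8, so tune n_threads by hand.

```python
import os

# os.cpu_count() reports logical CPUs; llama.cpp usually runs fastest with
# n_threads near the physical core count, not the hyperthreaded total.
logical = os.cpu_count() or 1
threads = max(1, logical // 2)  # rough physical-core estimate; tune manually

# Hypothetical usage with llama-cpp-python (model file from the question):
# from llama_cpp import Llama
# llm = Llama(model_path="codellama-7b.Q4_K_M.gguf", n_threads=threads)
print(f"n_threads = {threads}")
```

In text-generation-webui itself, the equivalent setting is the threads field on the model loader tab.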

There might be a problem with text-generation-webui's llama.cpp integration, because it sometimes generates 6-8 tokens/s and sometimes 0.9 tokens/s with the same prompt.
