Using CPU only

#1
by BBLL3456 - opened

Hi @TheBloke , I have used this model with Ooba on an A6000 GPU, but the model runs solely on the CPU, utilising 0% of the GPU. I have tried setting it to "auto device" but the result is the same.

By default GGML only uses the CPU. To enable GPU offloading, add the -ngl X argument, where X is the number of layers to offload to the GPU.

With an A6000 you should have enough VRAM to offload all the layers in a 30B/33B model, so enter -ngl 60.

You should see much better performance!

I should mention this in the README and will do soon.
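For example, if you are invoking llama.cpp directly from the command line, it might look something like this (the model filename below is just a placeholder, not this repo's exact file):

```bash
# Sketch: offload 60 layers to the GPU when loading a 30B/33B GGML model.
# The model path is a placeholder; substitute the actual .bin you downloaded.
./main -m ./models/model-30b.ggmlv3.q4_0.bin -ngl 60 -p "Hello"
```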

I have a question or concern.
In Ooba I can only get GGML models to run if I don't use arguments: I run them with the default loader, choosing which model to load by number, and just adjust the settings. Do I still have to use -ngl X, or how do I enable that in the settings? Keep in mind I have already followed the steps to get GPU acceleration working.
https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md

However, it still does not work. I know other people are having this issue too. But if it's something simple, like needing to use -ngl X, I need to know.

No, there's no such thing as -ngl in text-gen-ui. In text-gen you use the layers slider in the UI instead.
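If you'd rather set it at launch time, I believe the web UI also exposes the same setting as a startup flag (flag name from my recollection of text-generation-webui; check --help if it differs on your version):

```bash
# Sketch: start text-generation-webui with 60 layers offloaded to the GPU.
# --n-gpu-layers is, as far as I know, the flag behind the UI slider.
python server.py --n-gpu-layers 60
```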

I answered in your other discussion regarding the issues getting llama-cpp-python compiled with GPU support.
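For completeness, the doc linked above rebuilds llama-cpp-python with cuBLAS enabled roughly like this (exact CMake flags can change between versions, so treat this as a sketch):

```bash
# Sketch: reinstall llama-cpp-python with GPU (cuBLAS) support,
# following the text-generation-webui llama.cpp doc linked above.
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir
```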
