CPU Inference #13
opened by Ange09
Hello TheBloke,
Is there any way to perform inference on CPU with the model?
Thank you very much.
Technically yes, you can run GPTQ on CPU, but it's horribly slow.
If you want CPU-only inference, use the GGML versions found at https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML
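As a rough sketch, CPU inference with a GGML file typically goes through llama.cpp or its Python bindings. The model filename, thread count, and prompt below are illustrative assumptions, not values from this thread; note that recent llama-cpp-python releases only load GGUF, so an older release is needed for GGML files.

```shell
# Install an older llama-cpp-python that still reads GGML files
# (newer releases require the GGUF format).
pip install 'llama-cpp-python<0.1.79'

# Download one quantized GGML file from the repo linked above
# (q4_0 is a common size/quality trade-off; filename is an assumption).
huggingface-cli download TheBloke/Llama-2-13B-chat-GGML \
    llama-2-13b-chat.ggmlv3.q4_0.bin --local-dir .

# Run a short CPU-only generation.
python - <<'EOF'
from llama_cpp import Llama

# n_threads controls how many CPU cores are used; tune to your machine.
llm = Llama(model_path="llama-2-13b-chat.ggmlv3.q4_0.bin", n_threads=8)

# Llama-2-chat expects the [INST] ... [/INST] prompt wrapper.
out = llm("[INST] Hello, who are you? [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
EOF
```

This is a sketch under those assumptions, not a verified recipe; the plain `llama.cpp` `./main -m <model.bin> -p "<prompt>"` CLI works the same way if you prefer to avoid Python.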