Getting this error, running on a MacBook Air M2

#14
by ezhuwork - opened

Traceback (most recent call last):
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/GPTQ_loader.py", line 17, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/models.py", line 74, in load_model
    output = load_func_map[loader](model_name)
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/models.py", line 278, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/ezhu/AI/oobabooga_macos/text-generation-webui/modules/GPTQ_loader.py", line 21, in <module>
    sys.exit(-1)
SystemExit: -1


GPTQ models aren't properly supported on macOS. The one-click installer won't install any GPTQ library, which is why you're getting this error. You could install one manually, but there's no GPU acceleration for GPTQ on macOS, so it would be really slow.
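For reference, here's roughly what that failure path looks like: GPTQ_loader.py tries to import the GPTQ-for-LLaMa module and bails out when it isn't there. This is a sketch reconstructed only from the traceback above, not the actual file contents:

```python
# Reconstructed sketch of the import guard in modules/GPTQ_loader.py,
# based only on the traceback above; the real file may differ.
import sys

try:
    import llama_inference_offload  # provided by a GPTQ-for-LLaMa install
except ModuleNotFoundError:
    # No GPTQ library is installed (the case on macOS with the one-click
    # installer), so the loader aborts. This is the "SystemExit: -1"
    # at the bottom of the traceback.
    sys.exit(-1)
```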

On macOS, please use GGML instead. To get GPU acceleration, you'll need to manually compile llama-cpp-python with Metal support: https://github.com/abetlen/llama-cpp-python#installation-with-openblas--cublas--clblast--metal
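At the time of writing, the README linked above enables the Metal build with something like `CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python` (check the README for the current flags). Once built, loading a GGML model with a non-zero `n_gpu_layers` turns on the Metal offload. A minimal sketch; the model path is a hypothetical example:

```python
# Minimal sketch, assuming llama-cpp-python was compiled with Metal support.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-7b.ggmlv3.q4_0.bin",  # hypothetical local GGML file
    n_gpu_layers=1,  # any non-zero value offloads work to the GPU via Metal
    n_ctx=2048,      # context window size
)

out = llm("Q: Why is the sky blue? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```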

Or, much easier: use LM Studio instead, which has full GPU acceleration on macOS and supports all GGML models: https://lmstudio.ai/
