ModuleNotFoundError: No module named 'llama_inference_offload' on Mac M1 chip

#8
by vijaysb

Message in terminal:
INFO:Loading TheBloke_guanaco-65B-GPTQ...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md

I'm getting the following error in the WebUI:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 18, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 289, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 22, in <module>
    sys.exit(-1)
SystemExit: -1

Any idea how to fix this?

GPTQ is not supported on macOS at this time.

Please use the GGML version, assuming you have 64+ GB of RAM. If not, please try a smaller model, e.g. a 33B GGML. You can also sanity-check the model outside the webui; see the sketch below.
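Here's a minimal sketch of loading a GGML file directly with llama-cpp-python (my assumption for the backend; the webui's llama.cpp loader takes similar parameters). The model path, prompt, and thread count below are placeholders:

from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path -- point this at whichever quantised GGML .bin
# file you actually downloaded for the model.
llm = Llama(
    model_path="./models/guanaco-65B.ggmlv3.q4_0.bin",
    n_ctx=2048,    # context window size
    n_threads=8,   # CPU threads; tune to your core count
)

# Guanaco-style prompt; swap in the template for whichever model you use.
out = llm("### Human: Hello!\n### Assistant:", max_tokens=128)
print(out["choices"][0]["text"])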

Thanks, I have exactly 64 GB of RAM. Will it be slow?

Additionally, what configuration do we need to fine-tune it?

Yeah, it'll be pretty slow. You might prefer to try a 30B model instead, like TheBloke/Guanaco-33B-GGML or, even better, TheBloke/WizardLM-30B-Uncensored-GGML.
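In case it saves you a step, here's one way to pull a GGML file down with huggingface_hub. The exact filename is an assumption based on the usual naming scheme, so verify it against the repo's "Files and versions" tab:

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Assumed filename -- check the repo for the quantisation you want.
model_path = hf_hub_download(
    repo_id="TheBloke/WizardLM-30B-Uncensored-GGML",
    filename="WizardLM-30B-Uncensored.ggmlv3.q4_0.bin",
)
print(model_path)  # local path to point the webui or llama.cpp at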

Yes, I tried TheBloke/Guanaco-33B-GGML and it worked. It's a little slow and initially takes around 30 seconds to begin generating text.
Thanks for your support!!

vijaysb changed discussion status to closed