ModuleNotFoundError: No module named 'llama_inference_offload' on Mac M1 chip
Message in terminal:
INFO:Loading TheBloke_guanaco-65B-GPTQ...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md
I'm getting the following error in the WebUI:
Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 18, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 289, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 22, in <module>
    sys.exit(-1)
SystemExit: -1
Any idea how to fix this?
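For context, the traceback implies that GPTQ_loader.py wraps the import in a try/except and exits when GPTQ-for-LLaMa isn't available. A rough sketch of that guard, inferred from the log above rather than copied from the upstream source:

```python
# Sketch inferred from the traceback and the ERROR lines above;
# not the exact upstream code in modules/GPTQ_loader.py.
import sys
import logging

try:
    import llama_inference_offload  # provided by a GPTQ-for-LLaMa checkout
except ModuleNotFoundError:
    logging.error("Failed to load GPTQ-for-LLaMa")
    logging.error("See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md")
    sys.exit(-1)  # the sys.exit(-1) at line 22 in the traceback
```

So the root cause is the missing llama_inference_offload module, not the webui itself.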
GPTQ is not supported on macOS at this time.
Please use the GGML version instead, assuming you have 64+ GB of RAM. If not, please try a smaller model, e.g. a 33B GGML one.
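If you want to sanity-check a GGML model outside the webui first, here is a minimal sketch using the llama-cpp-python bindings (assuming `pip install llama-cpp-python`; the model filename is a placeholder for whichever quantized .bin file you actually download):

```python
# Minimal sketch of running a GGML model on an M1 Mac via llama-cpp-python.
# The model_path below is a hypothetical filename -- point it at the
# quantized .bin file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/guanaco-33B.ggmlv3.q4_0.bin",  # hypothetical path
    n_ctx=2048,    # context window size
    n_threads=8,   # tune to the number of performance cores
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```

Smaller quantizations (q4_0 vs q5_1, etc.) trade accuracy for memory, which matters a lot when you only have 64 GB to work with.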
Thanks, I have exactly 64 GB of RAM; will it be slow?
Additionally, what configuration do we need to fine-tune it?
Yeah, it'll be pretty slow. You might prefer to try a 30B model instead, like TheBloke/Guanaco-33B-GGML or, even better, TheBloke/WizardLM-30B-Uncensored-GGML.
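If it helps, one way to grab a single quantized file from one of those repos is the huggingface_hub package; a quick sketch below (the exact filename inside the repo is an assumption, so check the repo's file listing first):

```python
# Sketch: download one quantized GGML file from the Hub.
# Assumes `pip install huggingface_hub`; the filename is a guessed
# quantization variant -- list the repo files to find the real one.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/WizardLM-30B-Uncensored-GGML",
    filename="WizardLM-30B-Uncensored.ggmlv3.q4_0.bin",  # hypothetical filename
    local_dir="./models",
)
print(f"Saved to {path}")
```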
Yes, I tried TheBloke/Guanaco-33B-GGML and it worked; it's a little slow and initially takes around 30 seconds to begin generating text.
Thanks for your support!!