ModuleNotFoundError: No module named 'llama_inference_offload' on Mac M1 chip

#8
by vijaysb

Message in terminal:
INFO:Loading TheBloke_guanaco-65B-GPTQ...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md

I'm getting the following error in the WebUI:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 18, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 289, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 22, in <module>
    sys.exit(-1)
SystemExit: -1

Any idea how to fix this?

GPTQ is not supported on macOS at this time.

Please use the GGML version, assuming you have 64+ GB of RAM. If not, please try a smaller model, e.g. a 33B GGML. You can also sanity-check the model outside the webui; see the sketch below.
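Here's a minimal sketch of loading a GGML file directly with llama-cpp-python (my assumption for the backend; the webui's llama.cpp loader takes similar parameters). The model path, prompt, and thread count below are placeholders:

from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path -- point this at whichever quantised GGML .bin
# file you actually downloaded for the model.
llm = Llama(
    model_path="./models/guanaco-65B.ggmlv3.q4_0.bin",
    n_ctx=2048,    # context window size
    n_threads=8,   # CPU threads; tune to your core count
)

# Guanaco-style prompt; swap in the template for whichever model you use.
out = llm("### Human: Hello!\n### Assistant:", max_tokens=128)
print(out["choices"][0]["text"])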

Thanks, I have exactly 64 GB of RAM. Will it be slow?

Additionally, what configuration do we need to fine-tune it?

Yeah, it'll be pretty slow. You might prefer to try a 30B model instead, like TheBloke/Guanaco-33B-GGML or, even better, TheBloke/WizardLM-30B-Uncensored-GGML.
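In case it saves you a step, here's one way to pull a GGML file down with huggingface_hub. The exact filename is an assumption based on the usual naming scheme, so verify it against the repo's "Files and versions" tab:

from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# Assumed filename -- check the repo for the quantisation you want.
model_path = hf_hub_download(
    repo_id="TheBloke/WizardLM-30B-Uncensored-GGML",
    filename="WizardLM-30B-Uncensored.ggmlv3.q4_0.bin",
)
print(model_path)  # local path to point the webui or llama.cpp at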

Yes, I tried TheBloke/Guanaco-33B-GGML and it worked. It's a little slow and initially takes around 30 seconds to begin generating text.
Thanks for your support!!

vijaysb changed discussion status to closed