ModuleNotFoundError: No module named 'llama_inference_offload' when running in oobabooga

by pelatho - opened Jun 2, 2023

Jun 2, 2023

I'm trying to run this in oobabooga (via Runpod). I selected the model, set wbits to 4, saved the settings and tried to reload the model, but I get this error:

Traceback (most recent call last):
  File "/workspace/text-generation-webui/server.py", line 100, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 125, in load_model
    from modules.GPTQ_loader import load_quantized
  File "/workspace/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
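
For anyone hitting the same traceback: a quick way to confirm that the module is simply absent from the environment (rather than something being wrong with the model files) is to ask Python whether it can locate it. This is only a minimal sketch using the standard library; the helper name module_available is made up for illustration and is not part of text-generation-webui.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current import path.

    Hypothetical helper for illustration; it only looks the module up,
    it does not import or execute it.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent package,
        # or for modules whose __spec__ is unset.
        return False


if __name__ == "__main__":
    # The module name is taken from the traceback above.
    name = "llama_inference_offload"
    if module_available(name):
        print(f"{name} is importable")
    else:
        print(f"{name} is not importable in this environment")
```

If this reports that the module is not importable when run with the same Python interpreter the web UI uses, the environment lacks the GPTQ code, which matches the error above.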

TheBloke

Owner Jun 2, 2023

Yeah, the default Runpod text-gen-ui template doesn't support GPTQ.

I have one that does! https://runpod.io/gsc?template=qk29nkmbfr&ref=eexqfacd

Please read its README

pelatho

Jun 2, 2023

@TheBloke Ah, I see. Thanks!

pelatho

Jun 3, 2023

@TheBloke Got it working! Thanks for making this model! It's really quite versatile! Can't WAIT for the 65B version!

pelatho

Jun 3, 2023

@TheBloke Sorry if this is asking a lot but the pod template seems to break when I stop the pod and start it again. It won't even open the pod ssh terminal. It seems to be crashing. Am I missing something?

TheBloke

Owner Jun 3, 2023

> @TheBloke Sorry if this is asking a lot but the pod template seems to break when I stop the pod and start it again. It won't even open the pod ssh terminal. It seems to be crashing. Am I missing something?

Hmm that's not meant to happen. I recently updated the template so you could stop and start the pod and your models would still be there.

Did you set any template overrides at all?

What GPU type are you using?

pelatho

Jun 3, 2023

@TheBloke I was using the RTX 4090 (24 GB VRAM, 83 GB RAM, 16 vCPU). I did not set any template overrides, no. I start it, follow your README to download the model, load it, and it works. Then if I stop the pod and start it again, the server won't start and not even SSH works. Just clicking 'start SSH terminal' seems to crash the pod.

TheBloke

Owner Jun 3, 2023

Oh! I think I know what that is. It is starting. It's just taking a long time.

So when text-gen starts, it will auto-load a model. When the pod first launches, the model is not cached, so it loads much more slowly. I think that might be affecting SSH as well.

Can you launch it again and keep trying to access the UI for at least 3-4 minutes? Let me know how it goes. Then, if you can access the UI, try SSH again.
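
If retrying by hand for several minutes is tedious, the "keep trying" step can be scripted. The following is a rough sketch, not anything from the template itself: the function names ui_is_up and wait_until are invented for illustration, the 240-second default just mirrors the "3-4 minutes" above, and you would need to supply your own pod's UI URL.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def ui_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, etc.
        return False


def wait_until(
    probe: Callable[[], bool],
    max_wait: float = 240.0,   # assumption: roughly the 3-4 minutes suggested
    interval: float = 10.0,    # assumption: seconds between attempts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` repeatedly until it returns True or `max_wait` elapses."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example usage (the URL is a placeholder; use your pod's actual UI address):
#   ok = wait_until(lambda: ui_is_up("http://localhost:7860"))
#   print("UI reachable" if ok else "UI still not reachable")
```

The probe is passed in as a callable so the same loop could also be pointed at an SSH port check once the UI responds.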

I'll then see if I can make that experience better, e.g. by disabling model auto-load, maybe.

pelatho

Jun 3, 2023

@TheBloke Hmmm. I think I waited for 10 minutes a few times? I'll try one more time.

TheBloke

Owner Jun 3, 2023

Oh OK. I'm testing it now

pelatho

Jun 3, 2023

Question: I'm supposed to enable AutoGPTQ, right? (If I try loading the model without that, I get an error.)
