NotImplementedError: Cannot copy out of meta tensor; no data!

#15
by Satya93 - opened

Hi, I can load this model using Oobabooga with AutoGPTQ, but I get the error in the title in a notebook. I'm using an RTX 4090 with 64 GB of RAM. Below is the command I use to load it:

model = AutoGPTQForCausalLM.from_quantized(
    pretrained_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    low_cpu_mem_usage=True,
    device_map="auto",
    use_triton=False
)

Any clue what the problem may be?

EDIT: Oobabooga actually didn't work with AutoGPTQ; I had gotten it working with ExLlama. So there is some issue, at least for me, with getting this model going under AutoGPTQ. For reference, I am using the compiled build 0.3.0.dev0 from the cloned https://github.com/PanQiWei/AutoGPTQ repo.

Solved it! Set the pagefile size to 150 GB AND removed low_cpu_mem_usage=True. The latter setting had let me load Tulu without the OOM error, but it doesn't work for Wizard Vicuna 30B.
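For reference, this is simply the same load command as above with low_cpu_mem_usage removed (the pagefile change is a Windows setting, not code):

from auto_gptq import AutoGPTQForCausalLM

# Same call as before, minus low_cpu_mem_usage=True, which was what
# triggered the meta tensor error for this model on my setup.
model = AutoGPTQForCausalLM.from_quantized(
    pretrained_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    device_map="auto",
    use_triton=False
)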

That's very odd, but glad it's working for you now!

Thanks, it was a mystery! Crazy spike in RAM usage on initialization!

I get this on EVERY AutoGPTQ-loaded model:

   The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.

Then, when setting up the pipeline:

 The model 'LlamaGPTQForCausalLM' is not supported for text-generation. Supported models are.....

And this, on safetensors:

The safetensors archive passed at E:\models\vicuna-33B-preview-GPTQ\vicuna-33b-preview-GPTQ-4bit--1g.act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

If these are not real issues, can the warning messages be turned off?

Thanks again!

Yes, they can all be ignored.

The pipeline message comes from Hugging Face transformers and there's nothing AutoGPTQ can do about it. It can be blocked with this code:

from transformers import logging, pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

The safetensors message comes from either the transformers or the safetensors library, I'm not sure which. I don't know how to block that message, or whether it could be hidden by AutoGPTQ - possibly.

The model weights message comes from Accelerate, and I think it could be prevented by AutoGPTQ.
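If you want to try silencing those two as well, one option - just a sketch, assuming both messages go through Python's standard logging module under the "accelerate" and "safetensors" logger names, which I haven't verified - is to raise those loggers' levels (the transformers side is already covered by the set_verbosity call above):

import logging

# Assumption: the remaining warnings are emitted via Python's standard
# logging module under these logger names. If they come from somewhere
# else, this will simply have no effect.
for name in ("accelerate", "safetensors"):
    logging.getLogger(name).setLevel(logging.ERROR)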

You could raise this as an issue on the AutoGPTQ GitHub.

The transformers verbosity command got rid of most of it. I'll look into AutoGPTQ logging, thanks!!
