NotImplementedError: Cannot copy out of meta tensor; no data!

#15
by Satya93 - opened

Hi, I can load this model using Oobabooga with AutoGPTQ, but I get the error in the title in a notebook. I'm using an RTX 4090 with 64 GB of RAM. Below is the command I use to load it:

model = AutoGPTQForCausalLM.from_quantized(
    pretrained_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    low_cpu_mem_usage=True,
    device_map="auto",
    use_triton=False
)

Any clue what the problem may be?

EDIT: Oobabooga actually didn't work with AutoGPTQ; I had gotten it working with ExLlama. So there is some issue, at least for me, with getting this model going under AutoGPTQ. For reference, I am using the compiled build 0.3.0.dev0 from the cloned https://github.com/PanQiWei/AutoGPTQ repo.

Solved it! Set the pagefile size to 150 GB AND removed low_cpu_mem_usage=True. The latter setting had let me load Tulu without the OOM error, but it doesn't work for Wizard Vicuna 30B.
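For reference, this is simply the same load command as above with low_cpu_mem_usage removed (the pagefile change is a Windows setting, not code):

from auto_gptq import AutoGPTQForCausalLM

# Same call as before, minus low_cpu_mem_usage=True, which was what
# triggered the meta tensor error for this model on my setup.
model = AutoGPTQForCausalLM.from_quantized(
    pretrained_model_dir,
    use_safetensors=True,
    model_basename=model_basename,
    device_map="auto",
    use_triton=False
)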

That's very odd, but glad it's working for you now!

Thanks, it was a mystery! Crazy spike in RAM usage on initialization!

I get this on EVERY AutoGPTQ-loaded model:

   The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.

Then, when setting up the pipeline:

 The model 'LlamaGPTQForCausalLM' is not supported for text-generation. Supported models are.....

And this, on safetensors:

The safetensors archive passed at E:\models\vicuna-33B-preview-GPTQ\vicuna-33b-preview-GPTQ-4bit--1g.act.order.safetensors does not contain metadata. Make sure to save your model with the save_pretrained method. Defaulting to 'pt' metadata.

If these are not real issues, can the warning messages be turned off?

Thanks again!

Yes, they can all be ignored.

The pipeline message comes from Hugging Face transformers and there's nothing AutoGPTQ can do about it. It can be blocked with this code:

from transformers import logging, pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

The safetensors message comes from either the transformers or the safetensors library, I'm not sure which. I don't know how to block that message, or whether it could be hidden by AutoGPTQ - possibly.

The model weights message comes from Accelerate, and I think it could be prevented by AutoGPTQ.
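If you want to try silencing those two as well, one option - just a sketch, assuming both messages go through Python's standard logging module under the "accelerate" and "safetensors" logger names, which I haven't verified - is to raise those loggers' levels (the transformers side is already covered by the set_verbosity call above):

import logging

# Assumption: the remaining warnings are emitted via Python's standard
# logging module under these logger names. If they come from somewhere
# else, this will simply have no effect.
for name in ("accelerate", "safetensors"):
    logging.getLogger(name).setLevel(logging.ERROR)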

You could raise this as an issue on the AutoGPTQ GitHub.

The transformers verbosity command got rid of most of it. I'll look into AutoGPTQ logging, thanks!!
