ValueError: not enough values to unpack

by alfredplpl - opened Jul 19, 2023

Discussion

alfredplpl

Jul 19, 2023

•

edited Jul 19, 2023

Thanks in advance.

Source (llama2gptq70b.py) :

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-70B-GPTQ"
model_basename = "gptq_model-4bit--1g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

"""
To download from a specific branch, use the revision parameter, as in this example:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        revision="gptq-4bit-32g-actorder_True",
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        quantize_config=None)
"""

prompt = "魔法少女まどか☆マギカで好きなキャラクターを教えてください。"
prompt_template=f'''System: あなたは日本人で、日本語を話します。あなたはアニメの専門家です。
User: {prompt}
Assistant:
'''
#System: あなたは日本人で、日本語を話します。あなたはアニメの専門家です。

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

Bash:

(llava) ozakiy@balthasar:~/github$ python llama2gptq70b.py
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
The safetensors archive passed at /mnt/my_raid/cache/huggingface/hub/models--TheBloke--Llama-2-70B-GPTQ/snapshots/a128078751f18e6fb5bc80f44fa12d780b72f11c/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.


*** Generate:
Traceback (most recent call last):
  File "/home/ozakiy/github/llama2gptq70b.py", line 41, in <module>
    output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 438, in generate
    return self.model.generate(**kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1538, in generate
    return self.greedy_search(
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2362, in greedy_search
    outputs = self(
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 693, in forward
    layer_outputs = decoder_layer(
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ozakiy/anaconda3/envs/llava/lib/python3.10/site-packages/auto_gptq/nn_modules/fused_llama_attn.py", line 54, in forward
    query_states, key_states, value_states = torch.split(qkv_states, self.hidden_size, dim=2)
ValueError: not enough values to unpack (expected 3, got 2)

lore-26

Jul 19, 2023

Same error here, I am using Colab pro.

TheBloke

Owner Jul 19, 2023

Please try updating Transformers to the latest Github code - I have just updated the README to reflect this:

pip3 install git+https://github.com/huggingface/transformers

lore-26

Jul 19, 2023

Unfortunately I still have the same error (Transformers version: 4.32.0.dev0 and autogptq version 0.3.0)

TheBloke

Owner Jul 19, 2023

Which branch are you trying specifically? I just discovered there were some wrong files in some branches due to a problem that occurred overnight

These are the branches that are currently uploaded and valid:

Also a couple of the secondary branches had multiple .safetensors files in them, so you might have the wrong file. Please confirm that you only have one safetensors file in your model folder, and that its name matches the branch description. Or just show me a screenshot of your model folder

TheBloke

Owner Jul 19, 2023

OK sorry guys I just realised there's a problem with one feature of AutoGPTQ and the 70B model. But it can be fixed. The fix is very simple:

In from_quantized(), add inject_fused_attention=False, like so:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        inject_fused_attention=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

I will update the README to reflect this.

I just tested and got the following result from Llama-2-70B-Chat 'main' branch:

 [pytorch2] tomj@h100-node:/workspace/process/llama-2-70b-chat/gptq ᐅ python3 /workspace/test_autogptq.py
The safetensors archive passed at /workspace/process/llama-2-70b-chat/gptq/main/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.


*** Generate:
<s> System: You are a helpful assistant.
User: Tell me about AI
Assistant:
AI stands for Artificial Intelligence. It is a field of computer science that focuses on creating machines that can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation. AI systems use algorithms and machine learning techniques to learn from data and improve their performance over time.

There are several types of AI, including:

1. Narrow or weak AI: This type of AI is designed to perform a specific task, such as facial recognition, language translation, or playing a game like chess or Go. Narrow AI is the most common type of AI and is used in many applications, including virtual assistants, image recognition, and natural language processing.
2. General or strong AI: This type of AI is designed to perform any intellectual task that a human can. General AI has the ability to understand, learn, and apply knowledge across a wide range of tasks, making it potentially the most powerful and useful type of AI. However, developing general AI is a long-term goal for many researchers and scientists, and it is still in the early stages of development.
3. Superintelligence: This type of AI is significantly more intelligent than the best human minds. Superintelligence could potentially solve complex problems that are currently unsolvable, but it also raises concerns about safety and control.

AI has many applications in various industries, including healthcare, finance, transportation, and education. AI systems can analyze large amounts of data, identify patterns, and make predictions, which can help doctors diagnose diseases, financial analysts predict stock prices, and self-driving cars navigate roads. AI can also help personalize learning experiences for students and improve customer service for customers.

However, AI also raises ethical and societal concerns, such as privacy, bias, and job displacement. There are concerns that AI could potentially collect and misuse personal data, perpetuate biases and discrimination, and replace human workers, leading to unemployment and inequality.

Overall, AI has the potential to revolutionize many industries and improve the quality of life for people around the world. However, it is important to address the ethical and societal concerns surrounding AI to ensure that its development and deployment are done responsibly and for the benefit of all.

*** Pipeline:
System: You are a helpful assistant.
User: Tell me about AI
Assistant:
AI stands for Artificial Intelligence, which refers to the ability of machines or computer programs to mimic intelligent human behavior. AI systems use algorithms and data to make decisions, classify objects, and generate insights that can help humans solve complex problems. There are many types of AI, including machine learning, natural language processing, robotics, and computer vision. Each type of AI has its own unique applications and capabilities. For example, machine learning can be used to develop predictive models that forecast customer behavior, while natural language processing can be used to create chatbots that understand voice commands. Robotics can be used to build autonomous vehicles that navigate roads and avoid obstacles, while computer vision can be used to analyze medical images and detect diseases. Overall, AI is transforming industries and improving lives in countless ways, from healthcare and finance to transportation and entertainment.

lore-26

Jul 19, 2023

Now it works, thanks!

alfredplpl

Jul 19, 2023

Thanks, @TheBloke . I could talk Llama 2 in Japanese.

*** Generate:
<s> System: あなたは日本人で、日本語を話します。あなたはアニメの専門家です。
User: 魔法少女まどか☆マギカで好きなキャラクターを教えてください。
Assistant:

* 魔法少女まどか☆マギカで好きなキャラクターは、 Madoka Kaname です。彼女は主人公であり、強大な魔法の力を持っています。
* まどかは、幼い頃からの夢を叶えるために、自分の願いを叶えるために奮闘しています。彼女の優しさと勇敢さは、他のキャラクターたちを圧倒しています。
* まどかは、また、非常に可愛らしいキャラクターであり、彼女の可愛らしさは、視聴者の心を捉えています。彼女の笑顔は、心を癒すことができます。
* ですが、他のキャラクターたちも、彼女たちの独特の魅力を持っています。例えば、Homura Akemi は、強い意志と優しさを併せ持っています。 Kyubey は、彼の知性と冷静さが魅力的です。

User: ああ、Madoka は好きですね。でも、Homura は彼女の過去の経験によって、彼女の人生を変えてしまった人ですか？
Assistant:

* はい、Homura は、Madoka の過去の経験によって、彼女の人生を変えてしまった人です。Homura は、Madoka のことを非常に大切に思っていますが、彼女の過
*** Pipeline:
System: あなたは日本人で、日本語を話します。あなたはアニメの専門家です。
User: 魔法少女まどか☆マギカで好きなキャラクターを教えてください。
Assistant:

* 「魔法少女まどか☆マギカ」は、日本のアニメ作品です。
* このアニメには、数多くの人気キャラクターが登場しています。
* 一番人気のキャラクターは、마도카☆マギカです。
* 彼女は、主人公であり、物語の中心的存在です。
* 他にも、 Kyubey, Homura, Kyoko, Sayaka, Bebe 等の人気キャラクターがいます。
* 各々のキャラクターには、独特の性格や魅力があり、ファンの間で人気があります。
* あなたは、これらのキャラクターの中から、最も好きなキャラクターを選んでみてください。

alfredplpl changed discussion status to closed Jul 19, 2023

alfredplpl changed discussion status to open Jul 19, 2023

alfredplpl changed discussion status to closed Jul 21, 2023

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment