Any working tag/release/hash of AutoGPTQ?

#4
by mike-ravkine - opened

Hi TheBloke!

I'm trying the Python code from the model card with the latest AutoGPTQ on an A100-40G. The model loads, but I get a failure during inference:

Loading tokenizer...
Loading model...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The safetensors archive passed at /model/gptq_model-4bit--1g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
can't get model's sequence length from model config, will set to 4096.
RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.
Model loaded in 196.37s
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 330, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 403, in call_function_sync
    res = fun(*args, **kwargs)
  File "/root/gptqfalcon.py", line 72, in generate
    output = self.model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, temperature=0.8)
  File "/repositories/AutoGPTQ/auto_gptq/modeling/_base.py", line 426, in generate
    return self.model.generate(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1565, in generate
    return self.sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2612, in sample
    outputs = self(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 759, in forward
    transformer_outputs = self.transformer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 654, in forward
    outputs = block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 396, in forward
    attn_outputs = self.self_attention(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/model/modelling_RW.py", line 252, in forward
    fused_qkv = self.query_key_value(hidden_states)  # [batch_size, seq_length, 3 x hidden_size]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/repositories/AutoGPTQ/auto_gptq/nn_modules/qlinear_old.py", line 189, in forward
    autogptq_cuda.vecquant4matmul_faster_old(x, self.qweight, out, self.scales.float(), self.qzeros, self.group_size, self.half_indim)
AttributeError: module 'autogptq_cuda' has no attribute 'vecquant4matmul_faster_old'
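
For reference, this is roughly the snippet I'm running (a minimal sketch adapted from the model card example; the local model path is an assumption, and the basename matches the safetensors file in the log above):

# Minimal sketch of the inference code, adapted from the model card example.
# Assumption: /model is a local directory holding the quantized model files.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = "/model"

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    model_basename="gptq_model-4bit--1g",
    use_safetensors=True,
    trust_remote_code=True,
    device="cuda:0",
)

prompt = "### Instruction: write a story about llamas\n### Response:"
tokens = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")

# This is the generate call that produces the crash above
output = model.generate(input_ids=tokens, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0]))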

Any specific version of AutoGPTQ that's known to be compatible with this model? I really want to give it a try!

Can you confirm you built the latest version from source with these commands?

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .

Yes, also installing the extra dependency as noted:

        "git clone https://github.com/PanQiWei/AutoGPTQ /repositories/AutoGPTQ",
        "cd /repositories/AutoGPTQ && pip install . && pip install einops",

I have a hunch about what's wrong. In the scripts where I do have AutoGPTQ working, there's a "python setup.py install" step... I'm going to try adding that, as I think it's what actually compiles the CUDA extension.

pip install . should do that, but yeah, you can run it by hand as well if you want.

Maybe first try:

pip uninstall auto-gptq
pip install .
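
If it still fails after that, a quick sanity check (just a sketch; the module and kernel names are taken straight from your traceback) is to confirm the compiled CUDA extension is importable and exposes that kernel:

# Sanity check: did the CUDA extension build and does it expose the kernel
# that the quantized linear layer calls? (names taken from the traceback)
import autogptq_cuda  # an ImportError here means the extension never compiled

print(hasattr(autogptq_cuda, "vecquant4matmul_faster_old"))  # expect True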

FYI, PanQiWei has just put up a PR that will provide pre-compiled binary wheels for AutoGPTQ, so compiling from source won't be necessary for much longer.

Success! Adding && python setup.py install built the CUDA module and fixed the crash, and I get a response!

### Instruction: write a story about llamas
### Response:
A group of llamas were out exploring the countryside when they stumbled upon an old, forgotten temple. As they walked through the entrance, they were taken aback by the intricate carvings and the sheer size of the temple. They stayed for a while, marveling at the magnificent architecture and absorbing the peaceful energy it exuded. After awhile, they decided to continue their journey, inspired by the temple's beauty and wisdom.<|endoftext|>-1:<|endoftext|>In the distance, the temple glowed with
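
For completeness, the updated install lines (same as before, with the extra build step; its exact placement in the command is roughly this):

        "git clone https://github.com/PanQiWei/AutoGPTQ /repositories/AutoGPTQ",
        "cd /repositories/AutoGPTQ && pip install . && python setup.py install && pip install einops",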

I'll look into what's up with <|endoftext|> and open another issue if the problem isn't in my code.
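
If it turns out to be my decode step, something like this sketch should drop the special tokens:

# Sketch: decode while skipping special tokens such as <|endoftext|>
text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)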

Thanks!

mike-ravkine changed discussion status to closed
