error: unexpected keyword argument 'inject_fused_attention'

#19
by lasalH - opened

Hi everyone, I really appreciate @TheBloke for these wonderful models.

I'm trying to set up TheBloke/Llama-2-70B-chat-GPTQ for basic inference in Python. The steps I followed were as follows:
Environment:
- RTX A6000
- 62 GB RAM
Process:
- install auto-gptq (GITHUB_ACTIONS=true pip3 install auto-gptq); a quick version check is sketched after this list
- install the latest transformers lib (pip3 install git+https://github.com/huggingface/transformers)
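Since the keyword arguments accepted by from_quantized differ between auto-gptq releases, it is worth confirming which release actually got installed. A minimal sketch, assuming the package is registered under the distribution name auto-gptq:

from importlib.metadata import version

# Print the installed auto-gptq release; from_quantized() changed its
# accepted keyword arguments across releases, so the exact version matters.
print(version("auto-gptq"))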
Code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/Llama-2-70B-chat-GPTQ"
model_basename = "gptq_model-4bit--1g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        inject_fused_attention=False,  # Required for Llama 2 70B model at this time.
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

Error:
TypeError Traceback (most recent call last)
Cell In[1], line 11
7 use_triton = False
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 inject_fused_attention=False, # Required for Llama 2 70B model at this time.
14 use_safetensors=True,
15 trust_remote_code=False,
16 device="cuda:0",
17 use_triton=use_triton,
18 quantize_config=None)

TypeError: AutoGPTQForCausalLM.from_quantized() got an unexpected keyword argument 'inject_fused_attention'

Any help would be appreciated. Thank you in advance.

It's weird, because inject_fused_attention is clearly declared in the AutoGPTQ source (a way to check what the installed copy accepts is sketched after the signature):

def from_quantized(
    cls,
    model_name_or_path: Optional[str] = None,
    save_dir: Optional[str] = None,
    device_map: Optional[Union[str, Dict[str, Union[str, int]]]] = None,
    max_memory: Optional[dict] = None,
    device: Optional[Union[str, int]] = None,
    low_cpu_mem_usage: bool = False,
    use_triton: bool = False,
    inject_fused_attention: bool = True,
    inject_fused_mlp: bool = True,
    use_cuda_fp16: bool = True,
    quantize_config: Optional[BaseQuantizeConfig] = None,
    model_basename: Optional[str] = None,
    use_safetensors: bool = False,
    trust_remote_code: bool = False,
    warmup_triton: bool = False,
    trainable: bool = False,
    **kwargs
) -> BaseGPTQForCausalLM:
    model_type = check_and_get_model_type(
        save_dir or model_name_or_path, trust_remote_code
    )
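That signature is what the current source shows, but the installed wheel may be older and not accept the same keywords. A quick sketch (untested) to check what the installed copy of from_quantized actually takes:

import inspect
from auto_gptq import AutoGPTQForCausalLM

# List the keyword arguments accepted by the installed from_quantized();
# if inject_fused_attention is not among them, the installed release predates it.
sig = inspect.signature(AutoGPTQForCausalLM.from_quantized)
print("inject_fused_attention" in sig.parameters)
print(list(sig.parameters))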

I do not have this error; I have a different one (https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/18#64be578976a6e2efccc31cd0), but it seems to occur later in the process than yours. Which Python version are you using? (I use 3.8.) Which version of auto-gptq? (I have 0.3.0.)

@lasalH this error suggests AutoGPTQ is on an earlier version. I am not sure why that's happened, but can you try:

pip3 uninstall -y auto-gptq
GITHUB_ACTIONS=true pip3 install auto-gptq==0.2.2

Report any errors shown by that command, and if there are none, test again.

I've specified 0.2.2 because there's currently a bug in 0.3.0 that affects inference with some of my GPTQ uploads (the ones that use act_order and group_size together). The bug has been fixed and there should be another release, 0.3.1, soon, but for now use 0.2.2.
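Once the reinstall finishes, a short smoke test along the lines below should confirm the keyword is accepted and generation runs end to end. This is only a sketch of the usual transformers + auto-gptq pattern; the prompt and sampling settings are illustrative:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-70B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename="gptq_model-4bit--1g",
        inject_fused_attention=False,  # still required for Llama 2 70B at this time
        use_safetensors=True,
        device="cuda:0",
        use_triton=False,
        quantize_config=None)

# Illustrative prompt and sampling settings.
input_ids = tokenizer("Tell me about AI", return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0]))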

Thank you @TheBloke. Installing auto-gptq version 0.2.2 fixed the issue.
