Text Generation · Transformers · Safetensors · English · llama · text-generation-inference · 4-bit precision · gptq

Noobing out

#3
by tehnlulz - opened

Hey @TheBloke:

Sorry to bother you here; I was just hoping you might point out where I'm noobing out. I'm running on Linux with AutoGPTQ and keep getting this error:

ValueError: QuantLinear() does not have a parameter or a buffer named bias.

Full script here:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantized_model_dir = "/notebooks/Manticore-13B-GPTQ/"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=False)

def get_config(has_desc_act):
    return BaseQuantizeConfig(
        bits=4,  # quantize model to 4-bit
        group_size=128,  # it is recommended to set the value to 128
        desc_act=has_desc_act
    )

def get_model(model_base, triton, model_has_desc_act):
    if model_has_desc_act:
        model_suffix="latest.act-order"
    else:
        model_suffix="compat.no-act-order"
    # NB: model_base and model_suffix are currently unused; the basename below is hardcoded
    return AutoGPTQForCausalLM.from_quantized(quantized_model_dir, use_safetensors=True, model_basename="Manticore-13B-GPTQ-4bit-128g.no-act-order", device="cuda:0", use_triton=triton, quantize_config=get_config(model_has_desc_act))

# Prevent printing spurious transformers error
logging.set_verbosity(logging.CRITICAL)

prompt='''### Human: Write a story about llamas
### Assistant:'''

model = get_model("/notebooks/Manticore-13B-GPTQ/Manticore-13B-GPTQ-4bit-128g.no-act-order.safetensors", triton=False, model_has_desc_act=False)
#/notebooks/Manticore-13B-GPTQ/.no-act-order.safetensors
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print("### Inference:")
print(pipe(prompt)[0]['generated_text'])

Not sure if you can spot where I'm going wrong, but I keep getting this error. More detail below:

CUDA extension not installed.
/usr/local/lib/python3.9/dist-packages/accelerate/utils/modeling.py:807: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
The safetensors archive passed at /notebooks/Manticore-13B-GPTQ/Manticore-13B-GPTQ-4bit-128g.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /notebooks/2222.py:28 in <module>                                                                │
│                                                                                                  │
│   25 prompt='''### Human: Write a story about llamas                                             │
│   26 ### Assistant:'''                                                                           │
│   27                                                                                             │
│ ❱ 28 model = get_model("/notebooks/Manticore-13B-GPTQ/Manticore-13B-GPTQ-4bit-128g.no-act-ord    │
│   29 #/notebooks/Manticore-13B-GPTQ/.no-act-order.safetensors                                    │
│   30 pipe = pipeline(                                                                            │
│   31 │   "text-generation",                                                                      │
│                                                                                                  │
│ /notebooks/2222.py:20 in get_model                                                               │
│                                                                                                  │
│   17 │   │   model_suffix="latest.act-order"                                                     │
│   18 │   else:                                                                                   │
│   19 │   │   model_suffix="compat.no-act-order"                                                  │
│ ❱ 20 │   return AutoGPTQForCausalLM.from_quantized(quantized_model_dir, use_safetensors=True,    │
│   21                                                                                             │
│   22 # Prevent printing spurious transformers error                                              │
│   23 logging.set_verbosity(logging.CRITICAL)                                                     │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/auto_gptq/modeling/auto.py:71 in from_quantized           │
│                                                                                                  │
│   68 │   │   model_type = check_and_get_model_type(save_dir)                                     │
│   69 │   │   quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized                    │
│   70 │   │   keywords = {key: kwargs[key] for key in signature(quant_func).parameters if key     │
│ ❱ 71 │   │   return quant_func(                                                                  │
│   72 │   │   │   save_dir=save_dir,                                                              │
│   73 │   │   │   device_map=device_map,                                                          │
│   74 │   │   │   max_memory=max_memory,                                                          │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/auto_gptq/modeling/_base.py:589 in from_quantized         │
│                                                                                                  │
│   586 │   │   │   │   no_split_module_classes=[cls.layer_type]                                   │
│   587 │   │   │   )                                                                              │
│   588 │   │   if strict:                                                                         │
│ ❱ 589 │   │   │   model = accelerate.load_checkpoint_and_dispatch(                               │
│   590 │   │   │   │   model,                                                                     │
│   591 │   │   │   │   model_save_name,                                                           │
│   592 │   │   │   │   device_map,                                                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/big_modeling.py:479 in                         │
│ load_checkpoint_and_dispatch                                                                     │
│                                                                                                  │
│   476 │   │   )                                                                                  │
│   477 │   if offload_state_dict is None and device_map is not None and "disk" in device_map.va   │
│   478 │   │   offload_state_dict = True                                                          │
│ ❱ 479 │   load_checkpoint_in_model(                                                              │
│   480 │   │   model,                                                                             │
│   481 │   │   checkpoint,                                                                        │
│   482 │   │   device_map=device_map,                                                             │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/utils/modeling.py:993 in                       │
│ load_checkpoint_in_model                                                                         │
│                                                                                                  │
│    990 │   │   │   │   │   set_module_tensor_to_device(model, param_name, "meta")                │
│    991 │   │   │   │   │   offload_weight(param, param_name, state_dict_folder, index=state_dic  │
│    992 │   │   │   │   else:                                                                     │
│ ❱  993 │   │   │   │   │   set_module_tensor_to_device(model, param_name, param_device, value=p  │
│    994 │   │                                                                                     │
│    995 │   │   # Force Python to clean up.                                                       │
│    996 │   │   del checkpoint                                                                    │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/utils/modeling.py:135 in                       │
│ set_module_tensor_to_device                                                                      │
│                                                                                                  │
│    132 │   │   tensor_name = splits[-1]                                                          │
│    133 │                                                                                         │
│    134 │   if tensor_name not in module._parameters and tensor_name not in module._buffers:      │
│ ❱  135 │   │   raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_  │
│    136 │   is_buffer = tensor_name in module._buffers                                            │
│    137 │   old_value = getattr(module, tensor_name)                                              │
│    138                                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: QuantLinear() does not have a parameter or a buffer named bias.

Appreciate any help in advance. Cheers.

Pass strict=False to the .from_quantized() call. Maybe I should put that in the README.

I'm hoping that in future strict=False will become the default, or that it otherwise won't be required. But for now the above works fine.
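
For anyone who hits the same thing, that means changing the loading call in the script above to something like this (same arguments as before, just with strict=False added; strict was a from_quantized keyword in AutoGPTQ builds of that era, as the traceback's _base.py frame shows):

model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    use_safetensors=True,
    model_basename="Manticore-13B-GPTQ-4bit-128g.no-act-order",
    device="cuda:0",
    use_triton=False,
    quantize_config=get_config(False),
    strict=False  # skip strict parameter/buffer matching, e.g. the missing bias tensors
)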

Cheers mate, sorted. You should set up a donation button somewhere so we can buy you a beer.

Hey @tehnlulz Matt, I'm trying to pass strict=False but I'm getting a strange error. Any hints on how you solved it?

TypeError: from_quantized() got an unexpected keyword argument 'strict'


I found the answer! Thanks all! https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ/discussions/11

tldr:

pip uninstall auto-gptq
pip install git+https://github.com/PanQiWei/AutoGPTQ.git

Yeah, AutoGPTQ should be installed from source for the moment. And the strict parameter is no longer a thing; it got removed and is no longer needed for loading older models.
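
If you want to check what your installed copy actually accepts, something like this works (standard library inspect, nothing AutoGPTQ-specific):

python -c "from auto_gptq import AutoGPTQForCausalLM; import inspect; print(inspect.signature(AutoGPTQForCausalLM.from_quantized))"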

I tend to install AutoGPTQ this way:

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .

But I guess the pip install git+ method does exactly the same thing without leaving a local clone behind.
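
Either way, a quick sanity check afterwards confirms which version you actually ended up with:

pip show auto-gptq
python -c "import importlib.metadata; print(importlib.metadata.version('auto-gptq'))"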

@TheBloke I get the following error when I try to install AutoGPTQ from source. Any ideas?

pip install git+https://github.com/PanQiWei/AutoGPTQ.git
Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/PanQiWei/AutoGPTQ.git
Cloning https://github.com/PanQiWei/AutoGPTQ.git to /tmp/pip-req-build-j2sx0fuo
Running command git clone --filter=blob:none --quiet https://github.com/PanQiWei/AutoGPTQ.git /tmp/pip-req-build-j2sx0fuo
Resolved https://github.com/PanQiWei/AutoGPTQ.git to commit d2662b18bb91e1864b29e4e05862712382b8a076
Preparing metadata (setup.py) ... done
Requirement already satisfied: accelerate>=0.22.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (0.22.0)
Requirement already satisfied: datasets in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (2.16.0)
Requirement already satisfied: gekko in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (1.0.6)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from auto-gptq==0.7.0.dev0+cu117) (1.26.2)
Requirement already satisfied: peft>=0.5.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (0.7.1)
Requirement already satisfied: rouge in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (1.0.1)
Requirement already satisfied: safetensors in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (0.4.1)
Requirement already satisfied: sentencepiece in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (0.1.99)
Requirement already satisfied: torch>=1.13.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (1.13.1)
Requirement already satisfied: tqdm in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (4.66.1)
Requirement already satisfied: transformers>=4.31.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from auto-gptq==0.7.0.dev0+cu117) (4.32.1)
Requirement already satisfied: psutil in /home/ubuntu/.local/lib/python3.10/site-packages (from accelerate>=0.22.0->auto-gptq==0.7.0.dev0+cu117) (5.9.7)
Requirement already satisfied: packaging>=20.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from accelerate>=0.22.0->auto-gptq==0.7.0.dev0+cu117) (23.2)
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from accelerate>=0.22.0->auto-gptq==0.7.0.dev0+cu117) (5.4.1)
Requirement already satisfied: huggingface-hub>=0.17.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from peft>=0.5.0->auto-gptq==0.7.0.dev0+cu117) (0.20.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /home/ubuntu/.local/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (11.7.99)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /home/ubuntu/.local/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (8.5.0.96)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /home/ubuntu/.local/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (11.7.99)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /home/ubuntu/.local/lib/python3.10/site-packages (from torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (11.10.3.66)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (4.9.0)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (59.6.0)
Requirement already satisfied: wheel in /usr/lib/python3/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->auto-gptq==0.7.0.dev0+cu117) (0.37.1)
Requirement already satisfied: requests in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (2.31.0)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (0.13.3)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (3.13.1)
Requirement already satisfied: regex!=2019.12.17 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (2023.10.3)
Requirement already satisfied: dill<0.3.8,>=0.3.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (0.3.7)
Requirement already satisfied: multiprocess in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (0.70.15)
Requirement already satisfied: fsspec[http]<=2023.10.0,>=2023.1.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (2023.10.0)
Requirement already satisfied: pyarrow-hotfix in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (0.6)
Requirement already satisfied: pandas in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (2.1.4)
Requirement already satisfied: pyarrow>=8.0.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (14.0.2)
Requirement already satisfied: xxhash in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (3.4.1)
Requirement already satisfied: aiohttp in /home/ubuntu/.local/lib/python3.10/site-packages (from datasets->auto-gptq==0.7.0.dev0+cu117) (3.9.1)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from rouge->auto-gptq==0.7.0.dev0+cu117) (1.16.0)
Requirement already satisfied: async-timeout<5.0,>=4.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (4.0.3)
Requirement already satisfied: yarl<2.0,>=1.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (1.9.4)
Requirement already satisfied: aiosignal>=1.1.2 in /home/ubuntu/.local/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (1.3.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /home/ubuntu/.local/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (6.0.4)
Requirement already satisfied: frozenlist>=1.1.1 in /home/ubuntu/.local/lib/python3.10/site-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (1.4.1)
Requirement already satisfied: attrs>=17.3.0 in /usr/lib/python3/dist-packages (from aiohttp->datasets->auto-gptq==0.7.0.dev0+cu117) (21.2.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ubuntu/.local/lib/python3.10/site-packages (from requests->transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (3.3.2)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (2020.6.20)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (1.26.5)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->transformers>=4.31.0->auto-gptq==0.7.0.dev0+cu117) (3.3)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas->datasets->auto-gptq==0.7.0.dev0+cu117) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/ubuntu/.local/lib/python3.10/site-packages (from pandas->datasets->auto-gptq==0.7.0.dev0+cu117) (2.8.2)
Requirement already satisfied: tzdata>=2022.1 in /home/ubuntu/.local/lib/python3.10/site-packages (from pandas->datasets->auto-gptq==0.7.0.dev0+cu117) (2023.3)
Building wheels for collected packages: auto-gptq
Building wheel for auto-gptq (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [120 lines of output]
Generating qigen kernels...
conda_cuda_include_dir /usr/lib/python3/dist-packages/nvidia/cuda_runtime/include
running bdist_wheel
/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.10
creating build/lib.linux-x86_64-3.10/tests
copying tests/test_quantization.py -> build/lib.linux-x86_64-3.10/tests
copying tests/test_q4.py -> build/lib.linux-x86_64-3.10/tests
copying tests/test_peft_conversion.py -> build/lib.linux-x86_64-3.10/tests
copying tests/__init__.py -> build/lib.linux-x86_64-3.10/tests
creating build/lib.linux-x86_64-3.10/auto_gptq
copying auto_gptq/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq
creating build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_gptj_attn.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
copying auto_gptq/nn_modules/_fused_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_llama_mlp.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_llama_attn.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
copying auto_gptq/nn_modules/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules
creating build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/perplexity_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/data_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/import_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/exllama_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/patch_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
copying auto_gptq/utils/peft_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/utils
creating build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/sequence_classification_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/language_modeling_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/text_summarization_task.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks
creating build/lib.linux-x86_64-3.10/auto_gptq/quantization
copying auto_gptq/quantization/quantizer.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
copying auto_gptq/quantization/gptq.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
copying auto_gptq/quantization/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/quantization
creating build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/internlm.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/mistral.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/gpt_neox.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/_const.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/codegen.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/auto.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/opt.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/bloom.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/_base.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/stablelmepoch.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/baichuan.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/xverse.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/gpt_bigcode.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/mixtral.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/moss.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/gpt2.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/llama.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/qwen.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/gptj.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/yi.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/decilm.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
copying auto_gptq/modeling/rw.py -> build/lib.linux-x86_64-3.10/auto_gptq/modeling
creating build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_exllama.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_cuda.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_exllamav2.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_triton.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/qlinear_qigen.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
copying auto_gptq/nn_modules/qlinear/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/qlinear
creating build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/kernels.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/mixin.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/custom_autotune.py -> build/lib.linux-x86_64-3.10/auto_gptq/nn_modules/triton_utils
creating build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/classification_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/generation_utils.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/__init__.py -> build/lib.linux-x86_64-3.10/auto_gptq/eval_tasks/_utils
running build_ext
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-req-build-j2sx0fuo/setup.py", line 188, in
setup(
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/wheel/bdist_wheel.py", line 299, in run
self.run_command('build')
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 386, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for auto-gptq
Running setup.py clean for auto-gptq
Failed to build auto-gptq
Installing collected packages: auto-gptq
Running setup.py install for auto-gptq ... error
error: subprocess-exited-with-error

× Running setup.py install for auto-gptq did not run successfully.
│ exit code: 1
╰─> [124 lines of output]
Generating qigen kernels...
conda_cuda_include_dir /usr/lib/python3/dist-packages/nvidia/cuda_runtime/include
running install
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
[... file-copying output identical to the first build attempt above ...]
running build_ext
/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/tmp/pip-req-build-j2sx0fuo/setup.py", line 188, in
setup(
File "/usr/lib/python3/dist-packages/setuptools/init.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/install.py", line 68, in run
return orig.install.run(self)
File "/usr/lib/python3.10/distutils/command/install.py", line 619, in run
self.run_command('build')
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3/dist-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 499, in build_extensions
_check_cuda_version(compiler_name, compiler_version)
File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 386, in _check_cuda_version
raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError:
The detected CUDA version (12.3) mismatches the version that was used to compile
PyTorch (11.7). Please make sure to use the same CUDA versions.

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> auto-gptq

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
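
For anyone hitting the same CUDA mismatch: the two versions the build is comparing can be checked directly before retrying (a minimal check, assuming nvcc is on your PATH and you run it in the same environment):

nvcc --version                                        # system CUDA toolkit (12.3 in the log above)
python -c "import torch; print(torch.version.cuda)"   # CUDA version this PyTorch build targets (11.7 here)

Installing a PyTorch build whose CUDA version matches the system toolkit (or a toolkit matching PyTorch) is what the error message is asking for.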
