Issues with CUDA and exllama_kernels

#47
by ditchtech - opened

Hello Bloke,
While running a sample application, I receive the following error -
CUDA extension not installed.
exllama_kernels not installed.

PyTorch/CUDA version - pytorch:2.0.1-py3.10-cuda11.8.0
Installed auto-gptq from the CUDA 11.8 wheel index: https://huggingface.github.io/autogptq-index/whl/cu118/
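
For reference, installing from that wheel index is normally done with something like this (flags as given in the AutoGPTQ install docs; adjust the cuXXX part to your CUDA version):

# pre-built auto-gptq wheels for CUDA 11.8 builds of torch
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/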

Yet I get the following error trace -
from auto_gptq import AutoGPTQForCausalLM
File "/usr/local/lib/python3.10/dist-packages/auto_gptq/init.py", line 4, in
from .utils.peft_utils import get_gptq_peft_model
File "/usr/local/lib/python3.10/dist-packages/auto_gptq/utils/peft_utils.py", line 20, in
from ..nn_modules.qlinear.qlinear_exllama import QuantLinear as QuantLinearExllama
File "/usr/local/lib/python3.10/dist-packages/auto_gptq/nn_modules/qlinear/qlinear_exllama.py", line 14, in
from exllama_kernels import make_q4, q4_matmul
ImportError: /usr/local/lib/python3.10/dist-packages/exllama_kernels.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
exllama_kernels not installed

This was not happening before. I noticed the auto-gptq package was updated on 2nd Nov. Does that have a bearing?

Having the same issue

I installed the CUDA toolkit first, which was required in my case and removed some errors: !sudo apt install -q nvidia-cuda-toolkit
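
To confirm the toolkit and driver are actually visible afterwards, a quick check prints the nvcc release and the GPU/driver info:

!nvcc --version
!nvidia-smi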

After that I was left with one error, which basically says exllama_kernels are not installed:



ERROR:auto_gptq.nn_modules.qlinear.qlinear_exllama:exllama_kernels not installed.

ImportError Traceback (most recent call last)
in <cell line: 6>()
4 # To use a different branch, change revision
5 # For example: revision="gptq-4bit-32g-actorder_True"
----> 6 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
7 device_map="auto",
8 trust_remote_code=False,

6 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
564 elif type(config) in cls._model_mapping.keys():
565 model_class = _get_model_class(config, cls._model_mapping)
--> 566 return model_class.from_pretrained(
567 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
568 )

/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
2810 else:
2811 # Need to protect the import
-> 2812 from optimum.gptq import GPTQQuantizer
2813 if quantization_method_from_config == QuantizationMethod.GPTQ:
2814 quantization_config = GPTQConfig.from_dict(config.quantization_config)

/usr/local/lib/python3.10/dist-packages/optimum/gptq/__init__.py in <module>
13 # See the License for the specific language governing permissions and
14 # limitations under the License.
---> 15 from .quantizer import GPTQQuantizer, load_quantized_model

/usr/local/lib/python3.10/dist-packages/optimum/gptq/quantizer.py in <module>
43
44 if is_auto_gptq_available():
---> 45 from auto_gptq import exllama_set_max_input_length
46 from auto_gptq.modeling._utils import autogptq_post_init
47 from auto_gptq.quantization import GPTQ

/usr/local/lib/python3.10/dist-packages/auto_gptq/__init__.py in <module>
2 from .modeling import BaseQuantizeConfig
3 from .modeling import AutoGPTQForCausalLM
----> 4 from .utils.peft_utils import get_gptq_peft_model
5 from .utils.exllama_utils import exllama_set_max_input_length

/usr/local/lib/python3.10/dist-packages/auto_gptq/utils/peft_utils.py in <module>
18 from ..nn_modules.qlinear.qlinear_cuda import QuantLinear as QuantLinearCuda
19 from ..nn_modules.qlinear.qlinear_cuda_old import QuantLinear as QuantLinearCudaOld
---> 20 from ..nn_modules.qlinear.qlinear_exllama import QuantLinear as QuantLinearExllama
21 from ..nn_modules.qlinear.qlinear_qigen import QuantLinear as QuantLinearQigen
22 from ..nn_modules.qlinear.qlinear_triton import QuantLinear as QuantLinearTriton

/usr/local/lib/python3.10/dist-packages/auto_gptq/nn_modules/qlinear/qlinear_exllama.py in <module>
12
13 try:
---> 14 from exllama_kernels import make_q4, q4_matmul
15 except ImportError:
16 logger.error('exllama_kernels not installed.')

ImportError: libcudart.so.12: cannot open shared object file: No such file or directory


I was using this code:



from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="gptq-8bit-32g-actorder_True")
#
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)


I am running this on Colab with PyTorch version 2.1.
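
The "libcudart.so.12" error above suggests the compiled exllama_kernels extension was built against the CUDA 12 runtime while this environment only provides CUDA 11.x, i.e. the auto-gptq build and the installed torch/CUDA do not match. A quick sanity check (just a sketch) shows which CUDA build torch reports:

import torch

print(torch.__version__)          # e.g. 2.1.0+cu118 vs 2.1.0+cu121
print(torch.version.cuda)         # CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # whether the GPU runtime is actually usable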

Even this snippet is not working and throws the same error:

# Load model directly

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ")
model = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-13B-chat-GPTQ")

I'm having the same issue. It was running a few days back, but now it throws the following error:

ImportError: /opt/conda/lib/python3.10/site-packages/exllama_kernels.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

ImportError Traceback (most recent call last)
Cell In[3], line 7
4 model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
5 # To use a different branch, change revision
6 # For example: revision="main"
----> 7 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
8 device_map="auto",
9 trust_remote_code=False,
10 revision="main")
12 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
13 # tokenizer.pad_token = tokenizer.eos_token

File /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:563, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
561 elif type(config) in cls._model_mapping.keys():
562 model_class = _get_model_class(config, cls._model_mapping)
--> 563 return model_class.from_pretrained(
564 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
565 )
566 raise ValueError(
567 f"Unrecognized configuration class {config.class} for this kind of AutoModel: {cls.name}.\n"
568 f"Model type should be one of {', '.join(c.name for c in cls._model_mapping.keys())}."
569 )

File /opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py:2577, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
2572 raise ImportError(
2573 "Loading GPTQ quantized model requires optimum library : pip install optimum and auto-gptq library 'pip install auto-gptq'"
2574 )
2575 else:
2576 # Need to protect the import
-> 2577 from optimum.gptq import GPTQQuantizer
2578 if quantization_method_from_config == QuantizationMethod.GPTQ:
2579 quantization_config = GPTQConfig.from_dict(config.quantization_config)

File /opt/conda/lib/python3.10/site-packages/optimum/gptq/__init__.py:15
1 # coding=utf-8
2 # Copyright 2023 HuggingFace Inc. team.
3 #
(...)
13 # See the License for the specific language governing permissions and
14 # limitations under the License.
---> 15 from .quantizer import GPTQQuantizer, load_quantized_model

File /opt/conda/lib/python3.10/site-packages/optimum/gptq/quantizer.py:45
42 from accelerate.hooks import remove_hook_from_module
44 if is_auto_gptq_available():
---> 45 from auto_gptq import exllama_set_max_input_length
46 from auto_gptq.modeling._utils import autogptq_post_init
47 from auto_gptq.quantization import GPTQ

File /opt/conda/lib/python3.10/site-packages/auto_gptq/__init__.py:4
2 from .modeling import BaseQuantizeConfig
3 from .modeling import AutoGPTQForCausalLM
----> 4 from .utils.peft_utils import get_gptq_peft_model
5 from .utils.exllama_utils import exllama_set_max_input_length

File /opt/conda/lib/python3.10/site-packages/auto_gptq/utils/peft_utils.py:20
18 from ..nn_modules.qlinear.qlinear_cuda import QuantLinear as QuantLinearCuda
19 from ..nn_modules.qlinear.qlinear_cuda_old import QuantLinear as QuantLinearCudaOld
---> 20 from ..nn_modules.qlinear.qlinear_exllama import QuantLinear as QuantLinearExllama
21 from ..nn_modules.qlinear.qlinear_qigen import QuantLinear as QuantLinearQigen
22 from ..nn_modules.qlinear.qlinear_triton import QuantLinear as QuantLinearTriton

File /opt/conda/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_exllama.py:14
11 logger = getLogger(__name__)
13 try:
---> 14 from exllama_kernels import make_q4, q4_matmul
15 except ImportError:
16 logger.error('exllama_kernels not installed.')

ImportError: /opt/conda/lib/python3.10/site-packages/exllama_kernels.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

@TheBloke can you please help us with this!

I think I have found a solution for this. When fine-tuning GPTQ-quantized models on GPU, we need to disable exllama (see the sketch below).

I came across this video on YouTube that might help you: https://youtu.be/T7haqIbHKm0?si=mPxt8NvUqggvWMli

Watch from 14:13
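
As a rough sketch of what disabling exllama looks like when loading through transformers (the flag name depends on the transformers version: older releases use disable_exllama=True, newer ones use use_exllama=False, so treat the exact keyword as an assumption to adapt):

from transformers import AutoModelForCausalLM, GPTQConfig

# Ask transformers/optimum not to route the GPTQ layers through the exllama kernels.
quantization_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-13B-chat-GPTQ",
    device_map="auto",
    quantization_config=quantization_config,
)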

----- The Bug is Fixed NOW: ----

Now the error no longer persists. Just use: --------> pip install auto-gptq

Keep in mind to use this version of torch: ----------> 2.1.0+cu118

If you still see these warnings:

WARNING:auto_gptq.nn_modules.qlinear.qlinear_cuda:CUDA extension not installed.
WARNING:auto_gptq.nn_modules.qlinear.qlinear_cuda_old:CUDA extension not installed.

then use this: ----------> !sudo apt install -q nvidia-cuda-toolkit

Reference:
https://pypi.org/project/auto-gptq/   ----------> auto-gptq was updated on 9th Nov
https://github.com/PanQiWei/AutoGPTQ/issues/398   ----------> this thread talks about using an older version of auto-gptq, i.e. 0.4.2, BUT NOW THAT'S NOT NEEDED
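
Putting those steps together as a single Colab cell (a sketch; the torch index URL and exact version pins are assumptions to adapt to your setup):

# CUDA 11.8 build of torch 2.1.0, then the current auto-gptq from PyPI
!pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu118
!pip install auto-gptq
# only needed if the "CUDA extension not installed" warnings still appear
!sudo apt install -q nvidia-cuda-toolkit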

Myself, I still have a CUDA version issue to deal with, after some other upgrades to get past the other recent issue floating around. Others might as well. My drivers are 'too old' according to some of the libraries (I think it was PyTorch; I haven't spent the time to go back and review, since my CPU-only box now works again, which is the important one).
