CUDA extension not installed

#3
by kllisre - opened

Thanks for your work.
When I am running

import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

tokenizer = AutoTokenizer.from_pretrained(local_dir, use_fast=False)
model = AutoGPTQForCausalLM.from_quantized(local_dir, device="cuda:0", use_triton=False, use_safetensors=True, torch_dtype=torch.float32, trust_remote_code=True)

I get these warnings:

CUDA extension not installed.
RWGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
RWGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.

I'm wondering whether "CUDA extension not installed." affects model performance. I can't tell whether it is actually using my GPU: I do see about 6 GB of VRAM in use, but during inference I don't see the PID of my process in nvidia-smi. Maybe it is running on the CPU? At times inference takes a very long time.
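One way to confirm inference is actually hitting the GPU is to check whether your Python PID shows up in nvidia-smi's compute-apps listing while generating (nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader). A small helper sketch for parsing that output; the sample line below is made up for illustration:

```python
def pid_on_gpu(csv_text: str, pid: int) -> bool:
    """Return True if `pid` appears in nvidia-smi's compute-apps CSV output."""
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if fields and fields[0].isdigit() and int(fields[0]) == pid:
            return True
    return False

# Hypothetical nvidia-smi output for illustration:
sample = "12345, python, 6144 MiB\n"
print(pid_on_gpu(sample, 12345))  # True
print(pid_on_gpu(sample, 999))    # False
```

If your process never shows up there during generation, the model is most likely running on the CPU.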
My env:

Collecting environment information...
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.2.152
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1080 Ti
Nvidia driver version: 515.86.01
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Model name:    Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.0.1
[conda] numpy        1.24.3                   pypi_0    pypi
[conda] torch        2.0.1                    pypi_0    pypi

Did you build AutoGPTQ with CUDA locally?

Yes, I did that on my local machine:

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip install .
pip install einops
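After installing, one quick sanity check is to see whether the compiled extension can be imported at all. The module name autogptq_cuda is an assumption for this AutoGPTQ version; the "CUDA extension not installed." warning is printed when this import fails:

```python
def cuda_extension_available() -> bool:
    # Module name assumed for this AutoGPTQ version; adjust if the
    # import in the auto_gptq source differs.
    try:
        import autogptq_cuda  # noqa: F401
        return True
    except ImportError:
        return False

print(cuda_extension_available())
```

If this prints False even after a successful pip install, the build most likely fell back to a CPU-only install.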

I get this error:
(localGPT) [dg@localhost AutoGPTQ]$ pip install .
Processing /home/dg-linc/AutoGPTQ
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/dg-linc/AutoGPTQ/setup.py", line 58, in <module>
CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", default_cuda_version).split("."))
AttributeError: 'NoneType' object has no attribute 'split'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Does someone have an idea?
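For what it's worth, the traceback points at setup.py reading os.environ.get("CUDA_VERSION", default_cuda_version), and default_cuda_version appears to come from torch.version.cuda, which is None in a CPU-only torch build (an assumption; check line 58 of your setup.py). A minimal reproduction and workaround sketch:

```python
import os

# Reproduce the failure: if CUDA_VERSION is unset and the fallback is None
# (as when torch.version.cuda is None for a CPU-only torch), .split() fails.
os.environ.pop("CUDA_VERSION", None)
default_cuda_version = None  # stand-in for torch.version.cuda on CPU-only torch

try:
    CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", default_cuda_version).split("."))
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'split'

# Possible workaround: export CUDA_VERSION before installing, e.g.
#   CUDA_VERSION=11.7 pip install .
os.environ["CUDA_VERSION"] = "11.7"
CUDA_VERSION = "".join(os.environ.get("CUDA_VERSION", default_cuda_version).split("."))
print(CUDA_VERSION)  # 117
```

So the real fix is likely a torch build that actually ships CUDA (torch.version.cuda should print a version, not None), or setting CUDA_VERSION explicitly as above.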