CUDA error - the provided PTX was compiled with an unsupported toolchain

#23
by melindmi - opened

Hi, I am trying to use the llama-2-70b-chat.Q5_K_M.gguf model with ctransformers on GPU but I get this error:
CUDA error 222 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:6045: the provided PTX was compiled with an unsupported toolchain.
My torch version is '2.1.0+cu121', and the GPU driver version is 525.125.06, which supports CUDA version 12.0.
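For context, a quick way to print the versions in play (torch is only used here for reporting; note that the CUDA runtime bundled with torch is separate from the toolchain the precompiled ctransformers libs were built with):

import torch

# report the torch build and the CUDA version it was compiled against
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("GPU visible to torch:", torch.cuda.is_available())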

The code (with the import ctransformers needs):
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("../llama", model_file="llama-2-13b-chat.q5_K_M.gguf", model_type="llama", gpu_layers=50, temperature=1, context_length=4096)

Can someone suggest something on this?

In case someone else encounters the same issue: this problem is not related to the model itself but to the ctransformers installation, specifically to having an nvcc version that is not compatible with the GPU driver version.
When installing ctransformers with pip install ctransformers[cuda], precompiled libs built with CUDA 12.2 are used, but in my case I needed CUDA version 12.0.
When I used CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers, the default CUDA compiler path was /usr/bin/, which in my case contained an older version of nvcc.
The solution was to install the right CUDA version in a different path and then install ctransformers with:
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
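For anyone following along, the end-to-end fix looks roughly like this (a sketch: the runfile name and the /opt/cuda-12.0 prefix are illustrative, pick the toolkit release your driver supports):

# 1. Install the matching CUDA toolkit into its own prefix (--toolkit skips the
#    driver install, --toolkitpath chooses the location)
sh cuda_12.0.1_525.85.12_linux.run --silent --toolkit --toolkitpath=/opt/cuda-12.0
# 2. Confirm that prefix's nvcc reports the expected release
/opt/cuda-12.0/bin/nvcc --version
# 3. Rebuild ctransformers against that nvcc instead of the one in /usr/bin
#    (add --force-reinstall if a prebuilt wheel is already installed)
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/opt/cuda-12.0/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers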

melindmi changed discussion status to closed

Solved my problem. Thanks

Also worked for me, thanks!

Also sorted - thanks!

How can you know which CUDA version you need for a model? I didn't see it specified in the model card. We're on CUDA 11.7, so we need to find something that will work with that.

To check the CUDA version you need, run nvidia-smi; the "CUDA Version" field in the header shows the highest CUDA version the installed driver supports.

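If it helps, here is a small sketch that pulls both numbers programmatically, assuming nvcc and nvidia-smi are on PATH (the toolkit release should not be newer than what the driver reports):

# compare the installed toolkit (nvcc) against what the driver supports (nvidia-smi)
import re, subprocess

nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
smi_out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout

toolkit = re.search(r"release (\d+\.\d+)", nvcc_out).group(1)
driver_max = re.search(r"CUDA Version: (\d+\.\d+)", smi_out).group(1)
print(f"toolkit {toolkit}, driver supports up to {driver_max}")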

nvcc --version showed 11.7, while nvidia-smi shows 11.4 (output below). Given this, how can I know which model I can use? Everything I try generates the CUDA error 222.

nvidia-smi
Thu Dec 7 17:25:41 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   19C    P8    15W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You need to use nvcc version 11.4; that is the version your GPU driver supports. So install nvcc version 11.4, then install ctransformers with:
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda_nvcc_version_11.4/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers


I was able to follow your steps and downgrade. "nvcc --version" now shows 11.4, and the pip install was successful (pointed at the 11.4 path). However, it did not help; I still get the exact same error:

1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr create_gpt_params_cuda: loading model /models/llama-2-13b-chat-hf.Q5_0
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr ggml_init_cublas: found 1 CUDA devices:
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr   Device 0: NVIDIA A10G, compute capability 8.6
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr 
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr CUDA error 222 at /build/sources/go-llama/llama.cpp/ggml-cuda.cu:5548: the provided PTX was compiled with an unsupported toolchain.

Not sure what the problem is in your case. The error I had was different, pointing to ctransformers: "CUDA error 222 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:6045: the provided PTX was compiled with an unsupported toolchain." Your log points to a go-llama build instead, so it comes from a different source tree.

Is there any way to run this on an Intel CPU only?

@subhamsubhasis Yeah, just install llama.cpp with OpenBLAS and you're done. Then load the model according to the llama.cpp docs and set the GPU layers to 0.
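For the ctransformers setup from earlier in this thread, a minimal CPU-only sketch (the model path and file name are just the ones used above):

from ctransformers import AutoModelForCausalLM

# gpu_layers=0 keeps all layers on the CPU, so no CUDA toolchain is involved
llm = AutoModelForCausalLM.from_pretrained("../llama", model_file="llama-2-13b-chat.q5_K_M.gguf", model_type="llama", gpu_layers=0)
print(llm("Hello"))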
