CUDA error - the provided PTX was compiled with an unsupported toolchain

#23
by melindmi - opened

Hi, I am trying to use the llama-2-70b-chat.Q5_K_M.gguf model with ctransformers on GPU but I get this error:
CUDA error 222 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:6045: the provided PTX was compiled with an unsupported toolchain.
My torch version is '2.1.0+cu121', and the GPU driver version is 525.125.06, which supports CUDA version 12.0.
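For context, a quick way to print the versions in play (torch is only used here for reporting; note that the CUDA runtime bundled with torch is separate from the toolchain the precompiled ctransformers libs were built with):

import torch

# report the torch build and the CUDA version it was compiled against
print("torch:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("GPU visible to torch:", torch.cuda.is_available())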

The code (with the import ctransformers needs):
from ctransformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("../llama", model_file="llama-2-13b-chat.q5_K_M.gguf", model_type="llama", gpu_layers=50, temperature=1, context_length=4096)

Can someone suggest something on this?

In case someone else encounters the same issue: this problem is not related to the model itself but to the ctransformers installation, specifically to having an nvcc version that is not compatible with the GPU driver version.
When installing ctransformers with pip install ctransformers[cuda], precompiled libs built with CUDA 12.2 are used, but in my case I needed CUDA version 12.0.
When I used CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers, the default CUDA compiler path was /usr/bin/, which in my case contained an older version of nvcc.
The solution was to install the right CUDA version in a different path and then install ctransformers with:
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers
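For anyone following along, the end-to-end fix looks roughly like this (a sketch: the runfile name and the /opt/cuda-12.0 prefix are illustrative, pick the toolkit release your driver supports):

# 1. Install the matching CUDA toolkit into its own prefix (--toolkit skips the
#    driver install, --toolkitpath chooses the location)
sh cuda_12.0.1_525.85.12_linux.run --silent --toolkit --toolkitpath=/opt/cuda-12.0
# 2. Confirm that prefix's nvcc reports the expected release
/opt/cuda-12.0/bin/nvcc --version
# 3. Rebuild ctransformers against that nvcc instead of the one in /usr/bin
#    (add --force-reinstall if a prebuilt wheel is already installed)
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/opt/cuda-12.0/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers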

melindmi changed discussion status to closed

Solved my problem. Thanks

Also worked for me, thanks!

Also sorted - thanks!

How can you know which CUDA version you need for a model? I didn't see it specified in the model card. We're on CUDA 11.7, so we need to find something that will work with that.

To check the CUDA version you need, run nvidia-smi; the "CUDA Version" field in the header shows the highest CUDA version the installed driver supports.

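If it helps, here is a small sketch that pulls both numbers programmatically, assuming nvcc and nvidia-smi are on PATH (the toolkit release should not be newer than what the driver reports):

# compare the installed toolkit (nvcc) against what the driver supports (nvidia-smi)
import re, subprocess

nvcc_out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
smi_out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout

toolkit = re.search(r"release (\d+\.\d+)", nvcc_out).group(1)
driver_max = re.search(r"CUDA Version: (\d+\.\d+)", smi_out).group(1)
print(f"toolkit {toolkit}, driver supports up to {driver_max}")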

nvcc --version showed 11.7, while nvidia-smi shows 11.4 (output below). Given this, how can I know which model I can use? Everything I try generates the CUDA error 222.

nvidia-smi
Thu Dec 7 17:25:41 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   19C    P8    15W / 300W |      0MiB / 22731MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You need to use nvcc version 11.4; that is the version your GPU driver supports. So install nvcc version 11.4, then install ctransformers with:
CMAKE_ARGS="-DCMAKE_CUDA_COMPILER=/path_to_cuda_nvcc_version_11.4/bin/nvcc" CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers


I was able to follow your steps and downgrade. "nvcc --version" now shows 11.4, and the pip install was successful (pointed at the 11.4 path). However, it did not help; I still get the exact same error:

1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr create_gpt_params_cuda: loading model /models/llama-2-13b-chat-hf.Q5_0
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr ggml_init_cublas: found 1 CUDA devices:
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr   Device 0: NVIDIA A10G, compute capability 8.6
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr 
1:25AM DBG GRPC(llama-2-13b-chat-hf.Q5_0-127.0.0.1:39169): stderr CUDA error 222 at /build/sources/go-llama/llama.cpp/ggml-cuda.cu:5548: the provided PTX was compiled with an unsupported toolchain.

Not sure what the problem is in your case. The error I had was different, pointing to ctransformers: "CUDA error 222 at /home/runner/work/ctransformers/ctransformers/models/ggml/ggml-cuda.cu:6045: the provided PTX was compiled with an unsupported toolchain." Your log points to a go-llama build instead, so it comes from a different source tree.

Is there any way to run this on an Intel CPU only?

@subhamsubhasis Yeah, just install llama.cpp with OpenBLAS and you're done. Then load the model according to the llama.cpp docs and set the GPU layers to 0.
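For the ctransformers setup from earlier in this thread, a minimal CPU-only sketch (the model path and file name are just the ones used above):

from ctransformers import AutoModelForCausalLM

# gpu_layers=0 keeps all layers on the CPU, so no CUDA toolchain is involved
llm = AutoModelForCausalLM.from_pretrained("../llama", model_file="llama-2-13b-chat.q5_K_M.gguf", model_type="llama", gpu_layers=0)
print(llm("Hello"))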
