RuntimeError: CUDA Error: no kernel image is available for execution on the device

#2
by resley - opened

CUDA looks fine to me. Does this require multiple GPUs to run?
Sun Apr 23 05:32:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03    Driver Version: 470.182.03    CUDA Version: 11.4   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000001:00:00.0 Off |                  Off |
| N/A   30C    P0    37W / 150W |    321MiB /  8129MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12805      C   python                            318MiB |
+-----------------------------------------------------------------------------+
Python 3.8.5

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
/tmp/c150172af3c10165c2613482e6019aa86a543387/README.md.txt
No sentence-transformers model found with name /home/azureuser/.cache/torch/sentence_transformers/GanymedeNil_text2vec-base-chinese. Creating a new one with MEAN pooling.
No sentence-transformers model found with name /home/azureuser/.cache/torch/sentence_transformers/GanymedeNil_text2vec-base-chinese. Creating a new one with MEAN pooling.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : /home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization_kernels_parallel.c -shared -o /home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization_kernels_parallel.so
Load kernel : /home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 3
Parallel kernel is not recommended when parallel num < 4.
Using quantization cache
Applying quantization to glm layers
The dtype of attention mask (torch.int64) is not bool
Traceback (most recent call last):
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
output = await app.get_blocks().process_api(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
result = await self.call_function(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/blocks.py", line 1025, in call_function
prediction = await anyio.to_thread.run_sync(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "LangChain-ChatLLM/app.py", line 158, in predict
resp = get_knowledge_based_answer(
File "LangChain-ChatLLM/app.py", line 132, in get_knowledge_based_answer
result = knowledge_chain({"query": query})
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 116, in call
raise e
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 113, in call
outputs = self._call(inputs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/retrieval_qa/base.py", line 110, in _call
answer = self.combine_documents_chain.run(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 216, in run
return self(kwargs)[self.output_keys[0]]
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 116, in call
raise e
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 113, in call
outputs = self._call(inputs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/combine_documents/base.py", line 75, in _call
output, extra_return_dict = self.combine_docs(docs, **other_keys)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/combine_documents/stuff.py", line 83, in combine_docs
return self.llm_chain.predict(**inputs), {}
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/llm.py", line 151, in predict
return self(kwargs)[self.output_key]
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 116, in call
raise e
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/base.py", line 113, in call
outputs = self._call(inputs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/llm.py", line 57, in _call
return self.apply([inputs])[0]
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/llm.py", line 118, in apply
response = self.generate(input_list)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/chains/llm.py", line 62, in generate
return self.llm.generate_prompt(prompts, stop)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/llms/base.py", line 107, in generate_prompt
return self.generate(prompt_strings, stop=stop)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/llms/base.py", line 140, in generate
raise e
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/llms/base.py", line 137, in generate
output = self._generate(prompts, stop=stop)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/langchain/llms/base.py", line 324, in _generate
text = self._call(prompt, stop=stop)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/resleymlhost/code/Users/liwei/LangChain-ChatLLM/chatllm.py", line 108, in _call
response, _ = self.model.chat(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/modeling_chatglm.py", line 1286, in chat
outputs = self.generate(**inputs, **gen_kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/transformers/generation/utils.py", line 2524, in sample
outputs = self(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/modeling_chatglm.py", line 1191, in forward
transformer_outputs = self.transformer(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/modeling_chatglm.py", line 997, in forward
layer_ret = layer(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/modeling_chatglm.py", line 627, in forward
attention_outputs = self.attention(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/modeling_chatglm.py", line 445, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization.py", line 375, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization.py", line 53, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "/home/azureuser/.cache/huggingface/modules/transformers_modules/THUDM/chatglm-6b-int4/e02ba894cf18f3fd9b2526c795f983683c4ec732/quantization.py", line 274, in extract_weight_to_half
func(
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 48, in call
func = self._prepare_func()
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 40, in _prepare_func
self._module.get_module(), self._func_name
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/kernels/base.py", line 24, in get_module
self._module[curr_device] = cuda.cuModuleLoadData(self._code)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
return f(*args, **kwargs)
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/library/cuda.py", line 233, in cuModuleLoadData
checkCUStatus(cuda.cuModuleLoadData(ctypes.byref(module), data))
File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/cpm_kernels/library/cuda.py", line 216, in checkCUStatus
raise RuntimeError("CUDA Error: %s" % cuGetErrorString(error))
RuntimeError: CUDA Error: no kernel image is available for execution on the device

This looks like a driver problem to me, doesn't it?
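
If it really were the driver, even a plain tensor operation on the GPU would fail. A minimal check to separate the two cases (a sketch, assuming only that PyTorch is installed, which the traceback confirms):

import torch

# If this prints True and the matmul below succeeds, the driver and
# PyTorch's CUDA runtime are working; the failure is then specific to
# the precompiled quantization kernels that cpm_kernels tries to load.
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))

x = torch.randn(64, 64, device="cuda")
y = x @ x  # a basic CUDA kernel launch
torch.cuda.synchronize()
print("basic CUDA ops work:", y.shape)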

You don't need multiple GPUs; a single card is fine as long as it has enough VRAM.
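
To see whether a single card's VRAM is enough, you can query it directly from Python; a sketch, assuming a PyTorch release recent enough to have mem_get_info (nvidia-smi above already shows about 8 GiB total):

import torch

# Free and total device memory in bytes (wraps cudaMemGetInfo).
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 2**20:.0f} MiB, total: {total / 2**20:.0f} MiB")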

Could you help me troubleshoot this? CUDA is currently at version 11.4. Which direction should I start checking in?
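
One concrete direction: "no kernel image is available for execution on the device" usually means the loaded binary was not compiled for your GPU's architecture, not that the driver or CUDA version is broken. The Tesla M60 is a Maxwell card (compute capability 5.2), and the precompiled CUDA kernels shipped with cpm_kernels may not include an sm_52 binary, which would produce exactly this error. A sketch to confirm what your setup reports:

import torch

# The M60 should report (5, 2), i.e. sm_52 (Maxwell).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: sm_{major}{minor}")

# Architectures this PyTorch build itself was compiled for, shown only
# as a reference point; the failing kernels come from cpm_kernels,
# which ships its own precompiled binaries.
print("torch arch list:", torch.cuda.get_arch_list())

If the card's architecture turns out to be unsupported by the int4 kernels, the workarounds usually reported for this model are running it on CPU or switching to a GPU with a newer architecture.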

Try googling it.
