Specifying a GPU raises CUBLAS_STATUS_EXECUTION_FAILED

#1
by dafen - opened

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

# Download the checkpoint and load it onto the 4th card (cuda:3)
model_dir = snapshot_download("vivo-ai/BlueLM-7B-Chat-32K", revision="v1.0.1")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cuda:3", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = model.eval()

# Move the prompt tensors to the same card and generate
inputs = tokenizer("[|Human|]:三国演义的作者是谁?[|AI|]:", return_tensors="pt")
inputs = inputs.to("cuda:3")
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Error:
File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

During handling of the above exception, another exception occurred:

GPU model: A100. Loading the model on the 4th card and running it raises this error!

To add: at runtime I found that most of the model's parameters had been loaded onto gpu3, but for some reason a small portion also ended up on gpu0. The model was not fully placed on gpu3.
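For anyone checking the same thing, one quick way to confirm such a split placement (a minimal sketch; model is the object loaded above, and hf_device_map is only set when the model was loaded with a device_map):

devices = {p.device for p in model.parameters()}
print(devices)  # e.g. {device(type='cuda', index=3), device(type='cuda', index=0)}
print(getattr(model, "hf_device_map", None))  # per-module placement, if available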

vivo AI Lab org

You can try pinning the card with CUDA_VISIBLE_DEVICES.
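A minimal sketch of that suggestion (my reading of it, not an official fix: the variable must be set before torch initializes CUDA, and with only card 3 visible it is addressed as cuda:0):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # hide all cards except the 4th

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

model_dir = snapshot_download("vivo-ai/BlueLM-7B-Chat-32K", revision="v1.0.1")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="cuda:0", torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

inputs = tokenizer("[|Human|]:三国演义的作者是谁?[|AI|]:", return_tensors="pt").to("cuda:0")
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Equivalently, run the original script unchanged except for the device index, e.g. CUDA_VISIBLE_DEVICES=3 python your_script.py (your_script.py is a placeholder name).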

That setting does indeed work.

jeffreygao changed discussion status to closed
