Specifying a GPU raises CUBLAS_STATUS_EXECUTION_FAILED

#1
by dafen - opened

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

# Download the checkpoint and load it onto the 4th card (cuda:3)
model_dir = snapshot_download("vivo-ai/BlueLM-7B-Chat-32K", revision="v1.0.1")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="cuda:3", torch_dtype=torch.bfloat16, trust_remote_code=True)
model = model.eval()

# Move the prompt tensors to the same card and generate
inputs = tokenizer("[|Human|]:三国演义的作者是谁?[|AI|]:", return_tensors="pt")
inputs = inputs.to("cuda:3")
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Error:
File /opt/conda/lib/python3.8/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda, b, CUDA_R_16BF, ldb, &fbeta, c, CUDA_R_16BF, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

During handling of the above exception, another exception occurred:

GPU model: A100. Loading the model on the 4th card and running it raises this error!

To add: at runtime I found that most of the model's parameters had been loaded onto gpu3, but for some reason a small portion also ended up on gpu0. The model was not fully placed on gpu3.
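For anyone checking the same thing, one quick way to confirm such a split placement (a minimal sketch; model is the object loaded above, and hf_device_map is only set when the model was loaded with a device_map):

devices = {p.device for p in model.parameters()}
print(devices)  # e.g. {device(type='cuda', index=3), device(type='cuda', index=0)}
print(getattr(model, "hf_device_map", None))  # per-module placement, if available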

vivo AI Lab org

You can try pinning the card with CUDA_VISIBLE_DEVICES.
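A minimal sketch of that suggestion (my reading of it, not an official fix: the variable must be set before torch initializes CUDA, and with only card 3 visible it is addressed as cuda:0):

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "3"  # hide all cards except the 4th

import torch
from modelscope import AutoModelForCausalLM, AutoTokenizer, snapshot_download

model_dir = snapshot_download("vivo-ai/BlueLM-7B-Chat-32K", revision="v1.0.1")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="cuda:0", torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

inputs = tokenizer("[|Human|]:三国演义的作者是谁?[|AI|]:", return_tensors="pt").to("cuda:0")
pred = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Equivalently, run the original script unchanged except for the device index, e.g. CUDA_VISIBLE_DEVICES=3 python your_script.py (your_script.py is a placeholder name).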

That setting does indeed work.

jeffreygao changed discussion status to closed
