CUDA error while running bloom-accelerate-inference.py | RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul

#160
by rsoods - opened

Hello All,

Has anyone faced this issue, while running the bloom accelerate inference script? Is there a prescribed CUDA driver/python/torch setup for running the script?

$ python bloom-inference-scripts/bloom-accelerate-inference.py --name bigscience/bloom --batch_size 1 --benchmark

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())

I'm trying to run this on a 8x A100-GPU dgx with the following environment setup,

$ pip list
Package Version


accelerate 0.15.0
certifi 2022.12.7
charset-normalizer 2.1.1
filelock 3.8.2
huggingface-hub 0.11.1
idna 3.4
numpy 1.24.0
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
packaging 22.0
pip 21.2.4
psutil 5.9.4
PyYAML 6.0
regex 2022.10.31
requests 2.28.1
setuptools 58.1.0
tokenizers 0.13.2
torch 1.13.1
tqdm 4.64.1
transformers 4.25.1
triton 1.0.0
typing_extensions 4.4.0
urllib3 1.26.13
wheel 0.38.4


nvidia-smi


+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... On | 00000000:07:00.0 Off | 0 |
| N/A 32C P0 62W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... On | 00000000:0F:00.0 Off | 0 |
| N/A 30C P0 62W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM... On | 00000000:47:00.0 Off | 0 |
| N/A 31C P0 62W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM... On | 00000000:4E:00.0 Off | 0 |
| N/A 31C P0 61W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM... On | 00000000:87:00.0 Off | 0 |
| N/A 36C P0 63W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM... On | 00000000:90:00.0 Off | 0 |
| N/A 35C P0 63W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM... On | 00000000:B7:00.0 Off | 0 |
| N/A 35C P0 64W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM... On | 00000000:BD:00.0 Off | 12814 |
| N/A 35C P0 64W / 400W | 3MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Sign up or log in to comment