CUDA runtime error: an illegal memory access was encountered.

#48
by qfalex12 - opened

With an input length of 2000 and an output length of 64, a CUDA error occurs when batch_size > 16.

Environment:
nvcr.io/nvidia/pytorch:23.02-py3
A100-SXM4-80GB

glm_error_bs17.png

Thanks.

Tencent Music Entertainment Lyra Lab org

This looks like an out-of-memory (OOM) error. That would be plausible, since both the sequence length and the batch size are large.
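For anyone who wants to sanity-check the memory side, here is a rough sketch of how the key/value cache grows with batch size and sequence length. The layer count, hidden size, and fp16 element size are placeholder assumptions, not values taken from this model, so substitute the real configuration. It also ignores weights, activations, and workspace buffers, so it covers the cache term only.

```python
def kv_cache_bytes(batch_size, seq_len, num_layers, hidden_size, bytes_per_elem=2):
    """Rough size in bytes of a standard multi-head-attention KV cache.

    Two tensors (K and V) per layer, each of shape
    [batch_size, seq_len, hidden_size].
    """
    return 2 * num_layers * batch_size * seq_len * hidden_size * bytes_per_elem


# Hypothetical configuration for illustration only; replace with the real model's values.
NUM_LAYERS = 28
HIDDEN_SIZE = 4096
SEQ_LEN = 2000 + 64  # input length plus output length from the report

for bs in (16, 17, 32):
    gib = kv_cache_bytes(bs, SEQ_LEN, NUM_LAYERS, HIDDEN_SIZE) / 2**30
    print(f"batch_size={bs}: ~{gib:.2f} GiB of KV cache")
```

The cache term is linear in both batch size and sequence length. Adding the weight and activation footprint and comparing the total against the 80 GB on the A100 gives a first indication of whether memory is the limiting factor.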
