Full fine-tune on Axolotl: RuntimeError: CUDA error: device-side assert triggered

#5
by xDAN2099 - opened

File "/root/.cache/huggingface/modules/transformers_modules/NousResearch/Yarn-Llama-2-13b-128k/4e3e87a067f64f8814c83dd5e3bad92dcf8a2391/modeling_llama_together_yarn.py", line 855, in custom_forward
return module(*inputs, output_attentions, None)
File "/root/miniconda3/envs/axo3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/NousResearch/Yarn-Llama-2-13b-128k/4e3e87a067f64f8814c83dd5e3bad92dcf8a2391/modeling_llama_together_yarn.py", line 616, in forward
hidden_states = self.input_layernorm(hidden_states)
File "/root/miniconda3/envs/axo3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/NousResearch/Yarn-Llama-2-13b-128k/4e3e87a067f64f8814c83dd5e3bad92dcf8a2391/modeling_llama_together_yarn.py", line 88, in forward
return rmsnorm_func(hidden_states, self.weight, self.variance_epsilon)
File "/root/.cache/huggingface/modules/transformers_modules/NousResearch/Yarn-Llama-2-13b-128k/4e3e87a067f64f8814c83dd5e3bad92dcf8a2391/modeling_llama_together_yarn.py", line 69, in rmsnorm_func
variance = hidden_states.pow(2).mean(-1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

trust_remote_code: true

CUDA 11.8 (cu118), PyTorch 2.0.1

Axolotl framework

8x A100

Full fine-tune
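
To narrow this down, here is a minimal debugging sketch. It rests on an assumption that is not confirmed by the trace above: that the assert comes from a token id outside the embedding table (a common cause of this error, for example a pad token added without resizing the embeddings). The helper `find_out_of_range_ids` and the example batch are hypothetical; 32000 is the base Llama 2 vocabulary size. Setting `CUDA_LAUNCH_BLOCKING=1` is the step the error message itself suggests, and it has to happen before CUDA is initialized.

```python
import os

# Make CUDA kernel launches synchronous so the traceback points at the op
# that actually failed. Set this before CUDA is initialized (safest: before
# importing torch). setdefault keeps any value already exported in the shell.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def find_out_of_range_ids(batch, vocab_size):
    """Return (row, col, token_id) for every id outside [0, vocab_size).

    `batch` is a list of token-id lists (input_ids only; label tensors
    legitimately contain -100 and should not be checked this way).
    """
    bad = []
    for row, seq in enumerate(batch):
        for col, tok in enumerate(seq):
            if tok < 0 or tok >= vocab_size:
                bad.append((row, col, tok))
    return bad


# Hypothetical batch: id 32000 is one past the last valid index (31999).
example_batch = [[1, 15043, 2], [1, 32000, 2]]
print(find_out_of_range_ids(example_batch, vocab_size=32000))
# prints [(1, 1, 32000)]
```

If a check like this reports offending ids on the real tokenized dataset, the tokenizer and the model's embedding size disagree; if it reports nothing, the cause is elsewhere and the synchronous traceback is the better lead.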
