Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
#6
by
wangzihan99
- opened
No description provided.
wangzihan99
changed pull request title from
Add fused ApplyRoPE and RMSNorm kernels written in OpenAI Triton
to Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with the flash-attn kernels, they offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically if Triton is installed (usually bundled with PyTorch 2.x).
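For readers unfamiliar with the "enabled automatically" behaviour, here is a minimal sketch of how such a check could look. It is not the code from this PR: the names `triton_available`, `select_backend`, and `rms_norm_reference` are hypothetical, and the plain-Python RMSNorm only illustrates what a fallback path would compute (the standard RMSNorm formula, with an assumed `eps` default).

```python
import importlib.util
import math


def triton_available() -> bool:
    """Return True if the `triton` package can be found in this environment."""
    return importlib.util.find_spec("triton") is not None


def select_backend() -> str:
    """Pick the Triton path when Triton is installed, else a fallback (hypothetical names)."""
    return "triton" if triton_available() else "reference"


def rms_norm_reference(x, weight, eps=1e-6):
    """Plain-Python RMSNorm over one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    print("backend:", select_backend())
    print(rms_norm_reference([3.0, 4.0], [1.0, 1.0]))
```

Checking with `importlib.util.find_spec` avoids importing Triton just to test for its presence, so the check stays cheap on machines without a GPU.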
wangzihan99
changed pull request status to
open
wangzihan99
changed pull request status to
closed