Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
#10 by Cheshire94 - opened
No description provided.
Cheshire94 changed pull request title from pr/9 to pr/10
This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with flash-attn, these kernels offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically if Triton is installed (it is usually bundled with PyTorch 2.x).
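For reviewers, here is a minimal pure-Python sketch of what the two operations compute and how an "enabled automatically if Triton is installed" switch could look. It is not the code from this PR: the names `HAS_TRITON`, `rms_norm_ref`, `apply_rope_ref`, and `select_backend` are hypothetical, the availability check via `importlib.util.find_spec` is an assumption about how detection might be done, and the RoPE layout shown is the "rotate-half" (non-interleaved) convention with base 10000, which may differ from what the kernels use.

```python
import importlib.util
import math

# Assumption: Triton availability is detected by checking whether the
# package can be imported; the PR's actual check is not shown in the thread.
HAS_TRITON = importlib.util.find_spec("triton") is not None


def rms_norm_ref(x, weight, eps=1e-6):
    """Reference RMSNorm for a single vector given as a list of floats."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Scale every element by 1/RMS, then by the learned per-channel weight.
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope_ref(x, position, base=10000.0):
    """Reference rotary embedding for one head vector (rotate-half layout)."""
    dim = len(x)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        # Channel pair (i, i + half) is rotated by angle position * base^(-2i/dim).
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


def select_backend():
    """Hypothetical dispatcher: prefer Triton when present, else fall back."""
    return "triton" if HAS_TRITON else "reference"
```

A reference like this is mainly useful as a numerical baseline: the output of a fused kernel for the same inputs can be compared against it within a floating-point tolerance.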
Cheshire94 changed pull request title from pr/10 to Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton
Cheshire94 changed pull request status to closed