Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton

#6
by wangzihan99 - opened
No description provided.
wangzihan99 changed pull request title from Add fused ApplyRoPE and RMSNorm kernels written in OpenAI Triton to Add ApplyRoPE and RMSNorm kernels written in OpenAI Triton

This PR adds ApplyRoPE and RMSNorm kernels written in OpenAI Triton. Compared with the flash-attn kernels, they offer better performance, support a wider range of GPU architectures (including V100 and T4), and require no pre-compilation. They are enabled automatically if Triton is installed (usually bundled with PyTorch 2.x).
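For reviewers, here is a minimal pure-Python sketch of what the two operations compute and of one way the "enabled automatically" check could look. It is not the code in this PR. The helper names (`triton_available`, `rms_norm_ref`, `apply_rope_ref`), the half-split ("rotate_half") pairing, the base of 10000, and the default `eps` are all assumptions made for illustration.

```python
import importlib.util
import math


def triton_available() -> bool:
    """Hypothetical helper: True if the `triton` package can be found."""
    return importlib.util.find_spec("triton") is not None


def rms_norm_ref(x, weight, eps=1e-6):
    """Reference RMSNorm for one vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope_ref(x, position, base=10000.0):
    """Reference RoPE for one head vector of even length.

    Assumes the half-split pairing: element i is rotated together with
    element i + len(x) // 2 by the angle position * base ** (-2 * i / len(x)).
    """
    dim = len(x)
    half = dim // 2
    out = [0.0] * dim
    for i in range(half):
        theta = position * base ** (-2.0 * i / dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


# A dispatcher could pick the fused path only when Triton is present;
# here the flag is merely computed, since the fused kernels live in the PR.
USE_TRITON_KERNELS = triton_available()
```

A reference like this is mainly useful as a correctness oracle: the output of a fused kernel can be compared against it on small inputs before benchmarking.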

wangzihan99 changed pull request status to open
wangzihan99 changed pull request status to closed