Native CUDA transformer helper kernels from FlashRT for fused activation, layout, RoPE, argmax/spec-accept, and router top-k work.
See README.md for the public function list.