---
library_name: kernels
{% if license %}license: {{ license }}
{% endif %}---

TiledAttention is a scaled dot-product attention (SDPA) forward kernel for NVIDIA GPUs, implemented in cuTile Python (TileIR) and exposed for PyTorch-oriented workflows. The design follows FlashAttention-style online softmax with tiled (K,V) streaming, while emphasizing schedule-level modifiability (tile shapes, staging, shared-memory layout) for reproducible kernel research.
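The online-softmax idea mentioned above can be illustrated with a small pure-Python sketch for a single query row. This is not the TiledAttention kernel and makes no claim about its actual schedule; the function names `naive_attention` and `tiled_attention`, the `tile` parameter, and the toy data are all hypothetical and chosen only for illustration. It shows how (K,V) can be consumed one tile at a time while keeping a running maximum, a running normalizer, and a rescaled accumulator, so the full score row never needs to be held at once.

```python
import math


def naive_attention(q, k, v, scale):
    """Reference: softmax(scale * q.K^T) @ V for one query row, all keys at once."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, row)) for row in k]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(v[0])
    return [sum(w * row[d] for w, row in zip(weights, v)) / total
            for d in range(dim)]


def tiled_attention(q, k, v, scale, tile=2):
    """Same result, but K/V are streamed in tiles (online softmax)."""
    dim = len(v[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax normalizer relative to running_max
    acc = [0.0] * dim         # unnormalized output relative to running_max
    for start in range(0, len(k), tile):
        k_tile = k[start:start + tile]
        v_tile = v[start:start + tile]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, row))
                  for row in k_tile]
        new_max = max(running_max, max(scores))
        # Rescale the state accumulated so far to the new maximum.
        correction = math.exp(running_max - new_max)
        weights = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * correction + sum(weights)
        acc = [a * correction
               + sum(w * row[d] for w, row in zip(weights, v_tile))
               for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


# Toy data (hypothetical): one query, five keys/values, head dimension 2.
q = [1.0, 0.5]
k = [[0.1, 0.2], [0.9, -0.3], [2.0, 1.0], [-1.0, 0.4], [0.3, 0.3]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [-0.5, 0.25]]
scale = 1.0 / math.sqrt(len(q))

out_ref = naive_attention(q, k, v, scale)
out_tiled = tiled_attention(q, k, v, scale, tile=2)
```

The two results agree up to floating-point rounding for any tile size, which is the property that lets a kernel choose its tile shapes freely without changing the mathematical output.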

In the accompanying study, TiledAttention is evaluated against PyTorch SDPA auto-dispatch and explicit baselines across sequence length, head dimension, causal/non-causal masking, and FP16/BF16 precision.

This Hub kernel is packaged as a Python-only CUDA kernel. At runtime, the consuming environment must also provide the cupy-cuda13x and cuda-tile packages.
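Before loading the kernel, it can help to check that the runtime dependencies are importable. The sketch below uses only the standard library. The import names `cupy` and `cuda.tile` are assumptions about what the cupy-cuda13x and cuda-tile packages expose, and `missing_modules` is a hypothetical helper, not part of `kernels`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when the parent package of a dotted
            # name (such as "cuda" for "cuda.tile") is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


# Assumed import names for the cupy-cuda13x and cuda-tile packages.
REQUIRED_MODULES = ["cupy", "cuda.tile"]

if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing runtime dependencies:", ", ".join(absent))
    else:
        print("All assumed runtime dependencies were found.")
```

This only confirms that the modules can be located. It does not verify the CUDA driver, the GPU, or version compatibility.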

How to use

{% if functions %}

# make sure `kernels` is installed: `pip install -U kernels`
from kernels import get_kernel

kernel_module = get_kernel("{{ repo_id }}", version={{ version }})
{{ functions[0] }} = kernel_module.{{ functions[0] }}

{{ functions[0] }}(...)

{% else %}

Usage example not available. {% endif %}

Available functions

{% if functions %} {% for func in functions %}

  • {{ func }} {% endfor %} {% else %}

Function list not available. {% endif %} {% if layers %}

Available layers

{% for layer in layers %}

  • {{ layer }} {% endfor %} {% endif %}

Benchmarks

{% if has_benchmark %}

A benchmarking script is available for this kernel. Run `kernels benchmark {{ repo_id }} --version {{ version }}`. {% else %}

No benchmark available yet. {% endif %} {% if upstream %}

Source code

The source code of this kernel originally comes from {{ upstream }} and was adapted for compatibility with `kernels`. {% endif %}
