Layers API Reference

Making layers kernel-aware

use_kernel_forward_from_hub

kernels.use_kernel_forward_from_hub

( layer_name: str ) → Callable

Parameters

layer_name (str) — The name of the layer to use for kernel lookup in registered mappings.

Returns

Callable

A decorator function that can be applied to layer classes.

Decorator factory that makes a layer extensible under the specified layer name.

The returned decorator prepares a layer class to use kernels from the Hugging Face Hub.

Example:

import torch
import torch.nn as nn

from kernels import use_kernel_forward_from_hub
from kernels import Mode, kernelize

@use_kernel_forward_from_hub("MyCustomLayer")
class MyCustomLayer(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor):
        # original implementation
        return x

model = MyCustomLayer(768)

# The layer can now be kernelized:
# model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE, device="cuda")
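
The decorator-factory pattern described above can be illustrated without the library. The sketch below is a stand-in, not the kernels implementation: tag_layer and the layer_name_tag attribute are hypothetical names chosen for illustration. It shows only the two-step shape: calling the factory with a name returns a decorator, and that decorator returns the same class with the name attached for later lookup.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def tag_layer(layer_name: str) -> Callable[[Type[T]], Type[T]]:
    """Hypothetical stand-in for a decorator factory keyed by a layer name."""

    def decorator(cls: Type[T]) -> Type[T]:
        # Record the lookup name on the class (hypothetical attribute name).
        # The class is returned as-is, so its original forward keeps working.
        setattr(cls, "layer_name_tag", layer_name)
        return cls

    return decorator


@tag_layer("MyCustomLayer")
class MyCustomLayer:
    def forward(self, x):
        # original implementation
        return x


layer = MyCustomLayer()
```

Because the class is only annotated, an undecorated and a decorated layer behave identically until something later consults the recorded name.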

replace_kernel_forward_from_hub

kernels.replace_kernel_forward_from_hub

( cls layer_name: str )

Function that prepares a layer class to use kernels from the Hugging Face Hub.

It is recommended to use the use_kernel_forward_from_hub() decorator instead. This function should only be used as a last resort to extend third-party layers. It is inherently fragile, since the member variables and forward signature of such a layer can change.

Example:

from kernels import replace_kernel_forward_from_hub
import torch.nn as nn

replace_kernel_forward_from_hub(nn.LayerNorm, "LayerNorm")

Registering kernel mappings

use_kernel_mapping

kernels.use_kernel_mapping

( mapping: Dict[str, Dict[Union[Device, str], Union[LayerRepositoryProtocol, Dict[Mode, LayerRepositoryProtocol]]]] inherit_mapping: bool = True )

Parameters

mapping (Dict[str, Dict[Union[Device, str], Union[LayerRepositoryProtocol, Dict[Mode, LayerRepositoryProtocol]]]]) — The kernel mapping to apply. Maps layer names to device-specific kernel configurations.
inherit_mapping (bool, optional, defaults to True) — When True, the current mapping will be extended by mapping inside the context. When False, only mapping is used inside the context.

Context manager that sets a kernel mapping for the duration of the context.

This function allows temporary kernel mappings to be applied within a specific context, enabling different kernel configurations for different parts of your code.

Example:

import torch
import torch.nn as nn
from torch.nn import functional as F

from kernels import use_kernel_forward_from_hub
from kernels import use_kernel_mapping, LayerRepository, Device
from kernels import Mode, kernelize

# Define a mapping
mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        )
    }
}

@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return F.silu(x[..., :d]) * x[..., d:]

model = SiluAndMul()

# Use the mapping for the duration of the context.
with use_kernel_mapping(mapping):
    # kernelize uses the temporary mapping
    model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE, device="cuda")

# Outside the context, original mappings are restored
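
The effect of inherit_mapping and the restoration on exit can be modelled with plain dictionaries. This is a minimal sketch of the semantics described above, not the library's implementation: scoped_mapping, _ACTIVE, and the repository id "example/layer-norm" are hypothetical, and repositories are reduced to strings.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Dict, Iterator

Mapping = Dict[str, Dict[str, str]]  # layer name -> device type -> repo id

# Stand-in for the currently active mapping (never mutated in place).
_ACTIVE: ContextVar[Mapping] = ContextVar("_ACTIVE", default={})


@contextmanager
def scoped_mapping(mapping: Mapping, inherit_mapping: bool = True) -> Iterator[None]:
    """Hypothetical analogue of a context manager that sets a temporary mapping."""
    if inherit_mapping:
        # Start from a copy of the current mapping and extend it.
        merged = {name: dict(devices) for name, devices in _ACTIVE.get().items()}
        for name, devices in mapping.items():
            merged.setdefault(name, {}).update(devices)
    else:
        # Only the given mapping is visible inside the context.
        merged = {name: dict(devices) for name, devices in mapping.items()}
    token = _ACTIVE.set(merged)
    try:
        yield
    finally:
        # Restore whatever was active before entering the context.
        _ACTIVE.reset(token)


_ACTIVE.set({"LayerNorm": {"cuda": "example/layer-norm"}})
temporary = {"SiluAndMul": {"cuda": "kernels-community/activation"}}

with scoped_mapping(temporary):
    inherited = sorted(_ACTIVE.get())
with scoped_mapping(temporary, inherit_mapping=False):
    isolated = sorted(_ACTIVE.get())
restored = sorted(_ACTIVE.get())
```

In this model, inherited contains both layer names, isolated contains only "SiluAndMul", and restored is back to the mapping that was active before either context was entered.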

register_kernel_mapping

kernels.register_kernel_mapping

( mapping: Dict[str, Dict[Union[Device, str], Union[LayerRepositoryProtocol, Dict[Mode, LayerRepositoryProtocol]]]] inherit_mapping: bool = True )

Parameters

mapping (Dict[str, Dict[Union[Device, str], Union[LayerRepositoryProtocol, Dict[Mode, LayerRepositoryProtocol]]]]) — The kernel mapping to register globally. Maps layer names to device-specific kernels. The mapping can specify different kernels for different modes (training, inference, etc.).
inherit_mapping (bool, optional, defaults to True) — When True, the current mapping will be extended by mapping. When False, the existing mappings are erased before adding mapping.

This function allows you to register a mapping between a layer name and the corresponding kernel(s) to use, depending on the device and mode. This should be used in conjunction with kernelize().

Example:

from kernels import LayerRepository, register_kernel_mapping, Mode

# Simple mapping for a single kernel per device
kernel_layer_mapping = {
    "LlamaRMSNorm": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="RmsNorm",
            revision="layers",
        ),
    },
}
register_kernel_mapping(kernel_layer_mapping)

# Advanced mapping with mode-specific kernels
advanced_mapping = {
    "MultiHeadAttention": {
        "cuda": {
            Mode.TRAINING: LayerRepository(
                repo_id="username/training-kernels",
                layer_name="TrainingAttention"
            ),
            Mode.INFERENCE: LayerRepository(
                repo_id="username/inference-kernels",
                layer_name="FastAttention"
            ),
        }
    }
}
register_kernel_mapping(advanced_mapping)

Kernelizing a model

kernelize

kernels.kernelize

( model: 'nn.Module' mode: Mode device: Optional[Union[str, 'torch.device']] = None use_fallback: bool = True ) → nn.Module

Parameters

model (nn.Module) — The PyTorch model to kernelize.
mode (Mode) — The mode that the kernel is going to be used in. For example, Mode.TRAINING | Mode.TORCH_COMPILE kernelizes the model for training with torch.compile.
device (Union[str, torch.device], optional) — The device type to load kernels for. Supported device types are: “cuda”, “mps”, “npu”, “rocm”, “xpu”. The device type will be inferred from the model parameters when not provided.
use_fallback (bool, optional, defaults to True) — Whether to use the original forward method of modules when no compatible kernel could be found. If set to False, an exception will be raised in such cases.

Returns

nn.Module

The kernelized model with optimized kernel implementations.

Replace layer forward methods with optimized kernel implementations.

This function iterates over all modules in the model and replaces the forward method of extensible layers for which kernels are registered using register_kernel_mapping() or use_kernel_mapping().

Example:

import torch
import torch.nn as nn
from torch.nn import functional as F

from kernels import kernelize, Mode, register_kernel_mapping, LayerRepository
from kernels import use_kernel_forward_from_hub

@use_kernel_forward_from_hub("SiluAndMul")
class SiluAndMul(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        return F.silu(x[..., :d]) * x[..., d:]

mapping = {
    "SiluAndMul": {
        "cuda": LayerRepository(
            repo_id="kernels-community/activation",
            layer_name="SiluAndMul",
        )
    }
}
register_kernel_mapping(mapping)

# Create and kernelize a model
model = nn.Sequential(
    nn.Linear(1024, 2048, device="cuda"),
    SiluAndMul(),
)

# Kernelize for training with torch.compile
kernelized_model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)
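
The traversal and the use_fallback behaviour described above can be sketched with a toy module tree. Everything here is a stand-in under stated assumptions: Module imitates only the part of nn.Module needed for the walk, and layer_name_tag, REGISTRY, fast_double, and swap_forwards are hypothetical names, not part of kernels. The library's own lookup also takes device and mode into account, which this sketch omits.

```python
from types import MethodType
from typing import Callable, Dict, Iterator, List


class Module:
    """Minimal stand-in for nn.Module: a node with children and a forward method."""

    def __init__(self, *children: "Module") -> None:
        self.children: List["Module"] = list(children)

    def modules(self) -> Iterator["Module"]:
        # Yield this module first, then every descendant, depth first.
        yield self
        for child in self.children:
            yield from child.modules()

    def forward(self, x: float) -> float:
        # Container behaviour: feed x through the children in order.
        for child in self.children:
            x = child.forward(x)
        return x


class Double(Module):
    layer_name_tag = "Double"  # hypothetical marker for an extensible layer

    def forward(self, x: float) -> float:
        return x + x


class Negate(Module):
    layer_name_tag = "Negate"  # extensible, but nothing is registered for it

    def forward(self, x: float) -> float:
        return -x


def fast_double(self: Module, x: float) -> float:
    """Stand-in for an optimized kernel; same result as Double.forward."""
    return 2 * x


# Hypothetical registry: layer name -> replacement forward function.
REGISTRY: Dict[str, Callable[[Module, float], float]] = {"Double": fast_double}


def swap_forwards(model: Module, registry: dict, use_fallback: bool = True) -> Module:
    for module in model.modules():
        name = getattr(type(module), "layer_name_tag", None)
        if name is None:
            continue  # not an extensible layer, leave it alone
        replacement = registry.get(name)
        if replacement is not None:
            # Bind the replacement to this instance only.
            module.forward = MethodType(replacement, module)
        elif not use_fallback:
            raise ValueError(f"No kernel registered for layer {name!r}")
        # Otherwise the original forward stays in place.
    return model


model = swap_forwards(Module(Double(), Negate()), REGISTRY)
result = model.forward(3)
```

With use_fallback left at True, Negate silently keeps its original forward. Passing use_fallback=False to swap_forwards for the same model raises instead, which is useful when a missing kernel should be treated as a configuration error.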

Classes

Device

class kernels.Device

( type: str properties: Optional[CUDAProperties] = None )

Parameters

type (str) — The device type (e.g., “cuda”, “mps”, “npu”, “rocm”, “xpu”).
properties (CUDAProperties, optional) — Device-specific properties. Currently only CUDAProperties is supported for CUDA devices.

Represents a compute device with optional properties.

This class encapsulates device information including device type and optional device-specific properties like CUDA capabilities.

Example:

from kernels import Device, CUDAProperties

# Basic CUDA device
cuda_device = Device(type="cuda")

# CUDA device with specific capability requirements
cuda_device_with_props = Device(
    type="cuda",
    properties=CUDAProperties(min_capability=75, max_capability=90)
)

# MPS device for Apple Silicon
mps_device = Device(type="mps")

# XPU device (e.g., Intel(R) Data Center GPU Max 1550)
xpu_device = Device(type="xpu")

# NPU device (Huawei Ascend)
npu_device = Device(type="npu")
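
The capability bounds in the example above look like compute capabilities written as a single integer. Assuming that encoding is major * 10 + minor (so 75 stands for compute capability 7.5) and that both bounds are inclusive, a range check can be sketched as follows. encode_capability and capability_in_range are hypothetical helpers for illustration, not part of kernels.

```python
from typing import Tuple


def encode_capability(major: int, minor: int) -> int:
    # Assumed encoding: compute capability 7.5 -> 75, 9.0 -> 90.
    return major * 10 + minor


def capability_in_range(
    capability: Tuple[int, int], min_capability: int, max_capability: int
) -> bool:
    # Assumption: both bounds are inclusive.
    return min_capability <= encode_capability(*capability) <= max_capability


# (major, minor) pairs for a few compute capabilities.
candidates = {"6.1": (6, 1), "7.5": (7, 5), "9.0": (9, 0), "10.0": (10, 0)}
matches = {
    name: capability_in_range(cap, min_capability=75, max_capability=90)
    for name, cap in candidates.items()
}
```

With PyTorch installed, torch.cuda.get_device_capability() returns a (major, minor) tuple that could be passed as the capability argument.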

create_repo

( )

Create an appropriate repository set for this device type.

Mode

class kernels.Mode

( value names = None module = None qualname = None type = None start = 1 )

Parameters

INFERENCE — The kernel is used for inference.
TRAINING — The kernel is used for training.
TORCH_COMPILE — The kernel is used with torch.compile.
FALLBACK — In a kernel mapping, this kernel is used when no other mode matches.

Kernelize mode

The Mode flag is used by kernelize() to select kernels for the given mode. Mappings can be registered for specific modes.

Note: Different modes can be combined. For instance, INFERENCE | TORCH_COMPILE should be used for layers that are used for inference with torch.compile.
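
Combining modes and falling back can be illustrated with Python's standard enum.Flag. The Mode class below is a stand-in that only mirrors the member names listed above, and select_repo is a hypothetical, deliberately simplified lookup (exact match, then FALLBACK). The library's own resolution may consider additional compatible modes before falling back. The repository id "username/generic-kernels" is made up for the example.

```python
from enum import Flag, auto
from typing import Dict, Optional


class Mode(Flag):
    """Stand-in flag enum with the member names documented above."""

    INFERENCE = auto()
    TRAINING = auto()
    TORCH_COMPILE = auto()
    FALLBACK = auto()


def select_repo(repos: Dict[Mode, str], mode: Mode) -> Optional[str]:
    # Simplified lookup: prefer an exact mode match, otherwise use FALLBACK.
    if mode in repos:
        return repos[mode]
    return repos.get(Mode.FALLBACK)


# Flags combine with |, and membership can be tested with `in`.
compiled_inference = Mode.INFERENCE | Mode.TORCH_COMPILE
uses_compile = Mode.TORCH_COMPILE in compiled_inference
uses_training = Mode.TRAINING in compiled_inference

repos = {
    Mode.INFERENCE: "username/inference-kernels",
    Mode.FALLBACK: "username/generic-kernels",  # hypothetical repository id
}
exact = select_repo(repos, Mode.INFERENCE)
fallback = select_repo(repos, Mode.TRAINING)
missing = select_repo({Mode.INFERENCE: "username/inference-kernels"}, Mode.TRAINING)
```

A combined value such as compiled_inference is itself a valid dictionary key, which is how a mapping can register a kernel for one specific combination of modes.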

LayerRepository

class kernels.LayerRepository

( repo_id: str layer_name: str revision: Optional[str] = None version: Optional[str] = None )

Parameters

repo_id (str) — The Hub repository containing the layer.
layer_name (str) — The name of the layer within the kernel repository.
revision (str, optional, defaults to "main") — The specific revision (branch, tag, or commit) to download. Cannot be used together with version.
version (str, optional) — The kernel version to download. This can be a Python version specifier, such as ">=1.0.0,<2.0.0". Cannot be used together with revision.

Repository and name of a layer for kernel mapping.

Example:

from kernels import LayerRepository

# Reference a layer at the default revision
layer_repo = LayerRepository(
    repo_id="kernels-community/activation",
    layer_name="SiluAndMul",
)

# Reference a layer by version constraint
layer_repo_versioned = LayerRepository(
    repo_id="kernels-community/activation",
    layer_name="SiluAndMul",
    version=">=0.0.3,<0.1"
)

LocalLayerRepository

class kernels.LocalLayerRepository

( repo_path: Path package_name: str layer_name: str )

Parameters

repo_path (Path) — The local repository containing the layer.
package_name (str) — Package name of the kernel.
layer_name (str) — The name of the layer within the kernel repository.

Repository from a local directory for kernel mapping.

Example:

from pathlib import Path

from kernels import LocalLayerRepository

# Reference a layer in a local kernel repository
layer_repo = LocalLayerRepository(
    repo_path=Path("/home/daniel/kernels/activation"),
    package_name="activation",
    layer_name="SiluAndMul",
)

LockedLayerRepository

class kernels.LockedLayerRepository

( repo_id: str lockfile: Optional[Path] = None layer_name: str )

Repository and name of a layer.

In contrast to LayerRepository, this class uses repositories that are locked inside a project.
