conv2d-neuron-kernels

A NKI (Neuron Kernel Interface) conv2d kernel for AWS Trainium and Inferentia, packaged for use with the HuggingFace kernels library and the KernelConfig API.

It replaces torch.nn.Conv2d with an implicit-GEMM NKI implementation that runs on the NeuronCore Tensor Engine.
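To illustrate what "implicit GEMM" means here: the convolution is treated as a matrix multiply of shape [Ho*Wo, Cin*R*S] x [Cin*R*S, Cout], but the left-hand (im2col) matrix is never materialized; each of its elements is computed from the input on the fly. The following is a minimal pure-Python sketch of that idea for a single image. It is not the NKI kernel's code, and the function name and nested-list layout are assumptions made for illustration.

```python
def conv2d_implicit_gemm(x, w, stride=(1, 1), padding=(0, 0)):
    """Reference conv2d via implicit GEMM (illustrative only).

    x: nested lists shaped [Cin][H][W]
    w: nested lists shaped [Cout][Cin][R][S]
    Returns nested lists shaped [Cout][Ho][Wo].
    """
    cin, h, wd = len(x), len(x[0]), len(x[0][0])
    cout, r, s = len(w), len(w[0][0]), len(w[0][0][0])
    (sh, sw), (ph, pw) = stride, padding
    ho = (h + 2 * ph - r) // sh + 1
    wo = (wd + 2 * pw - s) // sw + 1
    k_dim = cin * r * s  # GEMM reduction dimension

    def a(m, k):
        # Element (m, k) of the virtual im2col matrix, computed on demand.
        oy, ox = divmod(m, wo)          # output pixel for GEMM row m
        c, rem = divmod(k, r * s)       # input channel for GEMM column k
        ky, kx = divmod(rem, s)         # filter tap for GEMM column k
        iy, ix = oy * sh + ky - ph, ox * sw + kx - pw
        if 0 <= iy < h and 0 <= ix < wd:
            return x[c][iy][ix]
        return 0.0                      # zero padding outside the input

    out = [[[0.0] * wo for _ in range(ho)] for _ in range(cout)]
    for n in range(cout):
        # Flatten filter n into a length-K vector in (c, ky, kx) order.
        b = [w[n][c][ky][kx]
             for c in range(cin) for ky in range(r) for kx in range(s)]
        for m in range(ho * wo):
            oy, ox = divmod(m, wo)
            out[n][oy][ox] = sum(a(m, k) * b[k] for k in range(k_dim))
    return out
```

On hardware the inner sum would map onto tiled matrix multiplies rather than a Python loop; the sketch only shows the index arithmetic that stands in for the im2col buffer.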

Build variant

build/torch-neuron/ — pure-Python NKI kernel (compiled by neuronx-cc at load time). Requires the Neuron SDK (nki) to be installed in the runtime.

Capabilities

Arbitrary stride (sH, sW)
Symmetric / asymmetric padding (pH, pW)
Non-square kernels (R x S), 1x1, 3x3, 5x5, ...
Optional bias
bf16 and fp32

Constraints: stride >= 1, dilation = 1, groups = 1, and a padded input plane of Hp*Wp <= 32767 elements (the kernel processes the plane as a single tile). Correctness is validated against torch.nn.functional.conv2d (cosine similarity = 1.0; fp32 maximum absolute error ~1e-5).
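The constraints above can be checked before swapping a layer. The helper below is a hypothetical sketch, not part of this repo; it assumes symmetric padding, so that Hp = H + 2*pH and Wp = W + 2*pW.

```python
MAX_PADDED_PLANE = 32767  # single-tile limit quoted in the constraints above


def unsupported_reasons(h, w, stride=(1, 1), padding=(0, 0),
                        dilation=(1, 1), groups=1):
    """Return a list of reasons a Conv2d config falls outside the constraints.

    Hypothetical helper. Assumes symmetric padding, so the padded plane
    is (h + 2*pH) x (w + 2*pW). An empty list means the config fits.
    """
    reasons = []
    if min(stride) < 1:
        reasons.append(f"stride {stride} must be >= 1")
    if tuple(dilation) != (1, 1):
        reasons.append(f"dilation {dilation} must be 1")
    if groups != 1:
        reasons.append(f"groups {groups} must be 1")
    hp, wp = h + 2 * padding[0], w + 2 * padding[1]
    if hp * wp > MAX_PADDED_PLANE:
        reasons.append(
            f"padded plane {hp}x{wp} = {hp * wp} exceeds {MAX_PADDED_PLANE}"
        )
    return reasons


print(unsupported_reasons(128, 128, padding=(1, 1)))  # 130*130 = 16900, fits
print(unsupported_reasons(224, 224, padding=(1, 1)))  # 226*226 = 51076, too large
```

Under this assumption a 128x128 input with padding 1 fits, while a 224x224 input with padding 1 exceeds the single-tile limit.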

Usage

from transformers import AutoModelForCausalLM, KernelConfig  # or any model with nn.Conv2d

kernel_config = KernelConfig({"Conv2d": "<owner>/conv2d-neuron-kernels:NeuronConv2d"})
model = AutoModelForCausalLM.from_pretrained(
    "<model-id>",
    use_kernels=True,
    kernel_config=kernel_config,
)

Conv2d (the key) is the original module class name that gets replaced. NeuronConv2d (the value) is the KernelName; the repo also provides the companion NeuronConv2dLayout that holds parameters and declares the [Cout,Cin,R,S] -> [Cin,R,S,Cout] weight relayout via conversion_mapping.
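The [Cout,Cin,R,S] -> [Cin,R,S,Cout] relayout is a plain axis permutation. The sketch below shows it on nested lists so that it runs without any dependencies; the function name is hypothetical and this is not the repo's conversion_mapping code. On a PyTorch tensor the equivalent permutation would be weight.permute(1, 2, 3, 0).contiguous().

```python
def relayout_weight(w):
    """Permute a conv weight from [Cout][Cin][R][S] to [Cin][R][S][Cout].

    Illustrative only: operates on nested lists, not tensors.
    """
    cout, cin = len(w), len(w[0])
    r, s = len(w[0][0]), len(w[0][0][0])
    return [
        [
            [[w[o][c][ky][kx] for o in range(cout)] for kx in range(s)]
            for ky in range(r)
        ]
        for c in range(cin)
    ]


# Tiny example: Cout=2, Cin=1, R=1, S=2
w = [[[[10, 11]]], [[[20, 21]]]]
print(relayout_weight(w))
```

After the permutation, Cout is the innermost axis, so the first three axes (Cin, R, S) can be flattened into the reduction dimension of a [Cin*R*S, Cout] matrix.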
