
szl-governed-norm

The first governed kernel on the Hugging Face Kernel Hub. Correctness-verified RMSNorm & LayerNorm with optional governance receipts that make every call auditable at the kernel layer. (v0.2.0)

Most Kernel Hub kernels compete on raw speed. szl-governed-norm opens a different axis: verifiable provenance. Same clean get_kernel one-liner, plus a SHA3-256 hash-chained audit trail no other kernel ships.

A universal (pure-PyTorch) normalization kernel from SZL Holdings. It gives you a trustworthy reference implementation of RMSNorm and LayerNorm that runs on CPU and CUDA and plays nicely with torch.compile, plus an opt-in governed mode that emits content-addressed, SHA3-256 hash-chained receipts of each normalization call.


What it is

szl-governed-norm is a Kernel Hub kernel built for two things people actually need from a normalization layer:

  1. A correctness reference you can trust. RMSNorm and LayerNorm are implemented in pure PyTorch, computed in float32 for numerical stability and cast back to the input dtype (the standard Llama-style convention). They are verified against PyTorch's own references in the test suite.
  2. Provenance you can verify. Run any call with governed=True and the kernel records a small, deterministic receipt containing the input shape and dtype, eps, and a SHA3-256 digest of the (rounded) output, hash-chained to the previous receipt. The result is an independently re-walkable audit trail for a sequence of kernel calls.
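
The float32-compute, cast-back convention from item 1 can be sketched as below. This is a minimal illustration rather than the kernel's own source: `rms_norm_reference` is a hypothetical name, and applying the weight in float32 before the final cast is an assumption based on the description above.

```python
import torch

def rms_norm_reference(x, weight=None, eps=1e-6):
    """Hypothetical reference: RMSNorm over the last dim, computed in float32."""
    in_dtype = x.dtype
    xf = x.to(torch.float32)                          # upcast for numerical stability
    variance = xf.pow(2).mean(dim=-1, keepdim=True)   # mean of squares over the last dim
    yf = xf * torch.rsqrt(variance + eps)
    if weight is not None:
        yf = yf * weight.to(torch.float32)            # assumed: scale before casting back
    return yf.to(in_dtype)                            # cast back to the input dtype

x = torch.randn(4, 1024).to(torch.float16)
w = torch.ones(1024, dtype=torch.float16)
y = rms_norm_reference(x, weight=w)
```

With a unit weight, each output row should have a root mean square close to 1, which is a quick sanity check to run against any RMSNorm implementation.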

This is a universal kernel: it ships no hand-tuned CUDA/Triton binary. Its differentiator is verifiable governance, not raw FLOPs.


Quickstart

import torch
from kernels import get_kernel

# Current `kernels` (>=0.15) requires an explicit revision/version + trust flag for org kernels:
gn = get_kernel("SZLHOLDINGS/szl-governed-norm", revision="main", trust_remote_code=True)
# (once a tag is published you can pin it, e.g. revision="v0.2.0")

print(gn.__version__)        # "0.2.0"
print(gn.selfcheck())        # one-shot correctness + receipt verification

x = torch.randn(4, 1024, dtype=torch.float16, device="cuda")
w = torch.ones(1024, dtype=torch.float16, device="cuda")

# Plain path: drop-in normalization.
y = gn.rms_norm(x, weight=w, eps=1e-6)
z = gn.layer_norm(x, weight=w, eps=1e-5)

Governed mode + receipts

# Same math, plus an audit receipt.
y = gn.rms_norm(x, weight=w, eps=1e-6, governed=True)

print(gn.receipt_head())     # SHA3-256 head over all governed calls
print(gn.receipt_verify())   # {'ok': True, 'depth': 1, 'first_break_seq': -1, 'head': '...'}

# Per-call chain (no global state; ideal for concurrent threads/requests):
chain = gn.ReceiptChain()
y = gn.rms_norm(x, weight=w, eps=1e-6, chain=chain)
print(chain.verify())        # (ok, depth, first_break_seq)

Governance is strictly opt-in: with governed=False (the default) nothing is recorded, and the kernel never writes to disk or the network.


API reference

Functional API

| Function | Signature | Notes |
| --- | --- | --- |
| rms_norm | rms_norm(x, weight=None, eps=1e-6, governed=False, chain=None) | RMSNorm over the last dim. Emits a receipt when governed=True or a chain is passed. |
| layer_norm | layer_norm(x, weight=None, bias=None, eps=1e-5, governed=False, chain=None) | LayerNorm over the last dim. |
| fused_add_rms_norm | fused_add_rms_norm(x, residual, weight=None, eps=1e-6, governed=False, chain=None) | Residual-add + RMSNorm (pre-norm transformer block). Returns (y, new_residual). |
| selfcheck | selfcheck() | One-shot correctness + governance check; returns a JSON-able dict, never raises. |

All functions compute in float32 and cast back to the input dtype. rms_norm matches a Llama-style RMSNorm reference; layer_norm matches torch.nn.functional.layer_norm for the last-dim case (verified in tests/, 165 passing).
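
For reference, these are the standard definitions the functional API corresponds to, written over a last dimension of size $d$. The symbols ($x$, $w$, $b$, $r$, $\varepsilon$) are notation chosen here, and the fused form assumes the usual pre-norm convention in which the returned residual is the sum $x + r$; the source does not spell that detail out.

```latex
\begin{aligned}
\mathrm{RMSNorm}(x)_i &= \frac{x_i}{\sqrt{\tfrac{1}{d}\sum_{j=1}^{d} x_j^{2} + \varepsilon}}\; w_i \\[4pt]
\mathrm{LayerNorm}(x)_i &= \frac{x_i - \mu}{\sqrt{\sigma^{2} + \varepsilon}}\; w_i + b_i,
\qquad \mu = \tfrac{1}{d}\sum_{j=1}^{d} x_j,\quad
\sigma^{2} = \tfrac{1}{d}\sum_{j=1}^{d} (x_j - \mu)^{2} \\[4pt]
r' &= x + r, \qquad y = \mathrm{RMSNorm}(r')
\end{aligned}
```

When weight or bias is omitted, $w_i = 1$ and $b_i = 0$.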

Governance receipt API

| Function | Returns | Description |
| --- | --- | --- |
| receipt_head() | str | SHA3-256 head of the default receipt chain ("0"*64 if empty). |
| receipt_count() | int | Number of governed calls recorded on the default chain. |
| receipt_tail(n=10) | list[dict] | The last n receipts. |
| receipt_verify() | dict | Re-walks the chain; returns {ok, depth, first_break_seq, head}. |
| ReceiptChain | class | Construct your own isolated chain (emit, head, count, tail, verify). |

nn.Module layers (for the kernels layer-mapping mechanism)

These are pure torch.nn.Module subclasses (only forward, no custom __init__, no class variables), so they can be dropped in over an existing module:

| Layer | Reads from host module |
| --- | --- |
| RMSNorm | self.weight (optional), self.variance_epsilon or self.eps |
| LayerNorm | self.weight/self.bias (optional), self.eps |
| FusedAddRMSNorm | self.weight (optional), self.variance_epsilon or self.eps |
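
To illustrate why a forward-only layer can sit on top of an existing module, here is a minimal sketch in plain PyTorch. `HostRMSNorm` and `ForwardOnlyRMSNorm` are hypothetical classes, and binding the method with `types.MethodType` is a stand-in used for illustration; the actual mechanism used by the kernels library may differ.

```python
import types
import torch
from torch import nn

class HostRMSNorm(nn.Module):
    """Hypothetical model-side layer that owns the parameters."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.variance_epsilon = eps

    def forward(self, x):
        raise NotImplementedError("replaced by the forward-only layer below")

class ForwardOnlyRMSNorm(nn.Module):
    """Hypothetical kernel-side layer: forward only, all state read from the host."""
    def forward(self, x):
        # Read eps from whichever attribute the host module defines.
        eps = getattr(self, "variance_epsilon", None)
        if eps is None:
            eps = getattr(self, "eps", 1e-6)
        weight = getattr(self, "weight", None)        # weight is optional
        xf = x.to(torch.float32)
        yf = xf * torch.rsqrt(xf.pow(2).mean(dim=-1, keepdim=True) + eps)
        if weight is not None:
            yf = yf * weight.to(torch.float32)
        return yf.to(x.dtype)

host = HostRMSNorm(8)
# Bind the forward-only implementation onto the existing host instance.
host.forward = types.MethodType(ForwardOnlyRMSNorm.forward, host)
with torch.no_grad():
    y = host(torch.randn(2, 8))
```

Because the replacement defines no `__init__` and no state of its own, every value it needs has to already exist on the host, which is what the table above lists per layer.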

Governed mode: provenance at the kernel layer

When a call runs in governed mode, the kernel builds a receipt body, takes a SHA3-256 digest over its canonical JSON, and links each receipt to the previous one via a prev field, forming a classic hash chain:

{
  "seq": 0, "op": "rms_norm", "in_shape": [4, 1024], "in_dtype": "float16",
  "eps": 1e-06, "out_digest": "<sha3-256 of the rounded output>", "prev": "<prev digest or 64 zeros>"
}

receipt_verify() re-walks the chain and reports the first break, so tampering with any receipt invalidates everything downstream. This is the same provenance doctrine SZL Holdings applies across its a11oy governed-AI platform, applied here at the lowest layer of the stack, the kernel itself.
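
The chaining and re-walk described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the kernel's implementation: the canonical JSON form (sorted keys, compact separators), the `digest` storage field, and the helper names `emit` and `verify` are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # 64 zeros stand in for "no previous receipt"

def digest(body):
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    blob = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_256(blob).hexdigest()

def emit(chain, op, in_shape, in_dtype, eps, out_digest):
    """Append a receipt whose prev field is the digest of the previous receipt."""
    prev = chain[-1]["digest"] if chain else GENESIS
    body = {"seq": len(chain), "op": op, "in_shape": in_shape,
            "in_dtype": in_dtype, "eps": eps,
            "out_digest": out_digest, "prev": prev}
    chain.append({"body": body, "digest": digest(body)})

def verify(chain):
    """Re-walk the chain and report the first receipt that fails to link."""
    prev = GENESIS
    for i, rec in enumerate(chain):
        body = rec["body"]
        if body["seq"] != i or body["prev"] != prev or digest(body) != rec["digest"]:
            return {"ok": False, "depth": len(chain), "first_break_seq": i, "head": prev}
        prev = rec["digest"]
    return {"ok": True, "depth": len(chain), "first_break_seq": -1, "head": prev}

chain = []
for _ in range(3):
    emit(chain, "rms_norm", [4, 1024], "float16", 1e-6, "ab" * 32)  # placeholder out_digest
intact = verify(chain)

chain[1]["body"]["eps"] = 1e-5   # tamper with the middle receipt
tampered = verify(chain)
```

Here `intact` reports a clean chain of depth 3, while `tampered` reports the break at seq 1, since the stored digest no longer matches the edited body.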


Correctness & honesty

  • Universal, pure-Python kernel: a correctness reference, verified against PyTorch's own references (165 passing tests).
  • Runs on CPU and CUDA, torch.compile(fullgraph=True)-compatible. Under compile, governed numerics are unchanged but receipt emission (an eager byte-hashing side effect) is skipped, so govern at the eager audit boundary.
  • No fabricated benchmarks. This is not a hand-tuned CUDA/Triton binary; we make no speedup claims.
  • The receipt digest is an integrity fingerprint, NOT a cryptographic signature. It proves a receipt sequence is internally consistent and untampered, not who authored it. DSSE signing is a separate, out-of-band concern.
  • Governance is opt-in and side-effect-free by default.
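
The difference between an integrity fingerprint and a signature can be shown with a toy chain. The sketch below uses hypothetical helpers (`build`, `consistent`) and a reduced receipt body; it shows that anyone who rewrites a receipt and recomputes every later digest obtains a chain that is still internally consistent, only with a different head.

```python
import hashlib
import json

GENESIS = "0" * 64

def digest(body):
    # Assumed canonical form: sorted keys, compact separators, UTF-8 bytes.
    blob = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha3_256(blob).hexdigest()

def build(eps_values):
    """Build a toy chain of (body, digest) pairs; returns (receipts, head)."""
    receipts, prev = [], GENESIS
    for seq, eps in enumerate(eps_values):
        body = {"seq": seq, "op": "rms_norm", "eps": eps, "prev": prev}
        prev = digest(body)
        receipts.append((body, prev))
    return receipts, prev

def consistent(receipts):
    """True when every receipt links to its predecessor and matches its digest."""
    prev = GENESIS
    for body, d in receipts:
        if body["prev"] != prev or digest(body) != d:
            return False
        prev = d
    return True

original, original_head = build([1e-6, 1e-6, 1e-6])
forged, forged_head = build([1e-6, 1e-5, 1e-6])   # rewritten and fully rehashed

both_consistent = consistent(original) and consistent(forged)
heads_differ = original_head != forged_head
```

Both chains pass the consistency walk, so detecting a full rewrite depends on comparing the head against a copy recorded somewhere the rewriter cannot reach, and proving authorship needs a separate signing step.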

Compatibility

| Requirement | Version |
| --- | --- |
| Python | 3.9+ |
| PyTorch | torch>=2.5 |
| Dependencies | Python standard library + torch only |

See it live

This kernel has a 3D holographic showcase Space; the lattice is bound to the 165 passing tests, and the governance receipts are demonstrated interactively.

Λ governance is advisory (Conjecture 1, uniqueness OPEN), never "proven trust." Honest BLOCKED beats fake green.

About SZL Holdings

SZL Holdings, founded by Stephen Lutar, builds governed-AI infrastructure: provenance, observability, and security tooling for AI systems. Its work includes the a11oy governed-AI platform and killinchu, 36 public repositories and a large public dataset corpus on the SZL Holdings Hugging Face org, and research published on Zenodo. This kernel applies that same governance doctrine at the level of a single PyTorch operation.

License

Apache-2.0. Copyright 2026 SZL Holdings.


SZL Holdings · governed normalization · provenance at the kernel layer · a11oy.net · github.com/szl-holdings · huggingface.co/SZLHOLDINGS
