governed-inference-meter

Energy-metered, governed inference receipts. A lightweight, dependency-light Python utility (and Hugging Face universal kernel) that wraps any inference call and emits a governed, energy-metered, tamper-evident receipt:

measures GPU energy via NVIDIA NVML (power/energy readback) integrated over wall-time → joules,
computes tokens-per-joule,
runs a pluggable, advisory policy gate (allow/deny; defaults to allow),
and emits a SHA-256 hash-chained JSON receipt so a sequence of calls is independently auditable.

It is the energy + governance counterpart to SZLHOLDINGS/szl-governed-norm — provenance at the inference boundary, in the spirit of the a11oy governed-AI platform: receipts, not capability claims.

Why this exists. Browse the Kernel Hub and you find performance kernels — attention, activations, GEMM, norms. There is no energy-metering + governance kernel. Teams running inference in sovereign, regulated, or cost/carbon-sensitive contexts measure tokens/joule and keep audit trails by hand. This utility does both in one wrapped call, and degrades honestly when no GPU energy readback is available.

Honest scope (read this first)

This project follows a strict honesty doctrine. Λ (the governance trust quantity) is Conjecture 1 — advisory, not a theorem. Trust is never 100%.

MEASURED only with NVML. Energy is real only when NVML is present and grants power/energy readback. Without it the receipt is labeled mode="unmeasured" and joules / tokens_per_joule are null. We never fabricate a joule figure.
Board-level power. NVML reports whole-board power (compute die + memory + losses). We report what the hardware reports and say so. No modeling, no scaling factors.
The policy gate is advisory and host-enforced. It records an allow/deny decision into the receipt. It does not, and cannot, enforce anything by itself — your host must actually skip a denied call. The bundled meter() wrapper does fail-safe (it will not execute a denied call), but downstream enforcement is still your responsibility.
The receipt digest is an integrity fingerprint, not a signature. It is a SHA-256 over the canonical record body and makes tampering evident. It does not prove authorship. Cryptographic signing (e.g. DSSE/Sigstore) is a separate, out-of-band concern, intentionally not done here.
This is a metering + receipt utility, not a safety guarantee.

Install / load

From the Hugging Face Hub (universal kernel — runs on CPU and CUDA):

from kernels import get_kernel
gim = get_kernel("SZLHOLDINGS/governed-inference-meter")

From PyPI-style source (zero hard dependencies; add pynvml for real energy):

pip install kernels            # to load via get_kernel
# real GPU energy measurement additionally needs NVML bindings:
pip install pynvml

Usage

from kernels import get_kernel
gim = get_kernel("SZLHOLDINGS/governed-inference-meter")

print(gim.__version__)
print(gim.capability_report())   # what energy measurement is possible here

# Wrap ANY inference callable. You tell the meter the token counts.
def run(prompt):
    # ... your real model.generate(...) call here ...
    return "the model's response text"

receipt, output = gim.meter(
    run, args=("hello",),
    model="my-llm-7b",
    tokens_in=2, tokens_out=7,
)

print(receipt["mode"])             # 'measured-energy' | 'measured-power-integral' | 'unmeasured'
print(receipt["joules"])           # float, or None when unmeasured
print(receipt["tokens_per_joule"]) # float, or None when unmeasured
print(receipt["policy_decision"])  # 'allow' | 'deny'
print(receipt["digest"])           # SHA-256 over the canonical record body
print(gim.receipt_verify())        # (ok, depth, first_break_seq) over the chain

A custom policy gate (advisory)

def my_gate(ctx):
    # ctx has model, tokens_in, tokens_out, args, kwargs, ts
    if ctx["tokens_in"] > 8192:
        return ("deny", "prompt exceeds governed token budget")
    return ("allow", "within budget")

receipt, output = gim.meter(run, args=("hi",), model="m",
                            tokens_in=2, tokens_out=7, policy=my_gate)

A gate may return a PolicyResult, a (decision, reason) tuple, a bool, or a string. It runs fail-closed: if your gate raises, the call is denied with the exception text as the reason — a buggy policy can never silently allow.

Per-request chain (no global-state contention)

chain = gim.ReceiptChain()
gim.meter(run, args=("a",), model="m", tokens_in=1, tokens_out=4, chain=chain)
gim.meter(run, args=("b",), model="m", tokens_in=1, tokens_out=6, chain=chain)
print(chain.verify())              # tamper-evident over YOUR chain only
print(chain.to_jsonl())            # export the chain for offline audit

MEASURED vs. `unmeasured` — what you get

Environment	`mode`	`joules`	`tokens_per_joule`
NVIDIA GPU with energy counter (`nvmlDeviceGetTotalEnergyConsumption`)	`measured-energy`	hardware accumulator delta	computed
NVIDIA GPU, power readback only (`nvmlDeviceGetPowerUsage`)	`measured-power-integral`	trapezoidal integral of power samples	computed
No GPU / no driver / no permission / no `pynvml`	`unmeasured`	`null`	`null`

Sample receipt — `unmeasured` (illustrative; this build env has no GPU)

SAMPLE / illustrative. Produced on a CPU-only box. Because NVML is unavailable, energy is honestly unmeasured and joules is null — exactly the honest-degrade behavior. No energy number is invented.

{
  "seq": 0,
  "model": "my-llm-7b",
  "tokens_in": 2,
  "tokens_out": 7,
  "mode": "unmeasured",
  "joules": null,
  "wall_seconds": 0.004182,
  "tokens_per_joule": null,
  "policy_decision": "allow",
  "policy_reason": "default allow_all gate (no policy configured)",
  "prev": "0000000000000000000000000000000000000000000000000000000000000000",
  "digest": "<sha256 of the canonical body>",
  "ts": 1750000000.0
}

On a real NVIDIA GPU the same call would carry e.g. "mode": "measured-energy", a positive "joules", and a computed "tokens_per_joule". We do not print example GPU numbers here because this build environment cannot measure them, and inventing them would violate the honesty doctrine. Run gim.selfcheck() on your own hardware to see your numbers.

Self-test

import governed_inference_meter as gim
print(gim.selfcheck())   # functional check (NOT a benchmark); no fabricated energy

selfcheck() runs a metered allow call, a denied call (verifying it does not execute), checks tokens/joule honesty, verifies the hash chain, and confirms that mutating a past record is detected. It requires no GPU.

What's in the repo

build.toml                                  # Kernel Hub universal-kernel manifest
build/torch-universal/governed_inference_meter/
  __init__.py     # meter() / metered() wrappers, selfcheck(), accessors
  _energy.py      # NVML energy + power-integral measurement, honest degrade
  _receipt.py     # SHA-256 hash-chained, tamper-evident receipts
  _policy.py      # advisory policy gate (allow_all default, fail-closed)
  metadata.json
pyproject.toml                              # also pip-installable from source
tests/test_meter.py                         # runs on CPU, no GPU needed
LICENSE                                     # Apache-2.0

Doctrine & honesty disclaimer

SZL Holdings · governed, energy-metered inference receipts · MEASURED only with NVML · the policy gate is advisory (host-enforced) · Λ = Conjecture 1 (advisory, not a theorem) · trust never 100% · honesty over checklist. This is a metering + receipt utility, not a safety guarantee. No fabricated benchmarks; energy is reported only when physically measured.

License: Apache-2.0 · Maintainer: Stephen Lutar stephenlutar2@gmail.com · Platform: a11oy.net — the governed-inference substrate for hard missions.

Downloads last month: -

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

SZLHOLDINGS
/

governed-inference-meter

governed-inference-meter

Honest scope (read this first)

Install / load

Usage

A custom policy gate (advisory)

Per-request chain (no global-state contention)

MEASURED vs. `unmeasured` — what you get

Sample receipt — `unmeasured` (illustrative; this build env has no GPU)

Self-test

What's in the repo

Doctrine & honesty disclaimer

Space using SZLHOLDINGS/governed-inference-meter 1

governed-inference-meter

Honest scope (read this first)

Install / load

Usage

A custom policy gate (advisory)

Per-request chain (no global-state contention)

MEASURED vs. unmeasured — what you get

Sample receipt — unmeasured (illustrative; this build env has no GPU)

Self-test

What's in the repo

Doctrine & honesty disclaimer

Space using SZLHOLDINGS/governed-inference-meter 1

MEASURED vs. `unmeasured` — what you get

Sample receipt — `unmeasured` (illustrative; this build env has no GPU)