
# Adding Benchmarks v0

Clean product one-liner: benchmark features belong in the shipped product when they directly tune or prove the control loop.

Layman version: if a benchmark changes how the harness steers, it is on product, not just lab work.

## Is this on product?

Yes, when the benchmark does one of these:

| Benchmark feature | On product? | Why |
| --- | --- | --- |
| lane-selection check | yes | it tunes runtime control |
| memory / freshness check | yes | it tunes graph-backed memory |
| tensor posture check | yes | it tunes the control language |
| receipt coverage check | yes | it governs trust |
| random research probe with no routing consequence | no | keep it in research |
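The decision rule in the table above can be sketched as a small predicate. This is an illustration only: the category names and the `is_on_product` helper are hypothetical, not an existing API in this repo.

```python
# Hypothetical encoding of the on-product table. The category
# names are illustrative labels, not identifiers from this repo.
ON_PRODUCT_CATEGORIES = {
    "lane_selection",      # tunes runtime control
    "memory_freshness",    # tunes graph-backed memory
    "tensor_posture",      # tunes the control language
    "receipt_coverage",    # governs trust
}

def is_on_product(category: str) -> bool:
    """A benchmark is on product when it steers the control loop."""
    return category in ON_PRODUCT_CATEGORIES
```

Anything outside the set — e.g. a research probe with no routing consequence — stays in research.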

## How to add a benchmark feature

1. Add a focused runner under `benchmarks/`.
2. Write outputs under `runs/benchmark/<name>-<timestamp>/`.
3. Emit:
   - `summary.json`
   - `report.md`
4. Add the benchmark to the README if it informs product behavior.
5. If it changes steering, connect its result back into:
   - `policy/`
   - `benchmarks/control_scorecard_v0.*`
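Steps 2 and 3 can be sketched as a minimal runner skeleton. This is a hedged sketch, not code from this repo: the `run_benchmark` name, the metrics dict, and the timestamp format are all assumptions; only the directory layout and the two output files come from the steps above.

```python
import json
import time
from pathlib import Path

def run_benchmark(name: str, metrics: dict, root: str = "runs/benchmark") -> Path:
    """Hypothetical runner skeleton: write the minimal output contract
    (summary.json + report.md) under runs/benchmark/<name>-<timestamp>/.

    The timestamp format is an assumption chosen to sort lexicographically.
    """
    out = Path(root) / f"{name}-{time.strftime('%Y%m%dT%H%M%S')}"
    out.mkdir(parents=True, exist_ok=True)
    # summary.json: machine-readable metric surface
    (out / "summary.json").write_text(json.dumps(metrics, indent=2))
    # report.md: human-readable benchmark brief
    lines = [f"# {name} benchmark\n"] + [f"- {k}: {v}" for k, v in metrics.items()]
    (out / "report.md").write_text("\n".join(lines) + "\n")
    return out
```

A focused runner would compute its own `metrics` and call this once per run.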

## Minimal output contract

| File | Purpose |
| --- | --- |
| `summary.json` | machine-readable metric surface |
| `report.md` | human-readable benchmark brief |
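A simple check for the contract above might look like the following. This is a sketch under stated assumptions: `check_output_contract` is a hypothetical helper, and "valid" here means only that `summary.json` parses as JSON and `report.md` is non-empty, which is the weakest reading of the contract.

```python
import json
from pathlib import Path

REQUIRED_FILES = {"summary.json", "report.md"}

def check_output_contract(run_dir: str) -> bool:
    """Return True when a run directory satisfies the minimal contract:
    a parseable summary.json plus a non-empty report.md."""
    d = Path(run_dir)
    if not d.is_dir() or not REQUIRED_FILES <= {p.name for p in d.iterdir()}:
        return False
    try:
        json.loads((d / "summary.json").read_text())
    except json.JSONDecodeError:
        return False
    return bool((d / "report.md").read_text().strip())
```

Such a check could run in CI to keep every runner honest about its outputs.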

## Current shipped benchmark

Use:

```shell
./bin/bvtctl benchmark
```

Why: this proves the standalone repo can benchmark its own graph-first and execution lanes through the local CLI.