rob-rbyte-v2
Residue router for the SAIR Modular Arithmetic Challenge. Entry class
model.ResidueRouterV1, output base 256. Covers tiers 1-3.
Routing is by the size of p. Operands are reduced mod p inside
predict_digits (the two-argument normalization both reference models use:
a with p, then b with p, never all three).
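The routing and normalization described above can be sketched in a few lines. This is an illustration only: the function names `route` and `normalize` and the returned labels are hypothetical and are not taken from model.py; only the thresholds (251 and 65536) and the two-argument reduction come from this card.

```python
def route(p: int) -> str:
    """Pick a tier group from the size of the modulus p (labels are illustrative)."""
    if p <= 251:
        return "tier1-2"      # residue specialist
    if p < 65536:
        return "tier3"        # composed step nets
    return "unsupported"      # tiers 4-10: the card says [0] is returned


def normalize(a: int, b: int, p: int) -> tuple[int, int]:
    """Two-argument normalization: reduce a with p, then b with p."""
    return a % p, b % p


if __name__ == "__main__":
    for p in (7, 251, 257, 65521, 65537):
        print(p, route(p))
    print(normalize(300, 5, 7))
```

Keeping the reduction separate from the routing makes it clear that both specialists only ever see residues already in [0, p).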
Tiers 1-2 (p <= 251): the v1 residue specialist. Each operand residue is embedded through a shared per-(prime, residue) table; the two vectors are added (a discrete-log inductive bias: logs add under multiplication); a residual MLP trunk transforms the sum; logits score against a per-(prime, class) output table masked to the p classes of the current prime. The answer is one base-256 digit. ~2.9M parameters.
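The discrete-log bias mentioned above rests on a standard fact: for a prime p with generator g, every nonzero residue is a power of g, so multiplying residues corresponds to adding exponents mod p - 1. The sketch below demonstrates that fact with an explicit lookup table. It is not the model's code; the specialist learns embeddings rather than using a hand-built log table, and all names here are hypothetical.

```python
def primitive_root(p: int) -> int:
    """Smallest generator of the multiplicative group mod an odd prime p (brute force)."""
    for g in range(2, p):
        seen = set()
        x = 1
        for _ in range(p - 1):
            x = x * g % p
            seen.add(x)
        if len(seen) == p - 1:
            return g
    raise ValueError("no generator found; expected an odd prime p")


def log_table(p: int) -> tuple[int, dict[int, int]]:
    """Return (g, table) with table[g**k % p] == k for k in [0, p - 1)."""
    g = primitive_root(p)
    table = {}
    x = 1
    for k in range(p - 1):
        table[x] = k
        x = x * g % p
    return g, table


def mul_via_logs(a: int, b: int, p: int) -> int:
    """Compute a*b mod p by adding discrete logs; zero is handled separately."""
    a %= p
    b %= p
    if a == 0 or b == 0:
        return 0  # zero has no discrete log
    g, log = log_table(p)
    return pow(g, (log[a] + log[b]) % (p - 1), p)


if __name__ == "__main__":
    p = 13
    ok = all(mul_via_logs(a, b, p) == a * b % p for a in range(p) for b in range(p))
    print("log-addition matches a*b mod p for p=13:", ok)
```

If each learned embedding encodes something like the log of its residue, summing the two embeddings is the natural operation, which is the motivation the card gives for adding the vectors before the MLP trunk.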
Tier 3 (251 < p < 65536): two trained shared local-rule step nets composed through fixed wiring. After reduction the operands x, y are 16-bit residues. A MULTIPLY step learns the shared carry rule over the carry-save column sums and, composed closed-loop through a fixed parity readout, emits the exact 32-bit product t = x*y. A REDUCTION step learns the shared per-nibble borrow/compare rule and, composed through fixed restoring-division wiring, emits r = t mod p in [0, p). The answer r is emitted as base-256 digits MSB-first (two digits cover a 16-bit residue). Both step nets are plain GELU MLPs, width 96, depth 3, 20k parameters each (40k total).
Tiers 4-10 (p >= 65536): outside the trained regime; returns [0].
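To make the tier-3 wiring concrete, here is a minimal sketch in which the two learned step nets are replaced by exact integer rules. Everything in it is an assumption for illustration: the function names, the (column sum, carry-in) encoding of the carry rule, the (nibble, nibble, borrow-in) encoding of the borrow rule, the five-nibble subtraction width, and the fixed two-digit output are not confirmed by this card or by model.py.

```python
def carry_rule(col_sum: int, carry_in: int) -> int:
    """Exact stand-in for the learned MULTIPLY step: carry out of one column."""
    return (col_sum + carry_in) >> 1


def multiply_16x16(x: int, y: int) -> int:
    """32-bit product of two 16-bit residues via column sums and a carry chain."""
    cols = [0] * 32
    for i in range(16):
        for j in range(16):
            # carry-save column sum: count partial-product bits with i + j == k
            cols[i + j] += ((x >> i) & 1) & ((y >> j) & 1)
    t, carry = 0, 0
    for k in range(32):
        bit = (cols[k] + carry) & 1          # fixed parity readout
        carry = carry_rule(cols[k], carry)   # the rule a step net would learn
        t |= bit << k
    return t


def borrow_rule(a_nib: int, b_nib: int, borrow_in: int) -> tuple[int, int]:
    """Exact stand-in for the learned REDUCTION step: (difference nibble, borrow out)."""
    d = a_nib - b_nib - borrow_in
    return d & 0xF, 1 if d < 0 else 0


def sub_nibbles(a: int, b: int, n_nibbles: int = 5) -> tuple[int, int]:
    """Nibble-wise a - b; a final borrow of 0 means a >= b."""
    diff, borrow = 0, 0
    for n in range(n_nibbles):
        d, borrow = borrow_rule((a >> (4 * n)) & 0xF, (b >> (4 * n)) & 0xF, borrow)
        diff |= d << (4 * n)
    return diff, borrow


def reduce_mod(t: int, p: int) -> int:
    """Restoring division: shift in one bit of t, trial-subtract p, keep or restore."""
    rem = 0
    for k in range(31, -1, -1):
        rem = (rem << 1) | ((t >> k) & 1)
        diff, borrow = sub_nibbles(rem, p)
        if borrow == 0:      # ge-from-final-borrow: rem >= p, so keep the difference
            rem = diff
    return rem


def to_base256_msb(r: int, width: int = 2) -> list[int]:
    """Base-256 digits, most significant first (fixed width is an assumption)."""
    return [(r >> (8 * i)) & 0xFF for i in range(width - 1, -1, -1)]


def tier3_reference(a: int, b: int, p: int) -> list[int]:
    x, y = a % p, b % p
    return to_base256_msb(reduce_mod(multiply_16x16(x, y), p))


if __name__ == "__main__":
    print(tier3_reference(40000, 50000, 33343))
```

Under this formulation the carry rule sees a column sum in 0..16 and an incoming carry in 0..15, and the borrow rule sees two nibbles and one borrow bit. The remainder before each trial subtraction is below 2p, hence below 2^17, which is why five nibbles suffice in this sketch.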
Provenance
The carry-save column sums, parity readout, bit shifts, restoring-division topology, and ge-from-final-borrow decision are fixed scaffold. The two nontrivial decisions, the carry rule and the borrow/compare rule, reside in the trained MLP parameters. Randomizing either step net collapses tier-3 exactness:
- random-weight pipeline (both step nets re-initialized): exact = 0.000000
- trained multiply + random reduction: exact = 0.002196 (chance)
so neither step net is scaffolding. The full collapse receipt is in
t3_collapse_receipt.json. The MULTIPLY and REDUCTION step nets are trained
teacher-forced on the local-rule transitions of reference traces. The MULTIPLY
step is saturated over the realizable part of its 272-case domain (100
realizable cases) and never sees p; the REDUCTION step covers all 512 cases,
using traces over TRAIN primes only. Five primes near the 16-bit ceiling
(33343, 45137, 54497, 55061, 62071) are held out by identity and appear in no
training trace; the composed pipeline is exact (1.0) on all five, both on
uniform residue pairs and on the four edge cases.
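Teacher forcing here means each step is trained on ground-truth inputs taken from a reference trace, not on its own earlier outputs. The sketch below shows one way such transitions could be collected for the carry rule. The (column sum, carry-in) encoding and all function names are assumptions for illustration, not the training code; note that p never appears, consistent with the MULTIPLY step not seeing the modulus.

```python
def column_sums(x: int, y: int) -> list[int]:
    """Carry-save column sums of the 16x16-bit partial products."""
    cols = [0] * 32
    for i in range(16):
        for j in range(16):
            cols[i + j] += ((x >> i) & 1) & ((y >> j) & 1)
    return cols


def carry_transitions(x: int, y: int):
    """Yield ((col_sum, carry_in), carry_out) for each column of one reference multiply."""
    carry = 0
    for col in column_sums(x, y):
        nxt = (col + carry) >> 1   # ground-truth carry from exact arithmetic
        yield (col, carry), nxt
        carry = nxt                # teacher forcing: the next input is the true carry


def collect_dataset(pairs) -> dict[tuple[int, int], int]:
    """Deduplicated input -> target table over all traces."""
    data = {}
    for x, y in pairs:
        for key, target in carry_transitions(x, y):
            data[key] = target
    return data


if __name__ == "__main__":
    pairs = [(x, y) for x in range(0, 65536, 257) for y in range(0, 65536, 4099)]
    data = collect_dataset(pairs)
    print("distinct (col_sum, carry_in) cases seen:", len(data))
```

Because the table is keyed by local inputs only, the number of distinct cases is bounded by the size of the rule's domain, regardless of how many operand pairs are traced.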
Public benchmark (1100 problems, fixed seed)
- overall_accuracy = 0.314
- highest_tier_above_90 = 3
- deterministic = True (two full runs bit-identical)
- tier 1 = 1.000, tier 2 = 1.000, tier 3 = 1.000
- inference wall-clock < 0.1s for 1100 problems (300s budget)
Static check: clean. No sympy / gmpy2 / eval / exec / subprocess on any path.
See EVALS.log and eval_6d6f6463_1100.json for the full per-tier breakdown,
and manifest.json for the model and training descriptions.
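For readers who want to reproduce something like the static check mentioned above, a minimal sketch using the standard-library `ast` module follows. It is a hypothetical checker, not the challenge's official one: it flags only direct imports of the listed modules and direct calls to `eval` or `exec`, and would miss indirect use (for example via `importlib` or `getattr`).

```python
import ast

FORBIDDEN_MODULES = {"sympy", "gmpy2", "subprocess"}
FORBIDDEN_CALLS = {"eval", "exec"}


def find_violations(source: str) -> list[str]:
    """Return a description of each forbidden import or call found in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    hits.append(f"import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in FORBIDDEN_MODULES:
                hits.append(f"from {node.module} import ...")
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id in FORBIDDEN_CALLS:
                hits.append(f"call to {node.func.id}()")
    return hits


if __name__ == "__main__":
    print(find_violations("import math\nx = pow(3, 5, 7)"))
    print(find_violations("import subprocess\ny = eval('1 + 1')"))
```

Running it over the text of model.py (read with `open(...).read()`) would give a quick first pass; an empty list corresponds to a clean result under these limited rules.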
Files
model.py (architectures + routing + fixed wiring), weights.safetensors
(tier-1/2 specialist), t3_mul.safetensors / t3_red.safetensors (tier-3
step nets), config.json (per-specialist hyperparameters), manifest.json,
t3_collapse_receipt.json, EVALS.log, eval_6d6f6463_1100.json.