YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
bitserial-modmul โ learned modular multiplication
Submission for the SAIR Modular Arithmetic Challenge. One shared, p-conditioned
recurrent cell applied in a fixed bit-serial (Horner) loop computes (a * b) mod p.
The cell learns the per-step transition s' = (2*s + d*x) mod p, including the modular
wrap; the loop only sequences bits. Answers are emitted as base-2 digits and the
harness decoder reconstructs the integer.
Local evaluation
modchallenge evaluate, 1100 problems, secret-seed unseen primes (tiers 2+):
- Tiers 1-7: exact-match 1.00 each.
- highest_tier_above_90 = 7, overall_accuracy = 0.706.
- Identical on two independent seeds.
Provenance
The capability is in the trained parameters: randomizing the weights collapses every solved tier to 0.00 (overall 0.006). No symbolic-math libraries, no big-integer modular arithmetic in Python, no lookup tables. The reduction and the multiplication are performed by the trained cell; the Python loop performs no arithmetic (only bit indexing and feeding the cell). The static-analysis check passes. Each preprocessing hook sees only its own argument.
The cell is a ~471K-parameter bidirectional GRU. It was trained from random
initialization on one-step modular transitions (modulus bit-length stratified,
wrap-boundary cases oversampled), with lr warmup + cosine decay. A single L=256 cell
covers tiers 1-7. See manifest.json for the full architecture and training summary.
Interface
entry_class:model.BitSerialReduceroutput_base: 2- Files:
model.py,manifest.json,weights.pt.
Limitation (honest)
This model passes the random-operand benchmark but is not exact. On structured inputs (powers of two and other long doubling chains) the per-step reduction drifts for some primes beyond about 500 steps, reproducing the Neural GPU limitation (Price, Zaremba, Sutskever 2016). The benchmark tiers reflect average-case accuracy on the official scorer's random-operand distribution, not worst-case exactness of the underlying modular-multiplication operator.