Tempo-SNN v2: Complexity-Aware Physics-Based RL PIM Controller
A fully refactored, literature-grounded routing framework for heterogeneous computing (PIM/ReRAM, CPU, GPU) with RL-based task scheduling and polyhedral compilation-aware complexity profiling.
Structure
```
.
├── requirements.txt
├── README.md
├── src/
│   ├── __init__.py
│   ├── profiler.py          # Task complexity profiler (M-1..M-5 fixes)
│   ├── physics.py           # ReRAM/STT-MRAM physics model (RC thermal, Arrhenius)
│   ├── rl_env.py            # RL environment (state normalization, retention)
│   ├── rl_agent.py          # Dueling DQN + Noisy Nets + SAC variant
│   ├── controller.py        # Seamless PIM/CPU/GPU controller
│   ├── router_static.py     # Universal parser + hybrid decision tree
│   ├── polyhedral.py        # Polyhedral AI estimator + aware router
│   ├── baselines.py         # READYS, EdgeSched-DQN, threshold baselines
│   ├── training.py          # Training loop + sample efficiency metrics
│   ├── plots.py             # Monitoring + interpretability plots
│   └── benchmarks/
│       └── mlperf_tiny.py   # MLPerf Tiny model stubs + harness
├── tests/
│   └── test_profiler.py     # 62 comprehensive tests
├── scripts/
│   └── train.py             # CLI entry point
└── test_minimal.py          # Quick smoke test
```
Critical Bug Fixes
| Bug | Location | Fix |
|---|---|---|
| B-1: `DemoSNN` dimension mismatch | `DemoSNN.__init__` | FC input size corrected for 16×16 → pool → 8×8 |
| B-2: Profiler masks all errors | `TaskComplexityProfiler._analyze_layers` | Warning on failure + targeted shape fallback |
| B-3: Broken boolean in `PolyhedralAwareRouter.route` | `router_static.py` | Explicit `None` guards; proper fallback to `COMPLEXITY_LIBRARY[0]` |
| B-4: SNN uses `init_hidden=True` and also calls `init_leaky()` manually | Forward pass | Manual `init_leaky()` call removed from `DemoSNN`; `init_hidden` manages hidden state internally |
| B-5: LR scheduler not saved/loaded | `Agent.save` / `Agent.load` | Added `scheduler.state_dict()` to the checkpoint |
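The B-3 fix amounts to checking optional values with `is None` instead of relying on truthiness, and falling back to the first library entry when no match is found. The sketch below illustrates that pattern only; `COMPLEXITY_LIBRARY` is modeled here as a plain list of dicts, and `pick_profile` and its entries are hypothetical, not the code in `router_static.py`.

```python
from typing import Optional

# Hypothetical stand-in for COMPLEXITY_LIBRARY: a list of task profiles.
COMPLEXITY_LIBRARY = [
    {"name": "GEMM", "aliases": {"matmul"}, "ai": 8.0},
    {"name": "FFT", "aliases": {"fourier"}, "ai": 1.5},
]


def pick_profile(task: Optional[str], ai_override: Optional[float] = None) -> dict:
    """Return the matching profile, falling back to COMPLEXITY_LIBRARY[0].

    Explicit `is None` checks matter: an AI estimate of 0.0 is a valid
    value and must not be treated as missing (a truthiness test would).
    """
    match = None
    if task is not None:
        key = task.lower()
        for entry in COMPLEXITY_LIBRARY:
            if key == entry["name"].lower() or key in entry["aliases"]:
                match = entry
                break
    if match is None:
        match = COMPLEXITY_LIBRARY[0]
    result = dict(match)  # copy so the library entry is never mutated
    if ai_override is not None:
        result["ai"] = ai_override
    return result
```

With this shape, `pick_profile("FFT", ai_override=0.0)` keeps the override of 0.0, whereas an `if ai_override:` test would silently discard it.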
Modeling Fixes (M-1..M-5)
| Issue | Fix |
|---|---|
| M-1: Activation memory estimated as max(single tensor) | `ActivationMemoryTracker` sums tensors that are live at the same time and reports the peak |
| M-2: SNN FLOPs ignore neuronal dynamics | `lif_flops = num_lif_neurons * timesteps * 4` added per layer |
| M-3: Alias collisions silently overwrite entries | `ValueError` raised on duplicate alias during library build |
| M-4: `timesteps` normalized inconsistently | Single `MAX_TIMESTEPS_REF = 100` used everywhere |
| M-5: PIM always applies sparsity skip | `pim_supports_sparse` flag; zeros are skipped only if the hardware supports it |
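The M-1 change replaces "largest single tensor" with the peak of the summed sizes of simultaneously live tensors. A minimal sketch of that live-range computation follows, assuming each activation is described by a hypothetical `(size_bytes, first_step, last_step)` tuple; this is not the real `ActivationMemoryTracker` interface.

```python
from typing import List, Tuple

# (size_bytes, produced_at_step, last_used_at_step); both steps inclusive.
Tensor = Tuple[int, int, int]


def peak_activation_memory(tensors: List[Tensor]) -> int:
    """Peak total bytes of tensors whose live ranges overlap (event sweep)."""
    events = []
    for size, start, end in tensors:
        # kind 0 = free, kind 1 = alloc: frees sort first at the same step,
        # so a tensor ending at step s-1 never overlaps one starting at s.
        events.append((end + 1, 0, -size))
        events.append((start, 1, size))
    live = peak = 0
    for _, _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak
```

For a chain of three layers with outputs of 100, 200, and 50 bytes, each consumed by the next layer, the single-tensor maximum is 200 bytes, but the first two tensors coexist at step 1, so the sweep reports 300 bytes.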
Performance Optimizations (O-1..O-5)
| Opt | Implementation |
|---|---|
| O-1: CPU→GPU tensor transfer | `PrioritizedReplayBuffer` stores on CPU, batches directly to device tensors |
| O-2: Profile caching | `_profile_cache` keyed by `(id(model), input_shape, timesteps)` |
| O-3: `CosineAnnealingLR` | Replaces brittle `StepLR`; decays over the full training horizon |
| O-4: `RunningMeanStd` | `NormalizationStats`: Welford-style online mean/variance normalization (as in OpenAI Baselines) |
| O-5: N-step returns (3-step) | `NStepBuffer` computes discounted multi-step returns in `store_transition` |
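O-5 folds the next n rewards into each stored transition. The sketch below shows one way to do that, assuming transitions are plain `(state, action, reward, next_state, done)` tuples; the `NStepAccumulator` name is hypothetical and is not the repository's `NStepBuffer`. Each emitted tuple carries the discounted reward sum and the bootstrap factor γ^k for the target network.

```python
from collections import deque
from typing import Any, List, Tuple

Transition = Tuple[Any, int, float, Any, bool]  # (state, action, reward, next_state, done)


class NStepAccumulator:
    """Hypothetical helper that turns 1-step transitions into n-step ones."""

    def __init__(self, n: int = 3, gamma: float = 0.99):
        self.n, self.gamma = n, gamma
        self.window: deque = deque()

    def _emit(self) -> Tuple[Any, int, float, Any, bool, float]:
        # Discounted sum of the rewards currently in the window.
        ret, discount = 0.0, 1.0
        for _, _, r, _, _ in self.window:
            ret += discount * r
            discount *= self.gamma
        s0, a0 = self.window[0][0], self.window[0][1]
        _, _, _, s_last, done_last = self.window[-1]
        # discount == gamma ** len(window): bootstrap factor for Q(s_last)
        return s0, a0, ret, s_last, done_last, discount

    def push(self, t: Transition) -> List[tuple]:
        """Add one transition; return any n-step transitions now complete."""
        self.window.append(t)
        out = []
        if len(self.window) == self.n:
            out.append(self._emit())
            self.window.popleft()
        if t[4]:  # episode ended: flush the shorter windows that remain
            while self.window:
                out.append(self._emit())
                self.window.popleft()
        return out
```

Flushing on `done` ensures that the last few steps of an episode are still stored, with shorter horizons and no bootstrap past the terminal state.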
Literature-Grounded Additions
Category 1: Polyhedral Compilation
- `PolyhedralAIEstimator` with loop-fusion and cache-tiling models (PolyMage/Pluto-inspired)
- `PolyhedralAwareRouter` computes post-compilation arithmetic intensity (AI) and may change the routing decision
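The reason post-compilation AI can change a routing decision is clear from a back-of-envelope calculation. Fusing two element-wise loops eliminates the store and reload of the intermediate array, so FLOPs per byte of memory traffic rise. The kernel, the 0.25 FLOP/byte threshold, and the "memory-bound goes to PIM" rule below are illustrative assumptions, not values from `polyhedral.py`.

```python
F32_BYTES = 4  # bytes per float32 element


def axpb_relu_ai(n: int, fused: bool) -> float:
    """Arithmetic intensity of y = relu(a * x + b) over n float32 elements.

    Unfused: loop 1 reads x and writes t; loop 2 reads t and writes y
    (4 arrays of traffic). Fused: read x, write y; t stays in a register
    (2 arrays of traffic).
    """
    flops = 3 * n  # one multiply, one add, one max per element
    arrays_touched = 2 if fused else 4
    return flops / (arrays_touched * n * F32_BYTES)


def route(ai: float, threshold: float = 0.25) -> str:
    """Hypothetical rule: memory-bound kernels (low AI) go to PIM."""
    return "PIM" if ai < threshold else "GPU"
```

Here the unfused kernel has AI 0.1875 and routes to `"PIM"`, while the fused version has AI 0.375 and routes to `"GPU"`. The same source task therefore lands on different hardware depending on what the compiler does.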
Category 2: STT-MRAM Thermal Reliability
- RC thermal network (2-node: junction + case) with Zhang et al. parameters
- Arrhenius retention time $\tau = \tau_0 \exp\left(E_{a,\mathrm{ret}}/kT\right)$; emergency migration if $\tau < 1\,\mathrm{ms}$
- Temperature-dependent endurance $N_{\mathrm{end}}(T) = N_0 \exp\left(-\frac{E_{a,\mathrm{end}}}{k}\left(\frac{1}{T} - \frac{1}{T_{\mathrm{ref}}}\right)\right)$
- Read-disturb tracking with $\beta(T)$ per read cycle
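The thermal-to-retention chain can be sketched as follows: a forward-Euler step of a 2-node RC network gives the junction temperature, the Arrhenius formula converts it to a retention time, and the 1 ms rule triggers migration. Every numeric default here (resistances, capacitances, τ₀ = 1 ns, E_a,ret = 1.0 eV) is a placeholder chosen for illustration. These are not the Zhang et al. parameters used in `physics.py`, and the function names are hypothetical.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def rc_thermal_step(t_j, t_c, power_w, dt_s, t_amb=300.0,
                    r_jc=10.0, r_ca=20.0, c_j=0.01, c_c=0.5):
    """One forward-Euler step of a 2-node (junction + case) RC network.

    Temperatures in K, resistances in K/W, capacitances in J/K.
    All defaults are placeholder values.
    """
    q_jc = (t_j - t_c) / r_jc    # heat flow junction -> case (W)
    q_ca = (t_c - t_amb) / r_ca  # heat flow case -> ambient (W)
    t_j_next = t_j + dt_s * (power_w - q_jc) / c_j
    t_c_next = t_c + dt_s * (q_jc - q_ca) / c_c
    return t_j_next, t_c_next


def retention_time_s(temp_k, tau0_s=1e-9, ea_ret_ev=1.0):
    """Arrhenius retention: tau = tau0 * exp(Ea_ret / (k T))."""
    return tau0_s * math.exp(ea_ret_ev / (K_B_EV * temp_k))


def needs_emergency_migration(temp_k, **retention_kwargs):
    """True if retention at this temperature drops below 1 ms."""
    return retention_time_s(temp_k, **retention_kwargs) < 1e-3
```

With 1 W dissipated, repeated steps drive the junction toward its steady state $T_{\mathrm{amb}} + P(R_{jc} + R_{ca}) = 330\,\mathrm{K}$ and the case toward 320 K. The explicit step size must stay well below the fast time constant $R_{jc}C_j = 0.1\,\mathrm{s}$ for the update to remain stable.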
Category 3: Thermal-Aware Scheduling
- Retention failure penalty in reward: -1.0 if retention time < inference duration
- DVFS-style frequency scaling placeholder in SAC variant
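A minimal sketch of the retention penalty follows. It assumes the penalty is added on top of whatever reward the environment already computes. `shaped_reward` and `base_reward` are hypothetical names standing in for that existing term in `rl_env.py`.

```python
def shaped_reward(base_reward: float, retention_s: float, inference_s: float,
                  penalty: float = -1.0) -> float:
    """Apply the retention-failure penalty to an existing reward.

    Assumption: the penalty is additive. If stored data would decay before
    the inference finishes, the placement is penalized.
    """
    if retention_s < inference_s:
        return base_reward + penalty
    return base_reward
```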
Category 4: RL-Based Scheduling
- Noisy Nets (`NoisyLinear`) for parametric exploration (Fortunato et al. 2017)
- N-step returns (n = 3) for sample efficiency
- Double Q-learning in all DQN variants (the policy net selects the argmax action; the target net evaluates it)
- SAC agent variant for continuous action space ablation
- READYS baseline (deadline slack / execution time greedy)
- EdgeSched-DQN flat baseline (no dueling/PER/hierarchy)
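The Double Q-learning target used by the DQN variants can be written per transition as below. Q-values are plain lists here, and `double_dqn_target` is a hypothetical helper, not the agent's batched implementation. With n-step returns, `reward` is the discounted n-step sum and `gamma_n` is γⁿ.

```python
from typing import Sequence


def double_dqn_target(reward: float, done: bool, gamma_n: float,
                      q_policy_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double Q-learning target for a single transition.

    The policy net picks the next action (argmax); the target net scores
    that action. This decoupling reduces the overestimation bias of
    max_a Q_target(s', a).
    """
    if done:
        return reward  # no bootstrap past a terminal state
    best = max(range(len(q_policy_next)), key=lambda a: q_policy_next[a])
    return reward + gamma_n * q_target_next[best]
```

With policy Q-values `[1.0, 3.0, 2.0]`, the policy net picks action 1. The target is therefore built from the target net's 4.0 rather than its maximum of 20.0, as the test below shows.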
Category 5: MLPerf Tiny Benchmarking
- DS-CNN (Keyword Spotting), MobileNetV1 (Visual Wake Words)
- FC Autoencoder (Anomaly Detection), ResNet-like (Image Classification)
- `PIMAccuracyModel` degrades accuracy by fault density × V_th deviation × temperature
- Harness runs 5–100 inferences and reports accuracy/latency/energy per task
Test Results
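62 tests cover the profiler, physics model, RL environment and agent, controller, router, baselines, MLPerf Tiny harness, and edge cases. All pass; the physics and profiler tests finish in under 5 s, while the RL tests take about 120 s.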
Run:
```bash
cd tempo-snn-v2
python test_minimal.py          # 6 core checks (<1 s)
python tests/test_profiler.py   # 62 tests (physics + profiler fast; RL ~120 s)
```
Usage
```python
# router_static.py lives in src/: run from src/ or add src/ to PYTHONPATH
from router_static import ComplexityRouter, HardwareState

router = ComplexityRouter()
report = router.route("FFT", hw=HardwareState.from_temperature(T=45.0))
# report.target -> "GPU", report.tier_used -> "TIER2_SKLEARN"
```
References
Key citations referenced in the code:
- Pluto (Bondhugula et al., PLDI 2008): affine scheduling baseline
- PolyMage (Mullapudi et al., ASPLOS 2015): polyhedral image pipeline optimization
- Zhang et al. (IEEE Trans. Nanotech 2018): STT-MRAM compact thermal model
- Mnih et al. (Nature 2015): DQN foundation
- Wang et al. (ICML 2016): Dueling network architectures
- Fortunato et al. (2017): Noisy Networks for exploration
- Banbury et al. (arXiv 2021): MLPerf Tiny Benchmark
- Grinsztajn et al. (IEEE Cluster 2021): READYS heterogeneous scheduling