LunarLander: 836 params, 100% landings, 0% crashes
A tiny neural network (836 params) that solves LunarLander-v2 with 100/100 full landings and no crashes. About 48× smaller than PPO (40K params).
Performance
| Metric | Value |
|---|---|
| Avg reward | 257.7 |
| Full landings | 100/100 (100%) |
| Crashes | 0/100 (0%) |
| Params | 836 |
| Training data | 500 episodes (154K samples) |
| Training epochs | 100 |
Demo
Architecture
Params: 8×64 + 64 + 64×4 + 4 = 836
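The parameter count corresponds to a fully connected network with 8 inputs, one hidden layer of 64 units, and 4 outputs. The sketch below is a minimal pure-Python illustration of that shape, not the released model: the ReLU activation, the random placeholder weights, and the helper names (`count_params`, `init_params`, `forward`, `act`) are assumptions made for the example.

```python
import math
import random

LAYER_SIZES = [8, 64, 4]  # observation dim, hidden units, discrete actions


def count_params(sizes):
    """Weights plus biases for a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_params(sizes, seed=0):
    """Random placeholder weights; a real run would load the trained ones."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, obs):
    """Return one logit per action. ReLU on the hidden layer is an assumption."""
    x = list(obs)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # no activation on the output layer
            x = [max(0.0, v) for v in x]
    return x


def act(layers, obs):
    """Greedy action: index of the largest logit."""
    logits = forward(layers, obs)
    return max(range(len(logits)), key=logits.__getitem__)


params = init_params(LAYER_SIZES)
print(count_params(LAYER_SIZES))  # 836
```

The same count gives the size ratio quoted above: 40,000 / 836 is roughly 47.8, which rounds to 48×.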
Usage
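A minimal evaluation loop might look like the sketch below. It assumes an environment object that follows the Gymnasium API (`reset()` returns `(obs, info)`, `step()` returns `(obs, reward, terminated, truncated, info)`) and a `policy` callable that maps an 8-value observation to an action index in 0..3. The function names `run_episode` and `evaluate` are hypothetical, and loading the trained weights is not shown.

```python
def run_episode(env, policy, seed=None):
    """Roll out one episode and return the total reward.

    `env` is assumed to follow the Gymnasium reset()/step() interface.
    """
    obs, _info = env.reset(seed=seed)
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        done = terminated or truncated
    return total_reward


def evaluate(env, policy, episodes=100):
    """Average reward over several episodes, one seed per episode."""
    rewards = [run_episode(env, policy, seed=i) for i in range(episodes)]
    return sum(rewards) / len(rewards)
```

With Gymnasium and its Box2D extra installed, `env` could be created with `gymnasium.make("LunarLander-v2")`; the available environment ID depends on the installed version.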
Training
Distilled from PPO (stable-baselines3) via supervised learning on 500 expert episodes.
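Supervised distillation of a discrete-action policy is commonly framed as classification: the student's logits for each recorded observation are scored against the action the expert took. The source does not state the loss used, so the cross-entropy objective below is an assumption, and `bc_loss_and_grad` is a hypothetical helper name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_loss_and_grad(logits, expert_action):
    """Cross-entropy against the expert's action, plus its gradient w.r.t. the logits."""
    probs = softmax(logits)
    loss = -math.log(probs[expert_action])
    # For softmax cross-entropy the gradient is (probs - one_hot(expert_action)).
    grad = [p - (1.0 if i == expert_action else 0.0) for i, p in enumerate(probs)]
    return loss, grad


loss, grad = bc_loss_and_grad([0.0, 0.0, 0.0, 0.0], expert_action=2)
print(round(loss, 4))  # 1.3863, i.e. ln(4) for a uniform prediction
```

Averaging this loss over the recorded (observation, action) pairs and backpropagating the gradient through the network for each epoch is the usual training loop for this kind of setup.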