LunarLander: 836 params, 100% landings, 0% crashes

A tiny neural network (836 params) that solves LunarLander-v2 with 100/100 successful landings and no crashes. About 48× smaller than PPO (40K params).

Performance

| Metric | Value |
|---|---|
| Avg reward | 257.7 |
| Full landings | 100/100 (100%) |
| Crashes | 0/100 (0%) |
| Params | 836 |
| Training data | 500 episodes (154K samples) |
| Training epochs | 100 |

Demo

Architecture

Params: 8×64 + 64 + 64×4 + 4 = 836
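The parameter count above corresponds to a fully connected network with 8 inputs (the LunarLander observation), one hidden layer of 64 units, and 4 outputs (one per discrete action). The following is a minimal pure-Python sketch of that shape, not the released model: the ReLU activation, the random initialization, and the helper names `count_params`, `init_mlp`, `forward`, and `act` are all assumptions made for illustration.

```python
import random

def count_params(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def init_mlp(layer_sizes, seed=0):
    """Random placeholder weights (hypothetical; not the trained values)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, obs):
    """Return one logit per action for a single observation."""
    x = list(obs)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU is an assumption
    return x

def act(layers, obs):
    """Greedy action: index of the largest logit."""
    logits = forward(layers, obs)
    return max(range(len(logits)), key=logits.__getitem__)

SIZES = [8, 64, 4]  # observation dim, hidden units, actions
print(count_params(SIZES))  # 836
```

Taking the argmax over the logits gives a deterministic policy, which is the usual choice when evaluating a distilled controller.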

Usage

Training

Distilled from PPO (stable-baselines3) via supervised learning on 500 expert episodes.
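Supervised distillation of this kind is often done as behavioral cloning: the small network is trained to predict the expert's action for each recorded observation. The card does not state the loss that was used, so the sketch below assumes cross-entropy against the expert's discrete action; `softmax`, `bc_loss`, and `bc_grad` are hypothetical helper names.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_loss(logits_batch, expert_actions):
    """Mean cross-entropy between student logits and expert action labels."""
    losses = [-math.log(softmax(logits)[action])
              for logits, action in zip(logits_batch, expert_actions)]
    return sum(losses) / len(losses)

def bc_grad(logits, expert_action):
    """Gradient of the per-sample loss w.r.t. the logits: softmax - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == expert_action else 0.0)
            for i, p in enumerate(probs)]
```

An untrained student with equal logits over the 4 actions has a loss of ln 4 per sample, which gives a baseline to compare training curves against.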
