Finding the Time to Think in Real-Time RL: Checkpoints
Pretrained base planners and gating policies for the paper "Finding the Time to Think in Real-Time RL".
A lightweight gating policy on top of a frozen AlphaZero-style MCTS planner selects a state-dependent planning budget at each decision point, across five real-time games (Pac-Man, real-time Tetris, Snake, Speed Hex, Speed Go).
- Project page: https://aneeshers.github.io/realtime-rl/
- Paper (PDF): https://aneeshers.github.io/realtime-rl/assets/finding-the-time-to-think.pdf
- Code: https://github.com/Aneeshers/realtime-rl-code
Layout
checkpoints/
├── clock/{go,hex}/{base,gating}                              # Speed Go / Speed Hex (pgx)
└── committed_action/{pacman,snake,tetris_rt}/{base,gating}   # Jumanji
Each environment has one AlphaZero base planner (base) and one PPO gating policy (gating). See the code repo's README for the launcher scripts that consume these.
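To check that a local copy matches the layout above, the brace patterns can be expanded into the ten directories they imply. The following is a minimal sketch and not part of the released code: the helper names `expected_checkpoint_dirs` and `missing_checkpoint_dirs` are hypothetical, and it assumes only the directory names shown in the tree, not any particular file format inside them.

```python
from pathlib import Path

# Layout as documented in the tree above: family -> environments.
LAYOUT = {
    "clock": ("go", "hex"),                                # Speed Go / Speed Hex (pgx)
    "committed_action": ("pacman", "snake", "tetris_rt"),  # Jumanji
}
ROLES = ("base", "gating")  # AlphaZero base planner, PPO gating policy


def expected_checkpoint_dirs(root="checkpoints"):
    """Return {(env, role): Path} for every directory the layout implies."""
    root = Path(root)
    return {
        (env, role): root / family / env / role
        for family, envs in LAYOUT.items()
        for env in envs
        for role in ROLES
    }


def missing_checkpoint_dirs(root="checkpoints"):
    """List the expected directories that do not exist under `root`."""
    return sorted(
        path
        for path in expected_checkpoint_dirs(root).values()
        if not path.is_dir()
    )


if __name__ == "__main__":
    for path in missing_checkpoint_dirs():
        print(f"missing: {path}")
```

Run from the directory that contains `checkpoints/`, the script prints any expected directory that is absent and prints nothing when the download is complete.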