Dreamer V4 (from-scratch): checkpoints
Inference checkpoints for a from-scratch PyTorch reproduction of DreamerV4 (Hafner, Yan & Lillicrap, 2025; arXiv:2509.24527): tokenizer → flow-matching world model → behavior-cloned agent → imagination RL (PMPO). Current checkpoints are trained end-to-end on ball_in_cup_catch (more tasks to follow).
Code: https://github.com/vijayabhaskar-ev/dreamer_v4
Checkpoints (optimizer state stripped; inference only)
| file | what it is | size |
|---|---|---|
| ball_in_cup/tokenizer.pt | masked-autoencoder tokenizer (128×128) | 300 MB |
| ball_in_cup/agent_bc.pt | BC agent: world model + categorical policy + reward/continue heads | 507 MB |
| ball_in_cup/agent_imagination_rl.pt | imagination-RL policy + value heads (loads on top of agent_bc) | 7 MB |
| ball_in_cup/world_model.pt | world-model base, before agent finetuning; optional, only needed to retrain the agent | 491 MB |
Minimum set to reproduce the eval: tokenizer + agent_bc + agent_imagination_rl (~814 MB). Checkpoints for future tasks (Minecraft, robotics) will land in sibling folders.
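One way to read "loads on top of agent_bc" is as a state-dict overlay: the BC agent's weights are loaded first, then the entries in the smaller imagination-RL file replace or extend them. The sketch below illustrates that idea with plain dictionaries standing in for PyTorch state dicts. The key names are hypothetical, and the repository's actual loader may work differently.

```python
def overlay_state(base, overlay):
    """Return a new dict holding base entries, replaced or extended by overlay."""
    merged = dict(base)      # copy, so the base checkpoint is left untouched
    merged.update(overlay)   # overlay entries win on key collisions
    return merged


# Hypothetical key names; real checkpoints would hold tensors, not strings.
bc_state = {"world_model.w": "bc", "policy.w": "bc", "reward_head.w": "bc"}
rl_state = {"policy.w": "rl", "value_head.w": "rl"}

merged = overlay_state(bc_state, rl_state)
replaced = sorted(k for k in rl_state if k in bc_state)
added = sorted(k for k in rl_state if k not in bc_state)
print("replaced:", replaced)
print("added:", added)
```

With the real files, the two dictionaries would typically come from `torch.load(path, map_location="cpu")`, assuming the checkpoints store flat state dicts; that layout is an assumption, not something stated here.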
Real-env result (closed-loop dm_control, n=50)
Catch rate, stochastic deployment: random 0.10 → BC 0.32 → imagination-RL 0.38. Imagination-RL ≈ BC: the gain over BC is not statistically significant (paired sign test, p = 0.63). The bottleneck is out-of-distribution (OOD) state coverage, not the policy head. Full analysis in the code repo.
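For readers unfamiliar with the paired sign test, here is a minimal sketch of the two-sided exact version, which looks only at episodes where exactly one of the two policies caught the ball. The function name and the 10 vs 7 split are illustrative assumptions: at n=50, rates of 0.38 and 0.32 imply a net difference of 3 catches, and 10 vs 7 is just one split consistent with that. The actual per-episode outcomes are not listed here.

```python
from math import comb


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test on discordant pairs (ties are dropped)."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts: episodes where only imagination-RL caught
# the ball (10) versus episodes where only BC caught it (7).
p = sign_test_p(10, 7)
print(f"p = {p:.2f}")
```

With so few discordant episodes, a 3-catch difference sits well within what a fair coin would produce, which is why the comparison is reported as a tie.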
Reproduce
pip install -r requirements.txt && pip install dm_control mujoco
export MUJOCO_GL=egl
python -m dynamics.evaluate_env \
--phase2-ckpt ball_in_cup/agent_bc.pt \
--phase3-ckpt ball_in_cup/agent_imagination_rl.pt \
--tokenizer-ckpt ball_in_cup/tokenizer.pt \
--task ball_in_cup_catch --action-dim 2 \
--num-episodes 50 --policies phase3,bc,random \
--device cuda --readout sample --output-dir eval-stoch
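Before launching the evaluation, it can help to confirm that the three checkpoints named in the command are present. The following is a small optional pre-flight sketch, not part of the repository; the helper name `missing_checkpoints` is hypothetical, and it assumes the files sit under the current directory with the same relative paths used in the command.

```python
from pathlib import Path

# Relative paths copied from the evaluation command above.
REQUIRED = (
    "ball_in_cup/tokenizer.pt",
    "ball_in_cup/agent_bc.pt",
    "ball_in_cup/agent_imagination_rl.pt",
)


def missing_checkpoints(root="."):
    """Return the required checkpoint paths that are not files under root."""
    base = Path(root)
    return [rel for rel in REQUIRED if not (base / rel).is_file()]


missing = missing_checkpoints()
print("missing:", ", ".join(missing) if missing else "none")
```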
Provenance
Weights are derived from expert demonstrations in nicklashansen/dreamer4; the dataset itself is not redistributed (regenerate via convert_hansen_to_npz.py in the code repo). This is a faithful reproduction on a simple task with honest negative results, not a SOTA model.