Dreamer V4 (from-scratch): checkpoints

Inference checkpoints for a from-scratch PyTorch reproduction of DreamerV4 (Hafner, Yan & Lillicrap, 2025; arXiv:2509.24527): tokenizer → flow-matching world model → behavior-cloned agent → imagination RL (PMPO). The current checkpoints are trained end-to-end on ball_in_cup_catch (more tasks to follow).

Code: https://github.com/vijayabhaskar-ev/dreamer_v4

Checkpoints (optimizer state stripped; inference only)

| File | What it is | Size |
|---|---|---|
| ball_in_cup/tokenizer.pt | masked-autoencoder tokenizer (128×128) | 300 MB |
| ball_in_cup/agent_bc.pt | BC agent: world model + categorical policy + reward/continue heads | 507 MB |
| ball_in_cup/agent_imagination_rl.pt | imagination-RL policy + value heads (loads on top of agent_bc) | 7 MB |
| ball_in_cup/world_model.pt | world-model base, before agent finetuning; optional, needed only to retrain the agent | 491 MB |

Minimum set to reproduce the eval: tokenizer + agent_bc + agent_imagination_rl (~814 MB). Checkpoints for future tasks (Minecraft, robotics) will land in sibling folders.
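
The minimum set can be fetched file by file rather than cloning everything. The following is a minimal sketch, assuming the files are hosted in a Hugging Face model repo with the id vijayabhaskarev/dreamer-v4 (an assumption, not confirmed by this card) and that the huggingface_hub package is installed. The helper names are hypothetical; the sizes are copied from the checkpoint table.

```python
# Sizes in MB, copied from the checkpoint table.
CHECKPOINT_SIZES_MB = {
    "ball_in_cup/tokenizer.pt": 300,
    "ball_in_cup/agent_bc.pt": 507,
    "ball_in_cup/agent_imagination_rl.pt": 7,
    "ball_in_cup/world_model.pt": 491,
}

# world_model.pt is only needed to retrain the agent, so it is left out.
MINIMUM_SET = [
    "ball_in_cup/tokenizer.pt",
    "ball_in_cup/agent_bc.pt",
    "ball_in_cup/agent_imagination_rl.pt",
]


def minimum_set_size_mb():
    """Total download size of the minimum evaluation set, in MB."""
    return sum(CHECKPOINT_SIZES_MB[name] for name in MINIMUM_SET)


def download_minimum_set(repo_id="vijayabhaskarev/dreamer-v4", local_dir="."):
    """Download the three evaluation checkpoints and return their local paths.

    The repo_id default is an assumption; replace it with the actual repo.
    """
    # Imported lazily so the size helper works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(repo_id=repo_id, filename=name, local_dir=local_dir)
        for name in MINIMUM_SET
    ]
```

With local_dir set to the working directory, the files should land under ball_in_cup/, which matches the relative paths used by the evaluation command.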

Real-env result (closed-loop dm_control, n=50)

Catch rate under stochastic deployment: random 0.10 → BC 0.32 → imagination-RL 0.38. The imagination-RL gain over BC is not statistically significant (imagination-RL ≈ BC; paired sign test, p = 0.63); the bottleneck is OOD state coverage, not the policy head. Full analysis is in the code repo.
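
For readers unfamiliar with the paired sign test, here is a minimal sketch of the exact two-sided version. It looks only at discordant episode pairs (one policy catches, the other does not) and asks how surprising the split is under a fair coin. The per-episode outcomes are not given here, so the counts in the usage line are hypothetical: 10 vs 7 is just one split whose net difference matches 19 vs 16 catches out of 50. This is not necessarily how the code repo computes its p-value.

```python
from math import comb


def sign_test_p(wins_a, wins_b):
    """Exact two-sided sign test p-value.

    wins_a: number of pairs where only policy A succeeded.
    wins_b: number of pairs where only policy B succeeded.
    Ties (both succeed or both fail) are ignored by the test.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no discordant pairs, no evidence either way
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided in one direction
    # when each discordant pair is a fair coin flip.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, chosen only for illustration.
p = sign_test_p(10, 7)
print(f"{p:.3f}")  # 0.629
```

A p-value this large means a three-catch difference over 50 episodes is well within what chance alone would produce.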

Reproduce

pip install -r requirements.txt && pip install dm_control mujoco
export MUJOCO_GL=egl
python -m dynamics.evaluate_env \
  --phase2-ckpt    ball_in_cup/agent_bc.pt \
  --phase3-ckpt    ball_in_cup/agent_imagination_rl.pt \
  --tokenizer-ckpt ball_in_cup/tokenizer.pt \
  --task ball_in_cup_catch --action-dim 2 \
  --num-episodes 50 --policies phase3,bc,random \
  --device cuda --readout sample --output-dir eval-stoch
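
A missing checkpoint is easier to catch before the evaluation starts than partway through. The sketch below assembles the same command in Python and checks that the three files exist first. It reuses only the flags shown above; the helper names and the root parameter are hypothetical.

```python
import sys
from pathlib import Path

# Flag-to-file mapping, taken from the command above.
CHECKPOINT_FLAGS = {
    "--phase2-ckpt": "ball_in_cup/agent_bc.pt",
    "--phase3-ckpt": "ball_in_cup/agent_imagination_rl.pt",
    "--tokenizer-ckpt": "ball_in_cup/tokenizer.pt",
}


def missing_checkpoints(root="."):
    """Return the relative paths of required checkpoints not found under root."""
    return [
        rel for rel in CHECKPOINT_FLAGS.values()
        if not (Path(root) / rel).is_file()
    ]


def build_eval_command(root=".", device="cuda", output_dir="eval-stoch"):
    """Build the argv list for the evaluation run shown above."""
    cmd = [sys.executable, "-m", "dynamics.evaluate_env"]
    for flag, rel in CHECKPOINT_FLAGS.items():
        cmd += [flag, str(Path(root) / rel)]
    cmd += [
        "--task", "ball_in_cup_catch", "--action-dim", "2",
        "--num-episodes", "50", "--policies", "phase3,bc,random",
        "--device", device, "--readout", "sample",
        "--output-dir", output_dir,
    ]
    return cmd
```

One way to use it is to call missing_checkpoints() first, stop if the list is non-empty, and otherwise pass build_eval_command() to subprocess.run with MUJOCO_GL=egl set in the environment, from the root of the code repo.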

Provenance

Weights are derived from expert demonstrations in nicklashansen/dreamer4; the dataset itself is not redistributed (regenerate it via convert_hansen_to_npz.py in the code repo). This is a faithful reproduction on a simple task with honest negative results, not a SOTA model.
