Dreamer V4 (from-scratch) — checkpoints

Inference checkpoints for a from-scratch PyTorch reproduction of DreamerV4 (Hafner, Yan & Lillicrap, 2025; arXiv:2509.24527): tokenizer → flow-matching world model → behavior-cloned agent → imagination RL (PMPO). All four stages of the current checkpoints are trained on the dm_control task ball_in_cup_catch (more tasks to follow).

Code: https://github.com/vijayabhaskar-ev/dreamer_v4

Checkpoints (optimizer state stripped — inference only)

| File | Contents | Size |
|---|---|---|
| `ball_in_cup/tokenizer.pt` | Masked-autoencoder tokenizer (128×128 frames) | 300 MB |
| `ball_in_cup/agent_bc.pt` | BC agent: world model + categorical policy + reward/continue heads | 507 MB |
| `ball_in_cup/agent_imagination_rl.pt` | Imagination-RL policy + value heads (loads on top of `agent_bc`) | 7 MB |
| `ball_in_cup/world_model.pt` | World-model base before agent finetuning; optional, only needed to retrain the agent | 491 MB |

Minimum set to reproduce the eval: tokenizer + agent_bc + agent_imagination_rl (~814 MB). Checkpoints for future tasks (Minecraft, robotics) will land in sibling folders.
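Because `agent_imagination_rl.pt` is only 7 MB and loads on top of `agent_bc`, it presumably stores just the RL policy and value heads, with everything else coming from the BC checkpoint. A minimal sketch of that overlay, assuming (hypothetically) that each file loads as a flat mapping from parameter name to tensor; the real key layout and loading logic are whatever `dynamics.evaluate_env` in the code repo uses, and `overlay_state_dict` is an illustrative name, not a repo function.

```python
from typing import Any, Dict, List, Mapping, Tuple


def overlay_state_dict(
    base: Mapping[str, Any], overlay: Mapping[str, Any]
) -> Tuple[Dict[str, Any], List[str], List[str]]:
    """Return base updated with overlay, plus the replaced and newly added keys.

    Hypothetical helper: base would be the BC agent's parameters and overlay
    the imagination-RL policy/value-head parameters.
    """
    replaced = sorted(k for k in overlay if k in base)
    added = sorted(k for k in overlay if k not in base)
    merged = dict(base)      # copy so the inputs are left untouched
    merged.update(overlay)   # overlay entries win on key collisions
    return merged, replaced, added


# Usage with the real files (assumes flat name -> tensor dicts, which may not
# match the actual checkpoint format):
#   import torch
#   bc = torch.load("ball_in_cup/agent_bc.pt", map_location="cpu")
#   rl = torch.load("ball_in_cup/agent_imagination_rl.pt", map_location="cpu")
#   params, replaced, added = overlay_state_dict(bc, rl)
```

Reporting the replaced and added keys separately makes it easy to check that the small file only touches the policy and value heads.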

Real-env result (closed-loop dm_control, n=50)

Catch rate under stochastic deployment: random 0.10 → BC 0.32 → imagination-RL 0.38. The imagination-RL gain over BC is not statistically significant (paired sign test, p = 0.63); the bottleneck is out-of-distribution (OOD) state coverage, not the policy head. Full analysis in the code repo.
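A paired sign test compares two policies episode by episode and looks only at the discordant episodes, where exactly one of them catches the ball. A minimal sketch of the exact two-sided version, assuming each policy is evaluated on the same episode seeds and each outcome is recorded as 1 (catch) or 0; the function names are illustrative and not taken from the code repo.

```python
from math import comb
from typing import Sequence


def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Exact two-sided sign test p-value; ties must already be dropped."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0  # no discordant pairs: no evidence either way
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2.0 * tail)


def paired_sign_test(a: Sequence[int], b: Sequence[int]) -> float:
    """a[i], b[i]: 1 if policy A / policy B caught the ball in episode i, else 0."""
    if len(a) != len(b):
        raise ValueError("both policies must be scored on the same episodes")
    n_pos = sum(1 for x, y in zip(a, b) if x > y)  # only A succeeded
    n_neg = sum(1 for x, y in zip(a, b) if x < y)  # only B succeeded
    return sign_test_p(n_pos, n_neg)
```

With n = 50, the reported rates correspond to 19 (imagination-RL) versus 16 (BC) catches. Episodes both policies win or both lose cancel out, so the discordant wins always differ by 3, and the smallest p-value this test can give for that gap (3 wins to 0) is 0.25.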

Reproduce

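```bash
# Run from a clone of the code repo
git clone https://github.com/vijayabhaskar-ev/dreamer_v4 && cd dreamer_v4
pip install -r requirements.txt && pip install dm_control mujoco
# Fetch the checkpoints into ./ball_in_cup/ (requires huggingface_hub)
huggingface-cli download vijayabhaskarev/dreamer-v4 --include "ball_in_cup/*" --local-dir .
export MUJOCO_GL=egl   # headless MuJoCo rendering via EGL
python -m dynamics.evaluate_env \
  --phase2-ckpt    ball_in_cup/agent_bc.pt \
  --phase3-ckpt    ball_in_cup/agent_imagination_rl.pt \
  --tokenizer-ckpt ball_in_cup/tokenizer.pt \
  --task ball_in_cup_catch --action-dim 2 \
  --num-episodes 50 --policies phase3,bc,random \
  --device cuda --readout sample --output-dir eval-stoch
```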
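The command passes `--readout sample`, which presumably corresponds to the stochastic deployment behind the catch rates reported above: actions are drawn from the categorical policy rather than fixed at its most likely value. A minimal sketch of the difference for a 2-D action, assuming (hypothetically) that the policy emits one set of logits per action dimension over K evenly spaced bins in [-1, 1]. The repo's actual discretization may differ, `readout` and `bin_centers` are illustrative names, and `"argmax"` here is only a label for the deterministic alternative, not a documented `--readout` value.

```python
import math
import random
from typing import List, Sequence


def bin_centers(num_bins: int, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Centers of num_bins equal-width bins spanning [low, high]."""
    width = (high - low) / num_bins
    return [low + (i + 0.5) * width for i in range(num_bins)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def readout(
    per_dim_logits: Sequence[Sequence[float]], mode: str, rng: random.Random
) -> List[float]:
    """Map per-dimension bin logits to a continuous action vector."""
    action = []
    for logits in per_dim_logits:
        centers = bin_centers(len(logits))
        if mode == "sample":
            # Stochastic: draw a bin in proportion to its probability.
            idx = rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]
        elif mode == "argmax":
            # Deterministic: always take the most likely bin.
            idx = max(range(len(logits)), key=lambda i: logits[i])
        else:
            raise ValueError(f"unknown readout mode: {mode}")
        action.append(centers[idx])
    return action
```

With sharply peaked logits both modes pick the same bin; they diverge when the policy is uncertain, which is exactly where sampling injects randomness into the closed-loop rollout.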

Provenance

Weights are derived from expert demonstrations in nicklashansen/dreamer4; the dataset itself is not redistributed (regenerate it with convert_hansen_to_npz.py in the code repo). This is a faithful reproduction on a simple task, reported with honest negative results; it is not a SOTA model.

Paper: Training Agents Inside of Scalable World Models (arXiv:2509.24527, published Sep 29, 2025)