τ0-WM: A Unified Video-Action World Model for Robotic Manipulation
Paper: arXiv:2606.01027
A fine-tune of sii-research/tau-0-wm (τ0-WM, a Wan2.2-TI2V-5B-based video-action world model) on the UR3e dual-arm closebox task
(closebox_all3_lossless, 417 episodes, instruction "close the box").
Only the action stream (action_* modules) is trained.

Objective: rectified-flow / flow-matching MSE on the 20-dim relative-EEF-6D action chunk (v = ε − x0, σ = t/1000), matching the action path of the deployment code (pipeline.infer) and the paper (arXiv:2606.01027, Eq. 2).

Relative-EEF-6D actions: see utils.action_space_utils.abs_eef_to_rela.

Normalization: mean/std statistics (statistics_closebox_all3.json) following the openpi RunningStats convention, computed over the relative-EEF-6D targets and the 20-dim states.

Files:
checkpoints/ckpt_XXXXXX.pt: a dict {"step", "model"}, where "model" is the full WanModel state dict (bf16, ~11 GB), i.e. the base backbone plus the fine-tuned action stream, directly loadable for deployment.
wan_pretrain_rela_eef6d.yaml: model/inference config.
statistics_closebox_all3.json: action/state mean/std used for (de)normalization at inference.

Load the state dict into the tau-0-wm WanModel (see the tau-0-wm repo) and deploy via TauPolicy, pointing its statistics file at statistics_closebox_all3.json.
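The normalization and flow-matching objective described in this card can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repo's implementation: it assumes the linear rectified-flow path x_σ = (1 − σ)·x0 + σ·ε (whose velocity is exactly v = ε − x0), integer timesteps t drawn uniformly from [0, 1000), and an openpi-style z-score with a small epsilon added to the std. The names normalize, unnormalize, flow_matching_loss and velocity_model are hypothetical stand-ins; real statistics come from statistics_closebox_all3.json, whose key layout is not assumed here.

```python
import random

ACTION_DIM = 20  # relative-EEF-6D action dimension stated in the card
EPS = 1e-6       # ASSUMED std epsilon (openpi-style); not read from this repo


def normalize(x, mean, std):
    """Z-score one action/state vector with per-dimension mean/std."""
    return [(xi - m) / (s + EPS) for xi, m, s in zip(x, mean, std)]


def unnormalize(z, mean, std):
    """Invert normalize(); maps predicted actions back to robot units."""
    return [zi * (s + EPS) + m for zi, m, s in zip(z, mean, std)]


def flow_matching_loss(velocity_model, x0_chunk, rng):
    """Rectified-flow MSE for one normalized action chunk (list of rows).

    ASSUMES the linear path x_sigma = (1 - sigma) * x0 + sigma * eps,
    whose derivative w.r.t. sigma is eps - x0, with sigma = t / 1000.
    velocity_model(x_sigma, sigma) is a hypothetical stand-in for the
    action stream and must return rows shaped like x0_chunk.
    """
    t = rng.randrange(1000)  # integer timestep in [0, 999]
    sigma = t / 1000.0
    eps = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0_chunk]
    x_sigma = [
        [(1.0 - sigma) * a + sigma * e for a, e in zip(row_a, row_e)]
        for row_a, row_e in zip(x0_chunk, eps)
    ]
    # Velocity target v = eps - x0, elementwise.
    target = [
        [e - a for a, e in zip(row_a, row_e)]
        for row_a, row_e in zip(x0_chunk, eps)
    ]
    pred = velocity_model(x_sigma, sigma)
    sq = [
        (p - v) ** 2
        for row_p, row_v in zip(pred, target)
        for p, v in zip(row_p, row_v)
    ]
    return sum(sq) / len(sq)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder statistics, NOT the values in statistics_closebox_all3.json.
    mean, std = [0.0] * ACTION_DIM, [1.0] * ACTION_DIM
    chunk = [
        normalize([rng.uniform(-1.0, 1.0) for _ in range(ACTION_DIM)], mean, std)
        for _ in range(8)  # hypothetical chunk length
    ]
    zero_model = lambda x, s: [[0.0] * len(row) for row in x]
    print(flow_matching_loss(zero_model, chunk, rng))
```

At inference the statistics are applied in reverse: the action stream's output is passed through the inverse of the normalization before the relative-EEF-6D actions are executed.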
Base model
sii-research/tau-0-wm