dk1-pretrain-40d-30hz-joints

Fast-WAM world-action model for the DK-1 bimanual robot, pretrained on a diverse blend of real teleoperation and synthetic data. It jointly predicts future video and a chunk of future robot actions via flow matching, sharing self-attention between the two modalities (MoT). The action output is a single joint-space relative head (multistep_rel) that emits 14-D actions over a 50-step horizon.

The backbone is a half-size slice (15 of 30 layers) of Wan2.2-TI2V-5B (~3.3B trainable parameters), trained in bf16.

Checkpoints

path                    step          weights
step_100000/model.pt    100k (final)  raw model
step_100000/ema.pt      100k (final)  EMA (recommended for inference)
step_80000/model.pt     80k           raw model
step_80000/ema.pt       80k           EMA
step_50000/model.pt     50k           raw model
step_50000/ema.pt       50k           EMA

All checkpoints are stored in bf16. Each .pt file is a dict with the keys model_state_dict, step, config, norm_stats, and norm_config, so a single file is self-contained for inference. Prefer the ema.pt weights.
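A minimal loading sketch, assuming PyTorch is installed and the checkpoint file has already been downloaded locally. The key names come from the description above; the helper names check_checkpoint and load_checkpoint are hypothetical, and constructing the model from config is left out because the model class is not described in this card.

```python
from typing import Any, Dict

# Keys this card lists as present in every .pt file.
REQUIRED_KEYS = ("model_state_dict", "step", "config", "norm_stats", "norm_config")


def check_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Raise KeyError if a documented key is missing; otherwise return ckpt."""
    missing = [k for k in REQUIRED_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    return ckpt


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a checkpoint onto the CPU and verify its layout."""
    import torch  # imported lazily so the key check works without PyTorch

    # weights_only=False is an assumption: config may hold non-tensor Python
    # objects. It unpickles arbitrary code, so use it only on trusted files.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return check_checkpoint(ckpt)
```

With these helpers, load_checkpoint("step_100000/ema.pt") would return the dict for the recommended EMA weights, and its "model_state_dict" entry can then be passed to the model's load_state_dict.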

Training

Training run (W&B): loss / eval plots

  • Parallelism: DDP (full model per GPU, gradients averaged), bf16 autocast, 8-bit AdamW, gradient checkpointing, torch.compile.
  • Inputs: 3 cameras (head, left/right wrist), 384×320, 5 observation + 8 future frames; proprio = pos/vel/torque (40-D), history length 3.
  • Rate / horizon: ~30 Hz, action horizon of 50 steps (about 1.7 s per chunk at 30 Hz), RTC training enabled.
  • Data: DK-1 teleop (swan, cutlery-basket, duplo) + dk1-merge + RoboTwin synthetic (stack-blocks); 14-D position-only datasets are zero-padded to 40-D.
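The zero-padding of position-only data can be sketched as follows. This is an illustration only: it assumes the 14 position values occupy the leading slots of the 40-D vector and that the zeros trail, which this card does not state. The helper name pad_to_state_dim is hypothetical.

```python
from typing import List, Sequence

STATE_DIM = 40  # padded state width used in training, per this card


def pad_to_state_dim(values: Sequence[float], dim: int = STATE_DIM) -> List[float]:
    """Right-pad a shorter vector with zeros up to `dim` entries.

    Assumption: real values come first and padding trails.
    """
    if len(values) > dim:
        raise ValueError(f"expected at most {dim} values, got {len(values)}")
    return list(values) + [0.0] * (dim - len(values))


joint_pos = [0.1] * 14  # a 14-D position-only sample
padded = pad_to_state_dim(joint_pos)
print(len(padded), padded[13], padded[14])  # prints: 40 0.1 0.0
```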

Notes

Research checkpoint. Inherits the license and usage terms of the Wan2.2-TI2V-5B base model. Action/state normalization is robot-specific, so use the norm_stats bundled in each checkpoint.
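As a rough illustration of applying bundled statistics, the sketch below assumes per-dimension mean and standard deviation lists. The actual layout of norm_stats and norm_config is not documented in this card, so the function names, the mean/std scheme, and the epsilon value are all assumptions; inspect the loaded dict before relying on any of them.

```python
from typing import List, Sequence

EPS = 1e-6  # assumed guard against zero std, e.g. for zero-padded dimensions


def normalize(x: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Map raw robot values to normalized model inputs, per dimension."""
    return [(v - m) / (s + EPS) for v, m, s in zip(x, mean, std)]


def unnormalize(z: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Invert normalize(): map model outputs back to robot units."""
    return [v * (s + EPS) + m for v, m, s in zip(z, mean, std)]


# Hypothetical statistics for a 3-D slice; the last dimension is constant.
mean = [0.5, -1.0, 0.0]
std = [2.0, 0.25, 0.0]
raw = [1.5, -0.5, 0.0]
restored = unnormalize(normalize(raw, mean, std), mean, std)
```

Using the same statistics in both directions matters: predicted actions have to be unnormalized with the values the model was trained on before they are sent to the robot.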
