FastWAM RoboTwin 3-Camera 384 (LeRobot)

FastWAM is a Vision-Language-Action policy built on Wan2.2 video-generation components and an action diffusion expert. It predicts continuous robot action chunks from visual observations, proprioception, and language/task context.

This checkpoint has been converted to the Hugging Face / LeRobot fastwam policy format and is intended for evaluation and fine-tuning on RoboTwin-style manipulation tasks.

Model description

  • Policy type: fastwam
  • Backbone family: Wan-AI/Wan2.2-TI2V-5B
  • Inputs: one RGB image concatenating the three camera views, robot state/proprioception, task context
  • Outputs: continuous robot actions
  • Training objective: FastWAM video/action diffusion loss
  • Action representation: continuous action chunks
  • Intended use: evaluation or fine-tuning on RoboTwin-style manipulation tasks
  • Image feature: observation.images.image
  • Image shape: (3, 384, 320)
  • State shape: (14,)
  • Action shape: (14,)
  • Action horizon: 32
  • Number of video frames: 33
  • Torch dtype: bfloat16
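
The shapes above can be cross-checked against the config.json in a local snapshot of this repository. The sketch below assumes config.json follows the usual LeRobot layout, with input_features and output_features mapping each feature name to an entry that carries a shape list. The helpers feature_shapes and shape_mismatches are hypothetical and not part of LeRobot.

```python
# Shapes as listed on this card (the image is channels-first: C, H, W).
CARD_SHAPES = {
    "observation.images.image": (3, 384, 320),
    "observation.state": (14,),
    "action": (14,),
}


def feature_shapes(config: dict) -> dict:
    """Collect feature shapes from a LeRobot-style config dict.

    Assumption: the config stores 'input_features' and 'output_features' as
    mappings of feature name -> {"type": ..., "shape": [...]}.
    """
    shapes = {}
    for section in ("input_features", "output_features"):
        for name, spec in config.get(section, {}).items():
            shapes[name] = tuple(spec["shape"])
    return shapes


def shape_mismatches(config: dict) -> list:
    """Return (name, card_shape, config_shape) for every disagreement.

    config_shape is None when the feature is missing from the config.
    """
    found = feature_shapes(config)
    return [
        (name, expected, found.get(name))
        for name, expected in CARD_SHAPES.items()
        if found.get(name) != expected
    ]
```

To use it, load the file with json.load and pass the resulting dict to shape_mismatches. An empty list means the config agrees with the values on this card.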

Quick start

Installation

Install LeRobot from a version that includes the fastwam policy:

pip install "lerobot[fastwam]@git+https://github.com/huggingface/lerobot.git"

For full installation details, see the official LeRobot documentation: https://huggingface.co/docs/lerobot/installation
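
After installing, you can check whether the installed LeRobot version ships the fastwam policy module. This is a minimal sketch that uses only the standard library. The module path lerobot.policies.fastwam is taken from the import used in the quick-start example below, and has_fastwam is a hypothetical helper name.

```python
import importlib.util


def has_fastwam() -> bool:
    """Return True if the installed LeRobot exposes lerobot.policies.fastwam."""
    try:
        # find_spec imports the parent packages (lerobot, lerobot.policies)
        # and raises ImportError (ModuleNotFoundError) if one is missing.
        return importlib.util.find_spec("lerobot.policies.fastwam") is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("fastwam available:", has_fastwam())
```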

Load model and run select_action

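import torch

from lerobot.policies.fastwam.modeling_fastwam import FastWAMPolicy

model_id = "<namespace>/fastwam-robotwin-uncond-3cam384"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the policy together with its local Wan components, then switch to eval mode.
policy = FastWAMPolicy.from_pretrained(model_id, strict=False).to(device).eval()

# Dummy batch of size 1 matching the features listed under "Model description":
# one concatenated multi-view RGB image (C, H, W) and a 14-dim state vector.
batch = {
    "observation.images.image": torch.zeros(1, 3, 384, 320, device=device),
    "observation.state": torch.zeros(1, 14, device=device),
    "prompt": "complete the manipulation task",
}

# Disable autograd tracking for inference.
with torch.inference_mode():
    action = policy.select_action(batch)

print(action.shape)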

FastWAMPolicy.from_pretrained(...) loads the policy weights together with the Wan sidecar components (VAE, UMT5 text encoder and tokenizer) stored in the same repository snapshot, so the Wan2.2 backbone should not need to be downloaded separately.
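
The card lists an action horizon of 32 and describes the outputs as action chunks. Many LeRobot chunking policies serve such chunks through an internal action queue in select_action, returning one action per control step and predicting a new chunk only when the queue is empty. Whether FastWAMPolicy does exactly this is an assumption here. The plain-Python sketch below only illustrates the pattern: ChunkedActionQueue is hypothetical, and predict_chunk is a stand-in for the model's chunk prediction, not the FastWAM implementation.

```python
from collections import deque
from typing import Callable, List, Sequence

ACTION_HORIZON = 32  # action horizon listed on this card


class ChunkedActionQueue:
    """Serve one action per control step from predicted action chunks.

    Hypothetical helper: `predict_chunk` stands in for the policy's chunk
    prediction and must return a sequence of per-step actions.
    """

    def __init__(self, predict_chunk: Callable[[dict], Sequence[List[float]]],
                 n_action_steps: int = ACTION_HORIZON):
        self.predict_chunk = predict_chunk
        self.n_action_steps = n_action_steps
        self.queue = deque()

    def reset(self) -> None:
        # Drop leftover actions at the start of a new episode.
        self.queue.clear()

    def select_action(self, observation: dict) -> List[float]:
        if not self.queue:
            chunk = self.predict_chunk(observation)
            # Execute only the first n_action_steps actions of each chunk.
            self.queue.extend(chunk[: self.n_action_steps])
        return self.queue.popleft()
```

With a 32-step chunk, 100 control steps trigger only 4 chunk predictions (at steps 0, 32, 64 and 96). Lowering n_action_steps replans more often at the cost of more model calls.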

Training step

For training or fine-tuning, call forward(...) and backpropagate the value stored under the "loss" key of its output:

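policy.train()

# batch must be a training batch (see the requirements below), not the inference batch above.
outputs = policy.forward(batch)
loss = outputs["loss"]
loss.backward()  # then call your optimizer's step() and zero_grad()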

The training batch must contain FastWAM-ready tensors such as video, action, context, and context_mask, or LeRobot observation/action keys that can be adapted by the policy wrapper.
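
Before calling forward(...), it can help to check which of the two batch formats a batch matches. In the minimal sketch below, batch_format is a hypothetical helper. The FastWAM key set comes from the paragraph above, and the LeRobot key set uses the image and state features from "Model description" plus action. The exact keys the FastWAM wrapper accepts, for example how task context is supplied, are an assumption.

```python
# Key sets assumed for illustration; the real wrapper may accept more or different keys.
FASTWAM_KEYS = {"video", "action", "context", "context_mask"}
LEROBOT_KEYS = {"observation.images.image", "observation.state", "action"}


def batch_format(batch: dict) -> str:
    """Return 'fastwam' or 'lerobot' depending on which key set is present."""
    keys = set(batch)
    if FASTWAM_KEYS <= keys:
        return "fastwam"
    if LEROBOT_KEYS <= keys:
        return "lerobot"
    missing_fw = sorted(FASTWAM_KEYS - keys)
    missing_lr = sorted(LEROBOT_KEYS - keys)
    raise KeyError(
        f"batch matches neither format; missing FastWAM keys {missing_fw} "
        f"or LeRobot keys {missing_lr}"
    )
```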

Fine-tuning

A typical fine-tuning command follows the standard LeRobot training flow:

lerobot-train \
  --dataset.repo_id=<your-robotwin-style-dataset> \
  --output_dir=./outputs/fastwam_finetune \
  --job_name=fastwam_finetune \
  --policy.type=fastwam \
  --policy.path=<namespace>/fastwam-robotwin-uncond-3cam384 \
  --policy.device=cuda \
  --steps=100000 \
  --batch_size=1

Adjust the batch size and sequence settings to fit the available GPU memory.
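
As a rough guide, the Hub metadata for this repository lists about 6B parameters stored in BF16. The back-of-the-envelope sketch below estimates static memory for full fine-tuning. It assumes BF16 weights and gradients (2 bytes each per parameter), AdamW with two FP32 moment buffers (8 bytes per parameter), no FP32 master copy, and no activations. Real mixed-precision setups differ. training_memory_gib is a hypothetical helper.

```python
GIB = 1024 ** 3


def training_memory_gib(n_params: float,
                        weight_bytes: int = 2,   # BF16 weights
                        grad_bytes: int = 2,     # BF16 gradients
                        optim_bytes: int = 8) -> dict:  # AdamW: two FP32 moments
    """Rough static memory (GiB) for full fine-tuning, excluding activations."""
    parts = {
        "weights": n_params * weight_bytes,
        "gradients": n_params * grad_bytes,
        "optimizer": n_params * optim_bytes,
    }
    parts["total"] = sum(parts.values())
    return {name: nbytes / GIB for name, nbytes in parts.items()}
```

Under these assumptions, training_memory_gib(6e9) gives about 11.2 GiB for the weights alone and about 67 GiB in total before activations. This is consistent with the small --batch_size=1 in the example command.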

Evaluate in simulation

For RoboTwin evaluation, use your RoboTwin evaluation setup and pass this repository id as the policy path:

python scripts/robotwin/eval_robotwin_fastwam.py \
  --policy-path <namespace>/fastwam-robotwin-uncond-3cam384 \
  --device cuda

Repository files

This repository is self-contained for FastWAMPolicy.from_pretrained(...):

config.json
model.safetensors
policy_preprocessor.json
policy_preprocessor_step_2_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
Wan2.2_VAE.pth
models_t5_umt5-xxl-enc-bf16.pth
google/umt5-xxl/
robotwin_uncond_3cam_384_dataset_stats.json

The Wan VAE (Wan2.2_VAE.pth), the UMT5-XXL text encoder (models_t5_umt5-xxl-enc-bf16.pth), and its tokenizer (google/umt5-xxl/) are stored beside the FastWAM policy weights.
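
To confirm that a local snapshot is complete before loading, you can compare it against the file list above. The sketch below uses only pathlib. missing_entries is a hypothetical helper, and it treats google/umt5-xxl/ as a directory, as its trailing slash suggests.

```python
from pathlib import Path
from typing import List

# Files listed in the "Repository files" section of this card.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "policy_preprocessor.json",
    "policy_preprocessor_step_2_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
    "Wan2.2_VAE.pth",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "robotwin_uncond_3cam_384_dataset_stats.json",
]
EXPECTED_DIRS = ["google/umt5-xxl"]  # tokenizer directory


def missing_entries(snapshot_dir: str) -> List[str]:
    """Return the expected files/directories absent from snapshot_dir."""
    root = Path(snapshot_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

A local path can be obtained with huggingface_hub.snapshot_download(repo_id="<namespace>/fastwam-robotwin-uncond-3cam384"), which downloads the full repository. An empty result from missing_entries(path) means every listed entry is present.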

Notes

This checkpoint uses only the migrated Hugging Face / LeRobot serialization format: config.json, model.safetensors, and the local Wan sidecar files. Loading an original FastWAM .pt checkpoint is not required.

Model size: 6B params (Safetensors, tensor type BF16)