FastWAM RoboTwin 3-Camera 384 (LeRobot)
FastWAM is a Vision-Language-Action policy built on Wan2.2 video-generation components and an action diffusion expert. It predicts continuous robot action chunks from visual observations, proprioception, and language/task context.
This checkpoint has been converted to the Hugging Face / LeRobot fastwam policy format and is intended
for RoboTwin-style manipulation evaluation and fine-tuning.
Model description
- Policy type: fastwam
- Backbone family: Wan-AI/Wan2.2-TI2V-5B
- Inputs: concatenated multi-view RGB image, robot state/proprioception, task context
- Outputs: continuous robot actions
- Training objective: FastWAM video/action diffusion loss
- Action representation: continuous action chunks
- Intended use: evaluation or fine-tuning on RoboTwin-style manipulation tasks
- Image feature: observation.images.image
- Image shape: (3, 384, 320)
- State shape: (14,)
- Action shape: (14,)
- Action horizon: 32
- Number of video frames: 33
- Torch dtype: bfloat16
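The shapes listed above can be turned into a quick pre-flight check before a batch is handed to the policy. The following is a minimal sketch, not part of LeRobot: `check_batch_shapes` and `EXPECTED_SHAPES` are hypothetical names, the `observation.state` key is taken from the quick-start batch in this card, and the helper assumes every tensor carries a leading batch dimension.

```python
# Expected per-sample shapes, copied from the model description in this card.
EXPECTED_SHAPES = {
    "observation.images.image": (3, 384, 320),
    "observation.state": (14,),
}


def check_batch_shapes(batch, expected=EXPECTED_SHAPES):
    """Return a list of problems; an empty list means the batch looks fine.

    Hypothetical helper: works with any value exposing `.shape`
    (torch tensors, NumPy arrays) and ignores the leading batch dimension.
    """
    problems = []
    for key, want in expected.items():
        if key not in batch:
            problems.append(f"missing key: {key}")
            continue
        full = tuple(batch[key].shape)
        if full[1:] != want:  # compare everything after the batch dimension
            problems.append(f"{key}: expected (B, *{want}), got {full}")
    return problems
```

Calling `check_batch_shapes(batch)` on the quick-start batch should return an empty list; any returned strings name the key that is missing or mis-shaped.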
Quick start
Installation
Install LeRobot from a version that includes the fastwam policy:
pip install "lerobot[fastwam]@git+https://github.com/huggingface/lerobot.git"
For full installation details, see the official LeRobot documentation: https://huggingface.co/docs/lerobot/installation
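After installing, it can be useful to confirm that the installed LeRobot build actually ships the fastwam policy package. The sketch below uses only the Python standard library; `module_available` is a hypothetical helper name, and the module path `lerobot.policies.fastwam` is inferred from the import used in the quick-start snippet in this card.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system.

    For a dotted name, find_spec imports the parent packages but not
    the target module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (for example `lerobot`) is missing
        # or fails to import.
        return False


if __name__ == "__main__":
    print("fastwam policy available:", module_available("lerobot.policies.fastwam"))
```

If this prints `False`, the installed LeRobot version most likely predates the fastwam policy or was installed without the `fastwam` extra.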
Load model and run select_action
import torch

from lerobot.policies.fastwam.modeling_fastwam import FastWAMPolicy

model_id = "<namespace>/fastwam-robotwin-uncond-3cam384"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the policy, move it to the selected device, and switch to eval mode.
policy = FastWAMPolicy.from_pretrained(model_id, strict=False).to(device).eval()

# Dummy observation with batch size 1, matching the shapes in the model description.
batch = {
    "observation.images.image": torch.zeros(1, 3, 384, 320, device=device),
    "observation.state": torch.zeros(1, 14, device=device),
    "prompt": "complete the manipulation task",
}

with torch.inference_mode():
    action = policy.select_action(batch)

print(action.shape)
FastWAMPolicy.from_pretrained(...) loads the policy weights and the local Wan sidecar components
from this same repository snapshot. It should not download the Wan2.2 backbone separately.
Training step
For training or fine-tuning, call forward(...) and use the returned loss key:
policy.train()
outputs = policy.forward(batch)
loss = outputs["loss"]
loss.backward()
The training batch must contain FastWAM-ready tensors such as video, action, context, and
context_mask, or LeRobot observation/action keys that can be adapted by the policy wrapper.
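The snippet above stops at `loss.backward()`. A minimal sketch of a complete update step might look like the following; `train_step` is a hypothetical helper, and it assumes that `policy.forward(batch)` returns a mapping with a "loss" entry as described above and that `optimizer` follows the `torch.optim` interface (`zero_grad()` and `step()`).

```python
def train_step(policy, batch, optimizer):
    """Run one optimization step and return the loss as a Python float.

    Hypothetical helper: `policy` and `optimizer` are passed in by the
    caller, so nothing here depends on a specific optimizer or schedule.
    """
    policy.train()
    optimizer.zero_grad()            # clear gradients from the previous step
    outputs = policy.forward(batch)  # expected to contain a "loss" entry
    loss = outputs["loss"]
    loss.backward()                  # backpropagate through the policy
    optimizer.step()                 # apply the parameter update
    return float(loss.detach())
```

One possible way to use it, assuming the policy is a regular `torch.nn.Module`, is `optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)` followed by `train_step(policy, batch, optimizer)` inside the data loop. The learning rate here is an arbitrary placeholder, not a recommended value for this checkpoint.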
Fine-tuning
A typical fine-tuning command follows the standard LeRobot training flow:
lerobot-train \
  --dataset.repo_id=<your-robotwin-style-dataset> \
  --output_dir=./outputs/fastwam_finetune \
  --job_name=fastwam_finetune \
  --policy.type=fastwam \
  --policy.path=<namespace>/fastwam-robotwin-uncond-3cam384 \
  --policy.device=cuda \
  --steps=100000 \
  --batch_size=1
Adjust batch size and sequence settings for available GPU memory.
Evaluate in simulation
For RoboTwin evaluation, use your RoboTwin evaluation setup and pass this repository id as the policy path:
python scripts/robotwin/eval_robotwin_fastwam.py \
  --policy-path <namespace>/fastwam-robotwin-uncond-3cam384 \
  --device cuda
Repository files
This repository is self-contained for FastWAMPolicy.from_pretrained(...):
config.json
model.safetensors
policy_preprocessor.json
policy_preprocessor_step_2_normalizer_processor.safetensors
policy_postprocessor.json
policy_postprocessor_step_0_unnormalizer_processor.safetensors
Wan2.2_VAE.pth
models_t5_umt5-xxl-enc-bf16.pth
google/umt5-xxl/
robotwin_uncond_3cam_384_dataset_stats.json
The Wan VAE, UMT5 text encoder, and tokenizer are stored beside the FastWAM policy weights.
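Because loading relies on these files sitting next to each other, a quick check of a local snapshot directory can catch an incomplete download early. The sketch below is an illustration under stated assumptions: `missing_entries` is a hypothetical helper, the entry names are copied from the list above, and the snapshot is assumed to keep the same flat layout on disk.

```python
from pathlib import Path

# File and directory names copied from the repository file list above.
EXPECTED_ENTRIES = [
    "config.json",
    "model.safetensors",
    "policy_preprocessor.json",
    "policy_preprocessor_step_2_normalizer_processor.safetensors",
    "policy_postprocessor.json",
    "policy_postprocessor_step_0_unnormalizer_processor.safetensors",
    "Wan2.2_VAE.pth",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "google/umt5-xxl",  # directory entry in the list above
    "robotwin_uncond_3cam_384_dataset_stats.json",
]


def missing_entries(snapshot_dir, expected=EXPECTED_ENTRIES):
    """Return the expected names that do not exist under `snapshot_dir`."""
    root = Path(snapshot_dir)
    return [name for name in expected if not (root / name).exists()]
```

An empty result from `missing_entries("/path/to/snapshot")` suggests the snapshot is complete; it checks only that the entries exist, not that their contents are valid.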
Notes
This checkpoint uses only the migrated Hugging Face / LeRobot serialization format:
config.json, model.safetensors, and local Wan sidecar files. Original FastWAM .pt
checkpoint loading is not required.