InstinctWAM — Wan2.2-TI2V-5B V2V LoRA (SO101 chip-pickup future video prediction)

Action-free future video prediction world model for the SO101 chip-pickup task. This is a rank-32 LoRA on Wan-AI/Wan2.2-TI2V-5B, fine-tuned with V2V clean-context conditioning (the first K latent frames are held clean as the observed past; future frames are denoised; loss on future frames only).
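
The clean-context conditioning described above can be sketched on toy arrays. This is a minimal illustration, not the repo's training code: the function names, the toy latent shape, and the flow-matching convention (linear interpolation between data and noise, with a velocity target of noise minus data) are assumptions made for the example.

```python
import numpy as np

def v2v_training_example(x0, k, t, rng):
    """Build one clean-context training example (illustrative helper).

    x0: clean video latents, shape (frames, channels, height, width)
    k:  number of leading latent frames kept clean as observed context
    t:  noise level in (0, 1]; 0 is clean, 1 is pure noise
    """
    noise = rng.standard_normal(x0.shape)
    # Assumed flow-matching interpolation between data and noise.
    x_t = (1.0 - t) * x0 + t * noise
    # Context frames are never noised: overwrite them with the clean latents.
    x_t[:k] = x0[:k]
    # Assumed velocity target for the denoiser.
    target = noise - x0
    # Boolean mask over frames: True for future frames, the only ones in the loss.
    future = np.zeros(x0.shape[0], dtype=bool)
    future[k:] = True
    return x_t, target, future

def future_only_mse(pred, target, future):
    """Mean squared error restricted to the future frames."""
    diff = pred[future] - target[future]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4, 6, 6))  # toy latent clip; the shape is illustrative
x_t, target, future = v2v_training_example(x0, k=3, t=0.7, rng=rng)

# A prediction that is wrong only on the context frames incurs zero loss.
pred = target.copy()
pred[:3] += 10.0
print(future_only_mse(pred, target, future))
```

Because the mask excludes the first K frames, the model is never penalized for what it outputs at context positions; those frames only serve as conditioning.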

This checkpoint was selected as the winner of a controlled bake-off (see the InstinctWAM repo docs/): it beat or tied every alternative — Motus robot-pretrained init, megamix co-training, context-noise augmentation, logit-normal timestep density, and 14B/DreamZero/Cosmos bases — across a 6-axis eval (PSNR/LPIPS, FVD, optical-flow warp, VLM physical/object plausibility, kinematic motion consistency).

Recipe

Base: Wan-AI/Wan2.2-TI2V-5B (diffusers transformer + VAE + UMT5).
LoRA: rank 32, alpha 32, on attention projections (to_q/k/v/out) and FFN layers; AdamW (weight decay 0.01), learning rate 5e-5 with cosine schedule, batch size 4, 6000 steps, bf16.
Data: GM717/chip_pickup_rightmost_single_top_wrist_v1 (target only; no megamix), 90 train / 10 held-out.
Conditioning: V2V clean-context, K in {1,3,5} latent context frames; action-free.
Inference: UniPC flow scheduler (shift=5), CFG ~1-5, predicts top & wrist views.
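
To make the inference settings above concrete, here is a small sketch of what a shift of 5 does to a flow-matching noise schedule and how a CFG scale combines conditional and unconditional predictions. The shift formula is the one commonly used by flow-matching schedulers; the evenly spaced grid, the function names, and the toy values are assumptions for illustration and do not reproduce the UniPC solver itself.

```python
def shift_sigma(sigma, shift=5.0):
    """Time-shift a noise level in [0, 1]; shift > 1 pushes steps toward high noise."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

def shifted_sigmas(num_steps, shift=5.0):
    """Noise levels from 1 (pure noise) down to 0 (clean), after shifting.

    Simplification: starts from an evenly spaced grid, which real schedulers
    may not use exactly.
    """
    linear = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift_sigma(s, shift) for s in linear]

def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past, when scale > 1) the conditional one."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

sigmas = shifted_sigmas(num_steps=4, shift=5.0)
print([round(s, 3) for s in sigmas])

# Toy two-element "predictions"; scale=1.0 would return cond unchanged.
print(cfg_combine(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=3.0))
```

With shift=5 the midpoint of the linear grid (0.5) maps to roughly 0.83, so more of the sampling budget is spent at high noise levels, where the coarse layout of the future frames is decided.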

Usage

Load onto the base transformer with PEFT and run the V2V denoise loop (see scripts/eval_v2v.py / scripts/wm_dream_server.py in the InstinctWAM repo):

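from diffusers import WanTransformer3DModel
from peft import PeftModel

# Load the base transformer weights from the "transformer" subfolder.
m = WanTransformer3DModel.from_pretrained("Wan-AI/Wan2.2-TI2V-5B", subfolder="transformer")
# Attach the LoRA adapter, then fold its weights into the base model so the
# result is a plain transformer with no PEFT wrapper.
m = PeftModel.from_pretrained(m, "GM717/InstinctWAM-Wan22-5B-chip-lora").merge_and_unload()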

Known limitation: long-horizon (>~5 s) autoregressive rollouts drift toward low motion (~38% of real motion by 10 s); addressing this (Self-Forcing / anchor frames) is future work.
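
The drift described above can be illustrated with a toy autoregressive rollout and a simple motion ratio. Everything here is a stand-in: `damped_predictor` is a hypothetical replacement for the model, and mean absolute frame difference is an assumed proxy for motion, not the kinematic metric used in the InstinctWAM evaluation.

```python
import numpy as np

def rollout(context, predict_chunk, num_chunks, k):
    """Autoregressive rollout: feed the last k frames back in as clean context."""
    frames = list(context)
    for _ in range(num_chunks):
        past = np.stack(frames[-k:])
        frames.extend(predict_chunk(past))
    return np.stack(frames)

def motion_energy(video):
    """Mean absolute difference between consecutive frames (assumed motion proxy)."""
    return float(np.mean(np.abs(np.diff(video, axis=0))))

def motion_ratio(pred, real):
    """Predicted motion as a fraction of real motion over the same window."""
    return motion_energy(pred) / motion_energy(real)

def damped_predictor(past, chunk_len=4, damping=0.8):
    """Hypothetical stand-in for the model: continues the last observed
    motion but shrinks it at every step, mimicking low-motion drift."""
    step = past[-1] - past[-2]
    frame = past[-1]
    out = []
    for _ in range(chunk_len):
        step = damping * step
        frame = frame + step
        out.append(frame)
    return out

# "Real" video: every pixel advances by 1 per frame (constant motion).
real = np.arange(15, dtype=float)[:, None, None] * np.ones((2, 2))
pred = rollout(real[:3], damped_predictor, num_chunks=3, k=3)

early = motion_ratio(pred[2:7], real[2:7])      # first predicted chunk
late = motion_ratio(pred[10:15], real[10:15])   # last predicted chunk
print(early, late)
```

Because each chunk is conditioned on the previous chunk's output, any under-prediction of motion compounds, so the ratio for the late window is lower than for the early one.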
