InstinctWAM: Wan2.2-TI2V-5B V2V LoRA (SO101 chip-pickup future video prediction)
Action-free future video prediction world model for the SO101 chip-pickup task. This is a rank-32 LoRA on Wan-AI/Wan2.2-TI2V-5B, fine-tuned with V2V clean-context conditioning (the first K latent frames are held clean as the observed past; future frames are denoised; loss on future frames only).
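As a rough illustration of the clean-context conditioning described above, the sketch below builds one training example on toy latents and computes a loss over future frames only. It is not taken from the InstinctWAM repo: the linear noising path, the velocity target, the tensor layout, and the helper names `clean_context_training_example` and `future_only_mse` are all assumptions made for the example.

```python
import numpy as np

def clean_context_training_example(x0, k, t, rng):
    """Build one V2V training example from clean latents.

    x0: clean latents, shape (frames, channels, height, width) (assumed layout)
    k:  number of leading latent frames kept clean as the observed past
    t:  noise level in (0, 1]; 0 is clean, 1 is pure noise (assumed convention)
    """
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * noise   # assumed linear flow-matching path
    x_t[:k] = x0[:k]                   # context frames stay clean
    target = noise - x0                # assumed velocity target
    future = np.zeros(x0.shape[0], dtype=bool)
    future[k:] = True                  # loss mask: future frames only
    return x_t, target, future

def future_only_mse(pred, target, future):
    """Mean squared error over future frames; context frames contribute nothing."""
    return float(np.mean((pred[future] - target[future]) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 4, 6, 6))  # toy clip with 8 latent frames
x_t, target, future = clean_context_training_example(x0, k=3, t=0.7, rng=rng)

# Stand-in for the transformer output: exact on future frames, wrong on context.
pred = target.copy()
pred[:3] += 100.0
loss = future_only_mse(pred, target, future)  # errors on context frames are ignored
```

Because the mask drops the first K frames, whatever the network outputs there has no effect on the gradient, which matches the "loss on future frames only" description.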
This checkpoint was selected as the winner of a controlled bake-off (see the InstinctWAM repo docs/): it beat or
tied every alternative (Motus robot-pretrained init, megamix co-training, context-noise augmentation, logit-normal
timestep density, and 14B/DreamZero/Cosmos bases) across a 6-axis eval: PSNR/LPIPS, FVD, optical-flow warp,
VLM physical/object plausibility, and kinematic motion consistency.
Recipe
- Base: Wan-AI/Wan2.2-TI2V-5B (diffusers transformer + VAE + UMT5).
- LoRA: rank 32, alpha 32, on attn (to_q/k/v/out) + FFN; AdamW wd 0.01, lr 5e-5 cosine, batch 4, 6000 steps, bf16.
- Data: GM717/chip_pickup_rightmost_single_top_wrist_v1 (target only; no megamix), 90 train / 10 held-out.
- Conditioning: V2V clean-context, K in {1,3,5} latent context frames; action-free.
- Inference: UniPC flow scheduler (shift=5), CFG ~1-5, predicts top & wrist views.
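To make the numbers in the recipe concrete, here is a small sketch of what each setting usually means. It assumes standard LoRA scaling (alpha / rank), a cosine decay to zero with no warmup, the usual flow-scheduler shift formula, and the usual classifier-free guidance combination; the card does not spell these details out, and the function names are illustrative rather than taken from the repo.

```python
import math

RANK, ALPHA = 32, 32
BASE_LR, TOTAL_STEPS = 5e-5, 6000
SHIFT = 5.0

def lora_scale(rank=RANK, alpha=ALPHA):
    """Multiplier applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank

def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Cosine decay from base_lr to 0 (no warmup; an assumption)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

def shift_sigma(sigma, shift=SHIFT):
    """Flow shift: maps a noise level in [0, 1] toward the high-noise end."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

def cfg_combine(cond, uncond, scale):
    """Classifier-free guidance; scale 1 returns the conditional prediction."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

print(lora_scale())        # rank 32 with alpha 32 leaves the update unscaled
print(cosine_lr(3000))     # halfway through training
print(shift_sigma(0.5))    # midpoint noise level after the shift
```

With shift=5 a uniform sigma of 0.5 moves to about 0.83, so more sampling steps are spent at high noise levels.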
Usage
Load onto the base transformer with PEFT and run the V2V denoise loop (see scripts/eval_v2v.py /
scripts/wm_dream_server.py in the InstinctWAM repo):
from diffusers import WanTransformer3DModel
from peft import PeftModel
m = WanTransformer3DModel.from_pretrained("Wan-AI/Wan2.2-TI2V-5B", subfolder="transformer")
m = PeftModel.from_pretrained(m, "GM717/InstinctWAM-Wan22-5B-chip-lora").merge_and_unload()
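The actual V2V denoise loop lives in the repo scripts named above. As a minimal sketch of the idea only, the code below runs a plain Euler sampler on toy arrays and re-imposes the clean context after every step. It swaps the UniPC scheduler for Euler steps, omits CFG and text conditioning, and uses a hypothetical `velocity_fn(x, sigma)` in place of the merged transformer; none of these names come from the repo.

```python
import numpy as np

def shifted_sigmas(num_steps, shift=5.0):
    """Noise levels from 1 down to 0 with the flow shift applied."""
    s = np.linspace(1.0, 0.0, num_steps + 1)
    return shift * s / (1.0 + (shift - 1.0) * s)

def v2v_denoise(velocity_fn, context, num_future, num_steps=20, seed=0):
    """Euler sampler that keeps the first K latent frames clean at every step.

    velocity_fn(x, sigma) is a hypothetical stand-in for the LoRA-merged
    transformer; context has shape (K, ...) and holds the observed latents.
    """
    k = context.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((k + num_future, *context.shape[1:]))
    x[:k] = context
    sigmas = shifted_sigmas(num_steps)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        x = x + (sigma_next - sigma) * v  # Euler step along the flow
        x[:k] = context                   # re-impose the clean context
    return x

# Toy check: an oracle velocity field that points exactly at a known clip.
clip = np.random.default_rng(1).standard_normal((8, 4, 6, 6))
oracle = lambda x, sigma: (x - clip) / sigma
sample = v2v_denoise(oracle, context=clip[:3], num_future=5)
```

With the oracle field the sampler recovers the toy clip, and the context frames are returned unchanged, which is the property the clean-context scheme relies on.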
Known limitation: long-horizon (>5 s) autoregressive rollouts drift toward low motion (38% of real motion by 10 s);
addressing this (for example with Self-Forcing or anchor frames) is future work.
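For readers unfamiliar with autoregressive rollouts, the sketch below shows the general pattern: each predicted chunk supplies the context for the next one, so any error in the prediction is fed back in. The `predict_chunk` callable and the `motion_ratio` proxy are hypothetical; the card does not say how the 38% figure was measured, so the metric here is only one plausible way to compare predicted and real motion.

```python
def autoregressive_rollout(predict_chunk, context, num_chunks):
    """Chain short predictions into a long clip.

    predict_chunk(context) is a hypothetical callable that returns a list of
    future frames given a list of K context frames.
    """
    k = len(context)
    frames = list(context)
    for _ in range(num_chunks):
        future = predict_chunk(frames[-k:])  # last K frames become the new context
        frames.extend(future)
    return frames

def motion_ratio(pred, real):
    """Total frame-to-frame change of pred relative to real (illustrative proxy)."""
    def total_motion(seq):
        return sum(abs(b - a) for a, b in zip(seq[:-1], seq[1:]))
    return total_motion(pred) / total_motion(real)

# Toy example with scalar "frames": the predictor continues a linear motion.
step_forward = lambda ctx: [ctx[-1] + 1, ctx[-1] + 2]
clip = autoregressive_rollout(step_forward, context=[0, 1, 2], num_chunks=3)
```

A ratio below 1 under this proxy would indicate that the rollout moves less than the reference clip.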