DreamWorld: DROID Action-Conditioned World Models (RGB / RGBD)
Fine-tuned checkpoints of the Wan-T2V-1.3B backbone for action-conditioned video prediction on the DROID dataset, trained with VAE-encoded latent shards.
What's in this repo
Six fine-tuning runs (final step model.safetensors + states.pt):
| Run | Modality | Conditioning | Final step | Notes |
|---|---|---|---|---|
| arm_a_rgb_latent/ | RGB | observed action (executed) | 85000 | base RGB run |
| arm_a_rgb_latent_ctrlworld_obs/ | RGB | observation state + executed action | 80000 | ctrlworld-style |
| arm_a_rgb_latent_commanded_action/ | RGB | commanded action (target) | 55000 | command-not-obs |
| arm_b_rgbd_latent/ | RGBD | observed action | 55000 | depth-conditioned |
| arm_b_rgbd_latent_ctrlworld_obs/ | RGBD | observation state + executed action | 80000 | RGBD + ctrlworld |
| arm_b_rgbd_latent_commanded_action/ | RGBD | commanded action | 55000 | RGBD + command |
Each model.safetensors file contains the EMA / final transformer weights produced by finetune/trainer/sft_trainer/trainer.py.
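To make it easier to pick a checkpoint programmatically, the runs listed above can be captured in a small registry. The sketch below is illustrative only: the `Run` class, `select_runs`, and `weights_path` are hypothetical helpers that are not part of this repo, and `weights_path` assumes that model.safetensors sits directly inside each run folder, which the listing above does not confirm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    folder: str        # run directory name as listed in the table
    modality: str      # "RGB" or "RGBD"
    conditioning: str  # conditioning signal as described in the table
    final_step: int    # final training step listed in the table


# Values copied from the run table in this README.
RUNS = [
    Run("arm_a_rgb_latent", "RGB", "observed action (executed)", 85000),
    Run("arm_a_rgb_latent_ctrlworld_obs", "RGB", "observation state + executed action", 80000),
    Run("arm_a_rgb_latent_commanded_action", "RGB", "commanded action (target)", 55000),
    Run("arm_b_rgbd_latent", "RGBD", "observed action", 55000),
    Run("arm_b_rgbd_latent_ctrlworld_obs", "RGBD", "observation state + executed action", 80000),
    Run("arm_b_rgbd_latent_commanded_action", "RGBD", "commanded action", 55000),
]


def select_runs(modality: Optional[str] = None, keyword: Optional[str] = None) -> List[Run]:
    """Filter runs by modality and/or a substring of the conditioning text."""
    selected = []
    for run in RUNS:
        if modality is not None and run.modality != modality:
            continue
        if keyword is not None and keyword not in run.conditioning:
            continue
        selected.append(run)
    return selected


def weights_path(run: Run, filename: str = "model.safetensors") -> str:
    """Repo-relative path to a run's file.

    ASSUMPTION: files sit directly inside the run folder. Check the repo
    file listing before relying on this layout.
    """
    return f"{run.folder}/{filename}"


if __name__ == "__main__":
    for run in select_runs(modality="RGBD"):
        print(run.folder, run.final_step, weights_path(run))
```

If the assumed layout holds, the returned path could be passed as `filename` to `huggingface_hub.hf_hub_download` together with `repo_id="huiliu123/dreamworld-trained-models"`, and the downloaded file read with `safetensors.torch.load_file`. Building the matching transformer still requires the training code, which is not included here.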
Eval artifacts
eval_results/ contains sample inference outputs and precheck results referenced in the project notes (see companion repo).
Training code
Code, configs, and dataset preprocessing pipeline live in private repos owned by huiliu0424. The training scripts that produced these checkpoints are at script/training/training_arm_{a,b}_latent[_*].json.
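The path shorthand above can be read as a pattern over file names. The sketch below shows one possible reading, assuming that `{a,b}` means one of "a" or "b" and that `[_*]` means an optional suffix starting with an underscore. Both readings are assumptions, `parse_config_path` is a hypothetical helper, and the suffixed file name in the usage lines is invented purely for illustration.

```python
import re

# ASSUMPTION: "{a,b}" selects arm "a" or "b"; "[_*]" is an optional
# underscore-prefixed suffix. Neither reading is confirmed by the source.
CONFIG_RE = re.compile(
    r"^script/training/training_arm_(?P<arm>[ab])_latent(?P<suffix>_[^/]+)?\.json$"
)


def parse_config_path(path: str):
    """Return (arm, suffix) if the path fits the assumed pattern, else None."""
    match = CONFIG_RE.match(path)
    if match is None:
        return None
    return match.group("arm"), (match.group("suffix") or "")


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the pattern.
    print(parse_config_path("script/training/training_arm_a_latent.json"))
    print(parse_config_path("script/training/training_arm_b_latent_ctrlworld_obs.json"))
    print(parse_config_path("script/training/training_arm_c_latent.json"))
```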
Latent shards (training data)
The VAE-encoded latent shards used during training (~900 GB total) are not included in this repo. They were derived from the DROID dataset, processed to add depth (FoundationStereo) and 2D point flow (CoTracker) signals. Reach out if you need access.
Citation
If you use these models, please cite the upstream Wan-T2V work and the DROID dataset.
Base model: Wan-AI/Wan2.1-T2V-1.3B