RLWF — DreamZero checkpoints

Private checkpoint repository for the RLWF paper ("Active Robot Data Collection from World Model Feedback"). It holds two checkpoints, both using the stock DreamZero architecture with no modifications; only the training data and the training config differ.

Layout

rlwf-ckpt/
├── README.md
├── LICENSE
├── mimicgen-core-14b-lora-step80000/   # LoRA fine-tune, ~217 MB
└── mimicgen-core-14b-full-step46000/   # full fine-tune, 10-shard ~47 GB

What each checkpoint is

`mimicgen-core-14b-lora-step80000/`

Architecture: stock DreamZero (groot.vla.model.dreamzero.base_vla.VLA)
Base model: Wan2.1-I2V-14B-480P, frozen
Adapter: LoRA, rank 4, target modules q,k,v,o,ffn.0,ffn.2
Action head: WAN flow-matching action transformer (groot.vla.model.dreamzero.action_head.wan_flow_matching_action_tf.WANPolicyHead)
Action dim: 32 (multi-embodiment), horizon 24
Training data: MimicGen expert demos on LIBERO MimicGen-core (12 tasks)
Step: 80,000
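
For intuition on why a rank-4 adapter stays small, here is a rough sketch that counts the extra trainable parameters LoRA adds to a single linear layer. The layer width of 5120 is an illustrative assumption, not a value read from this checkpoint, and the helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter.

    LoRA learns a low-rank update B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank), so it adds rank * (d_in + d_out) values.
    """
    return rank * (d_in + d_out)


def dense_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the frozen dense weight matrix (bias ignored)."""
    return d_in * d_out


# Assumed width for illustration only; see experiment_cfg/conf.yaml for real values.
width = 5120
added = lora_param_count(width, width, rank=4)
frozen = dense_param_count(width, width)
print(f"LoRA adds {added} params next to {frozen} frozen ones")
```

Under that assumed width, the adapter adds 40,960 parameters next to 26,214,400 frozen ones for that layer, a factor of 640 fewer.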

`mimicgen-core-14b-full-step46000/`

Architecture: same stock DreamZero, no changes
Variant: full fine-tune (no LoRA) on 16 GPUs with DeepSpeed ZeRO
Sharding: 10-shard safetensors (model-{1..10}-of-00010.safetensors)
Training data: same MimicGen-core 12 tasks, longer instruction prompts ("detailed_instruct" recipe)
Step: 46,000

How to load

With the DreamZero codebase available:

from stable_worldmodel.wm.utils import load_pretrained
# either subdir works the same way:
model = load_pretrained(
    "MinghaoFu/rlwf-ckpt/mimicgen-core-14b-lora-step80000",
    extra_args={"torch_dtype": "bfloat16"},
)
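
If only one checkpoint is needed, fetching a single subdirectory avoids pulling the ~47 GB full fine-tune. The sketch below uses `snapshot_download` from `huggingface_hub` with `allow_patterns`. It assumes the repo id is `MinghaoFu/rlwf-ckpt` (inferred from the path above) and that you are already authenticated for the private repo, for example via `huggingface-cli login`. The helper names are hypothetical.

```python
import os


def subdir_patterns(subdir: str) -> list[str]:
    """Glob patterns selecting one checkpoint subdirectory plus the top-level docs."""
    subdir = subdir.strip("/")
    return [f"{subdir}/*", "README.md", "LICENSE"]


def download_checkpoint(subdir: str, repo_id: str = "MinghaoFu/rlwf-ckpt") -> str:
    """Download one subdirectory of the repo and return its local path."""
    # Imported lazily so subdir_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(
        repo_id=repo_id,  # assumed repo id, not confirmed by the README
        allow_patterns=subdir_patterns(subdir),
    )
    return os.path.join(local_root, subdir.strip("/"))
```

Calling `download_checkpoint("mimicgen-core-14b-lora-step80000")` should then return a local directory holding only the LoRA checkpoint files.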

Direct safetensors load (LoRA, single file):

from safetensors.torch import load_file
state_dict = load_file("model.safetensors")
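
After a direct load it can help to check what the file actually contains before wiring it into a model. This is a small sketch that groups tensor names by their leading components; it makes no assumption about how keys are named in these checkpoints, and the names in the example call are made up.

```python
from collections import Counter


def summarize_keys(keys, depth=2):
    """Count tensor names by their first `depth` dot-separated components."""
    counts = Counter(".".join(key.split(".")[:depth]) for key in keys)
    return dict(counts)


# Made-up key names, for illustration only.
example = ["blocks.0.attn.q", "blocks.0.attn.k", "blocks.1.attn.q", "head.proj"]
print(summarize_keys(example))
```

With a real checkpoint, `summarize_keys(state_dict.keys())` gives a quick overview of which submodules have tensors stored in the file.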

Direct safetensors load (full, sharded):

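import json
from pathlib import Path
from safetensors.torch import load_file

ckpt_dir = Path(".")  # directory that holds the index file and the shards

# The index maps every tensor name to the shard file that stores it.
with open(ckpt_dir / "model.safetensors.index.json") as f:
    index = json.load(f)

# Load each shard once and merge everything into a single state dict.
state_dict = {}
for shard in sorted(set(index["weight_map"].values())):
    state_dict.update(load_file(ckpt_dir / shard))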
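
Because the full checkpoint spans ten shards, an interrupted download only shows up partway through loading. A minimal sketch of a pre-flight check, using only the `weight_map` field already read above, could look like this. The function name is hypothetical.

```python
import json
from pathlib import Path


def missing_shards(ckpt_dir):
    """Return shard filenames listed in the index but absent from ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    with open(ckpt_dir / "model.safetensors.index.json") as f:
        index = json.load(f)
    # Each tensor name maps to a shard file; deduplicate to get the shard list.
    expected = sorted(set(index["weight_map"].values()))
    return [name for name in expected if not (ckpt_dir / name).is_file()]
```

An empty list means every shard named in the index is on disk. It does not verify file contents, so a truncated shard would still pass.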

Full training config is in experiment_cfg/conf.yaml of each subdir.

What is NOT in this repo

DeepSpeed optimizer state (global_step*/) — stripped to keep the download small. If you want to resume training instead of just loading for inference, ping me; the optimizer shards are kept separately.
rng_state_*.pth — same reason.
The latest text file — points to a path inside global_step*/, irrelevant without the optimizer state.
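
To tell whether a local copy is inference-only or can resume training, it is enough to look for the three artifacts listed above. The following sketch checks for them by name. It assumes they would sit at the top level of the checkpoint subdirectory, which this README does not state explicitly, and the function names are hypothetical.

```python
from pathlib import Path


def resume_artifacts(ckpt_dir):
    """Report which training-resume artifacts are present in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return {
        # DeepSpeed optimizer state lives in global_step*/ directories.
        "optimizer_state": any(p.is_dir() for p in ckpt_dir.glob("global_step*")),
        # Per-rank RNG snapshots.
        "rng_state": any(ckpt_dir.glob("rng_state_*.pth")),
        # Text file pointing at the most recent global_step*/ directory.
        "latest_file": (ckpt_dir / "latest").is_file(),
    }


def can_resume(ckpt_dir):
    """True only if every resume artifact is present."""
    return all(resume_artifacts(ckpt_dir).values())
```

For the checkpoints as shipped in this repo, `can_resume` is expected to return `False`, since all three artifacts were stripped.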

License

MIT (see LICENSE). The underlying Wan2.1-I2V-14B-480P base model has its own Apache-2.0 license. DreamZero architecture follows the original authors' release terms; this repo only redistributes the fine-tuned weights.

Contact

Minghao Fu — isminghaofu@gmail.com
