RLWF: DreamZero checkpoints
Private checkpoint repository for the RLWF paper ("Active Robot Data Collection from World Model Feedback"). It holds two checkpoints, both using the stock DreamZero architecture with no architectural modifications; only the training data and training configuration differ.
Layout
rlwf-ckpt/
├── README.md
├── LICENSE
├── mimicgen-core-14b-lora-step80000/   # LoRA fine-tune, ~217 MB
└── mimicgen-core-14b-full-step46000/   # full fine-tune, 10 shards, ~47 GB
What each checkpoint is
mimicgen-core-14b-lora-step80000/
- Architecture: stock DreamZero (groot.vla.model.dreamzero.base_vla.VLA)
- Base model: Wan2.1-I2V-14B-480P, frozen
- Adapter: LoRA, rank 4, target modules q,k,v,o,ffn.0,ffn.2
- Action head: WAN flow-matching action transformer (groot.vla.model.dreamzero.action_head.wan_flow_matching_action_tf.WANPolicyHead)
- Action dim: 32 (multi-embodiment), horizon 24
- Training data: MimicGen expert demos on LIBERO MimicGen-core (12 tasks)
- Step: 80,000
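The LoRA target list above names modules by their trailing name components. As a minimal sketch, this is how such a list is commonly matched against fully qualified module names (suffix matching on whole dotted components). Whether the DreamZero trainer matches names exactly this way is an assumption, and the module names in the usage lines are hypothetical.

```python
# Target modules as listed for this checkpoint.
LORA_TARGETS = ("q", "k", "v", "o", "ffn.0", "ffn.2")


def is_lora_target(module_name: str, targets=LORA_TARGETS) -> bool:
    """Return True if the dotted module name ends with one of the targets.

    Matching is on whole dotted components, so "norm_q" does not match "q".
    """
    return any(
        module_name == t or module_name.endswith("." + t) for t in targets
    )


# Hypothetical module names, for illustration only.
names = [
    "blocks.0.self_attn.q",
    "blocks.0.self_attn.norm_q",
    "blocks.0.ffn.0",
    "blocks.0.ffn.1",
]
adapted = [n for n in names if is_lora_target(n)]
```

Here `adapted` keeps only the attention projection and the first feed-forward layer; the normalization and activation modules are left untouched.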
mimicgen-core-14b-full-step46000/
- Architecture: same stock DreamZero, no changes
- Variant: full fine-tune (no LoRA) on 16 GPUs with DeepSpeed ZeRO
- Sharding: 10-shard safetensors (model-{1..10}-of-00010.safetensors)
- Training data: same MimicGen-core 12 tasks, longer instruction prompts ("detailed_instruct" recipe)
- Step: 46,000
How to load
With the DreamZero codebase available:
from stable_worldmodel.wm.utils import load_pretrained
# either subdir works the same way:
model = load_pretrained(
    "MinghaoFu/rlwf-ckpt/mimicgen-core-14b-lora-step80000",
    extra_args={"torch_dtype": "bfloat16"},
)
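If only one of the two checkpoints is needed, fetching a single subdirectory avoids pulling the ~47 GB full fine-tune. The following is a sketch, assuming `huggingface_hub` is installed, that the repo id is `MinghaoFu/rlwf-ckpt` (inferred from the path above), and that a token with access to this private repo is already configured. The helper names are hypothetical.

```python
def subdir_patterns(subdir: str) -> list[str]:
    """Glob patterns selecting every file under one checkpoint subdirectory."""
    return [f"{subdir.strip('/')}/*"]


def download_checkpoint(subdir: str, repo_id: str = "MinghaoFu/rlwf-ckpt") -> str:
    """Download one subdirectory of the repo and return its local path.

    Assumes a token with access to the private repo is configured,
    e.g. via `huggingface-cli login`.
    """
    import os
    from huggingface_hub import snapshot_download  # imported lazily

    root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=subdir_patterns(subdir),
    )
    return os.path.join(root, subdir.strip("/"))
```

For example, `download_checkpoint("mimicgen-core-14b-lora-step80000")` would fetch only the LoRA checkpoint and its config files.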
Direct safetensors load (LoRA, single file):
from safetensors.torch import load_file
state_dict = load_file("model.safetensors")
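After a direct load it can help to see which top-level modules the file actually contains before wiring it into a model. A small sketch that groups keys by their leading name components; the example keys are hypothetical, and the real names depend on the checkpoint.

```python
from collections import Counter


def summarize_keys(keys, depth: int = 1) -> Counter:
    """Count state-dict keys by their first `depth` dotted components."""
    return Counter(".".join(k.split(".")[:depth]) for k in keys)


# Hypothetical keys for illustration; real names depend on the checkpoint.
example = [
    "action_head.proj.weight",
    "action_head.proj.bias",
    "backbone.q.lora_A.weight",
]
summary = summarize_keys(example)
```

With a loaded checkpoint, `summarize_keys(state_dict)` works the same way, since iterating a dict yields its keys.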
Direct safetensors load (full, sharded):
import json
from safetensors.torch import load_file
with open("model.safetensors.index.json") as f:
    index = json.load(f)
state_dict = {}
for shard in sorted(set(index["weight_map"].values())):
    state_dict.update(load_file(shard))
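Because the full checkpoint is about 47 GB, a partial download is easy to miss until a shard fails to load. A minimal sketch that compares the index against the files on disk before loading anything; `missing_shards` is a hypothetical helper, not part of the DreamZero codebase.

```python
import json
from pathlib import Path


def missing_shards(ckpt_dir) -> list[str]:
    """Return shard filenames named in the index but absent from ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    with open(ckpt_dir / "model.safetensors.index.json") as f:
        index = json.load(f)
    # The weight map sends each tensor name to the shard file that holds it.
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (ckpt_dir / name).is_file()]
```

An empty list means every shard referenced by the index is present in the directory.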
Full training config is in experiment_cfg/conf.yaml of each subdir.
What is NOT in this repo
- DeepSpeed optimizer state (global_step*/): stripped to keep the download small. If you want to resume training instead of just loading for inference, ping me; the optimizer shards are kept separately.
- rng_state_*.pth: same reason.
- The latest text file: points to a path inside global_step*/, irrelevant without the optimizer state.
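To tell an inference-only copy from one that can resume training, the three items listed above can be checked directly. This is a sketch under the assumption that all three must sit at the top level of the checkpoint subdirectory for a resume; the function names are hypothetical.

```python
from pathlib import Path


def resume_artifacts(ckpt_dir) -> dict[str, bool]:
    """Report which training-resume artifacts are present in ckpt_dir."""
    ckpt_dir = Path(ckpt_dir)
    return {
        "optimizer_state": any(p.is_dir() for p in ckpt_dir.glob("global_step*")),
        "rng_state": any(ckpt_dir.glob("rng_state_*.pth")),
        "latest": (ckpt_dir / "latest").is_file(),
    }


def can_resume(ckpt_dir) -> bool:
    """True only if every resume artifact is present."""
    return all(resume_artifacts(ckpt_dir).values())
```

For the subdirectories as published here, `can_resume` should return False, since all three artifacts were stripped.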
License
MIT (see LICENSE). The underlying Wan2.1-I2V-14B-480P base model is under its own
Apache-2.0 license. The DreamZero architecture follows the original
authors' release terms; this repo only redistributes the fine-tuned weights.
Contact
Minghao Fu (isminghaofu@gmail.com)