Mem-0 Execution Module — `m1_mix` (RMBench / RoboTwin 2.0)

A single Mem-0 low-level execution-module checkpoint trained jointly on all five RMBench M1 tasks (the m1_mix dataset) and evaluated on each task in turn. M1 tasks require only the execution module — no high-level planner / vLLM is needed for inference.

Backbone: Qwen3-VL-2B-Instruct (vision-language) — weights fine-tuned and bundled in the checkpoint
Action head: DiT-B flow-matching policy (action chunk of 30, 16-D action)
Memory: MemoryBank (instant + anchor memory fusion across the episode)
Aux head: subtask-end classifier (used for Mn multi-stage tasks; inert for M1)
Total parameters: ≈ 2.67 B

Results

task_config = demo_clean, instruction_type = unseen, 100 episodes per task, action_horizon = 30. The same checkpoint and same m1_mix normalization stats are used for every task.

| Task | Success Rate | Reward |
|---|---|---|
| put_back_block | 1.00 | 1.00 |
| rearrange_blocks | 0.86 | 0.86 |
| swap_blocks | 0.81 | 0.81 |
| swap_T | 0.13 | 0.13 |
| observe_and_pickup | 0.03 | 0.00 |
| Average | 0.566 | - |
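
As a quick sanity check, the reported average can be recomputed from the per-task success rates. The snippet below is only an illustration; the dictionary simply restates the numbers in the table.

```python
# Success rates copied from the results table above.
success_rates = {
    "put_back_block": 1.00,
    "rearrange_blocks": 0.86,
    "swap_blocks": 0.81,
    "swap_T": 0.13,
    "observe_and_pickup": 0.03,
}

# Unweighted mean over the five tasks (each task has the same episode count).
average = sum(success_rates.values()) / len(success_rates)
print(f"{average:.3f}")
```

With 100 episodes per task, this corresponds to 283 successful rollouts out of 500.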

Per-episode logs and rollout videos for all five tasks are under eval_results/. See eval_results/summary.md for details and task_instructions.json for the exact per-task language instruction used.

Contents of this bundle

m1_mix_submit/
├── README.md                                   # this file
├── task_instructions.json                      # verbatim --global_task per task + scores
├── checkpoint/
│   ├── m1_mix_final_step50000.pt.part00 … part08   # 15.3 GB full training ckpt, split into 9 parts (2×4 GB + 7×≤1 GB)
│   ├── m1_mix_final_step50000.pt.sha256        # SHA-256 of the reassembled checkpoint
│   └── README_REASSEMBLE.md                    # how to cat the parts back together + verify
├── norm_stats/
│   └── norm_stats.json                         # min-max state/action stats → [-1, 1]
├── configs/
│   ├── execution_module_train_m1_mix.yaml      # training config (reproducibility)
│   └── deploy_policy.yml                        # inference / deployment config
├── qwen_base_config/                            # Qwen3-VL-2B-Instruct config/processor ONLY
│   ├── config.json, generation_config.json
│   ├── tokenizer*.json, vocab.json, merges.txt
│   ├── preprocessor_config.json, video_preprocessor_config.json, chat_template.json
│   └── README_Qwen3-VL-2B-Instruct.md          # upstream model card (Apache-2.0)
└── eval_results/
    ├── summary.md
    └── <task>/                                  # _result.txt, eval_log.txt, episode*.mp4 (×100)

About the checkpoint

Reassemble first. The 15.3 GB checkpoint is uploaded as 9 byte-split parts (m1_mix_final_step50000.pt.part00…08) because the upload path capped single-file size and throttled the number of bytes per time window. Run the following inside checkpoint/; concatenation reproduces the original bit-for-bit:
```
cat m1_mix_final_step50000.pt.part?? > m1_mix_final_step50000.pt
sha256sum -c m1_mix_final_step50000.pt.sha256   # -> m1_mix_final_step50000.pt: OK
```
See checkpoint/README_REASSEMBLE.md for details.
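
Where `cat` and `sha256sum` are not available (for example on Windows), the same reassembly and check can be sketched in Python. This is a minimal sketch, not part of the bundle: the function names `reassemble` and `expected_digest` are hypothetical, and the `.sha256` file is assumed to use the usual `sha256sum` layout (`<hex digest>  <filename>`), which the `sha256sum -c` command above implies.

```python
import hashlib
from pathlib import Path


def reassemble(parts_dir, stem="m1_mix_final_step50000.pt", chunk_size=1 << 20):
    """Concatenate <stem>.partNN files in order; return (output path, SHA-256 hex)."""
    parts_dir = Path(parts_dir)
    # Two-digit suffixes sort correctly as plain strings (part00 < part01 < ...).
    parts = sorted(parts_dir.glob(stem + ".part??"))
    if not parts:
        raise FileNotFoundError(f"no {stem}.part?? files in {parts_dir}")
    out_path = parts_dir / stem
    digest = hashlib.sha256()
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks so multi-GB parts never sit in memory at once.
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    digest.update(chunk)
    return out_path, digest.hexdigest()


def expected_digest(sha_file):
    """Read the hex digest from a sha256sum-style file ('<hex>  <filename>')."""
    return Path(sha_file).read_text().split()[0].lower()
```

A typical call would be `out, got = reassemble("checkpoint")`, followed by comparing `got` with `expected_digest("checkpoint/m1_mix_final_step50000.pt.sha256")`.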

Once reassembled, m1_mix_final_step50000.pt is the full training checkpoint at step 50000:

| key | content |
|---|---|
| `model_state_dict` | 910 tensors, ≈ 2.67 B params (`qwen_model` ≈ 2.44 B, `action_model` ≈ 160 M, `memory_bank` ≈ 39 M, `classifier` ≈ 32 M); bf16 + fp32 |
| `optimizer_state_dict` | AdamW moments, for resume/fine-tune only |
| `scheduler_state_dict` | cosine LR scheduler state |
| `global_step` | 50000 |

The model_state_dict is self-contained: it already includes the fine-tuned Qwen3-VL-2B backbone weights. The bundled qwen_base_config/ provides only the architecture/tokenizer/processor config — the base model weights (model.safetensors, ~4 GB) are not re-distributed here; download them from the official repo (see below).
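
To check the per-module parameter counts quoted above against a loaded checkpoint, something like the following could be used. It is a sketch under two assumptions: that the state-dict keys start with the module names from the table (`qwen_model.`, `action_model.`, and so on), and that every entry exposes `numel()` as PyTorch tensors do. The helper name `count_params_by_prefix` is hypothetical. Any non-trainable buffers stored in the state dict are counted too, so totals may differ slightly from a strict parameter count.

```python
from collections import Counter


def count_params_by_prefix(state_dict):
    """Sum element counts per top-level module prefix (text before the first '.')."""
    totals = Counter()
    for name, tensor in state_dict.items():
        prefix = name.split(".", 1)[0]
        totals[prefix] += tensor.numel()
    return dict(totals)


# Usage with the reassembled checkpoint (requires torch; not run here):
#   import torch
#   ck = torch.load("checkpoint/m1_mix_final_step50000.pt",
#                   map_location="cpu", weights_only=False)
#   for prefix, n in count_params_by_prefix(ck["model_state_dict"]).items():
#       print(f"{prefix}: {n / 1e6:.1f} M")
```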

Inference-only slimming (15.3 GB → ≈ 6 GB) if you don't need to resume training:

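```python
import torch

# weights_only=False unpickles arbitrary objects, so only load a checkpoint you trust.
ck = torch.load("checkpoint/m1_mix_final_step50000.pt", map_location="cpu", weights_only=False)

# Keep only what inference needs; the optimizer and scheduler state are dropped.
torch.save({"model_state_dict": ck["model_state_dict"], "global_step": ck["global_step"]},
           "m1_mix_inference.pt")
```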

The deploy loader reads payload["model_state_dict"] and calls load_state_dict(..., strict=False), so either the full or the slimmed file works unchanged.
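
Because `strict=False` makes PyTorch's `load_state_dict` tolerate missing and unexpected keys instead of raising (it returns them as `missing_keys` and `unexpected_keys`), a key mismatch can go unnoticed. The pure-Python helper below mirrors that comparison so the two key sets can be inspected before loading. The name `diff_state_dict_keys` is hypothetical and not part of the repository.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as strict=False loading would report."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in the model, absent from the file
    unexpected = sorted(checkpoint_keys - model_keys)   # in the file, unknown to the model
    return missing, unexpected


# Usage (assuming `model` is the instantiated policy and `payload` the loaded file):
#   missing, unexpected = diff_state_dict_keys(
#       model.state_dict().keys(), payload["model_state_dict"].keys())
#   print(len(missing), "missing;", len(unexpected), "unexpected")
```

Note that `strict=False` only relaxes key matching; tensors whose shapes differ from the model's still raise an error at load time.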

Dependencies

Code: the RMBench / Mem-0 repository (this checkpoint targets its policy/Mem-0 execution module and script/eval_policy.py). Follow the repo README for the RoboTwin 2.0 simulator environment setup.
Base VLM: Qwen/Qwen3-VL-2B-Instruct (Apache-2.0). Required at model instantiation for the architecture + image/text processor. Its weights are overwritten by this checkpoint at load time (strict=False), but the directory must exist and contain model.safetensors:
```
huggingface-cli download Qwen/Qwen3-VL-2B-Instruct \
    --local-dir policy/Mem-0/checkpoints/Qwen3-VL-2B-Instruct
```
The small config/processor files in qwen_base_config/ are exactly the ones used for training and evaluation; you may overlay them onto the downloaded directory if the upstream revision differs.

How to run evaluation

Point the deploy config at the checkpoint and the m1_mix stats, then run one task at a time. This mirrors exactly how the numbers above were produced:

```
python script/eval_policy.py --config policy/Mem-0/deploy_policy.yml --overrides \
    --task_name        swap_blocks \
    --execution_ckpt   /path/to/m1_mix_final_step50000.pt \
    --state_stats_path /path/to/norm_stats/norm_stats.json \
    --ckpt_setting     m1mix \
    --global_task      "There are three traies on the table, and two blocks are placed in two different traies. You may move only one block at a time, and each tray can hold at most one block. Swap the positions of the two blocks. Finally press the button." \
    --action_horizon   30
```

Replace --task_name and --global_task with each of the five tasks (strings in task_instructions.json). The checkpoint and --state_stats_path stay the same.
--ckpt_setting m1mix only labels the output directory (eval_result/<task>/Mem-0/demo_clean/m1mix/<timestamp>/).
--vllm_url is accepted but unused for M1 tasks (the global instruction is set directly; the planner client is constructed but never queried).
Ensure execution_module.qwen_vl.model_path in deploy_policy.yml points to your downloaded Qwen3-VL-2B-Instruct directory.
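
To run all five tasks in sequence, the command above can be generated per task. The sketch below only builds the argument list from the flags shown in this section; the helper `build_eval_command` and the `instructions` mapping are hypothetical, and the layout of task_instructions.json is not assumed, so copy the real instruction strings from that file by hand or adapt the loading to its actual schema.

```python
import shlex


def build_eval_command(task_name, global_task, ckpt_path, stats_path,
                       config="policy/Mem-0/deploy_policy.yml",
                       ckpt_setting="m1mix", action_horizon=30):
    """Return the eval_policy.py argv for one task, mirroring the command above."""
    return [
        "python", "script/eval_policy.py", "--config", config, "--overrides",
        "--task_name", task_name,
        "--execution_ckpt", str(ckpt_path),
        "--state_stats_path", str(stats_path),
        "--ckpt_setting", ckpt_setting,
        "--global_task", global_task,
        "--action_horizon", str(action_horizon),
    ]


# Hypothetical mapping; fill in the real strings from task_instructions.json.
instructions = {"swap_blocks": "<instruction for swap_blocks>"}

for task, text in instructions.items():
    cmd = build_eval_command(task, text,
                             "/path/to/m1_mix_final_step50000.pt",
                             "/path/to/norm_stats/norm_stats.json")
    # Print a copy-pasteable shell line; to execute instead, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```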

Model architecture (from `configs/`)

VLM backbone — Qwen3-VL-2B-Instruct, 224×224 head-camera image + language instruction, last-layer hidden states (hidden size 2048).
MemoryBank — window_size 30, initial_anchor_size 1, num_heads 8, memory_accumulation 8, dropout 0.1; fuses an instant-memory and an anchor-memory token; concatenated with the text feature → a 3-token summary (B, 3, 2048).
DiT-B action head (FlowmatchingActionHead) — num_layers 16, cross_attention_dim 2048, action_dim 16, state_dim 16, action_horizon 30, num_inference_timesteps 8; flow-matching regression of a 30-step action chunk.
Subtask-end classifier — MLP hidden_sizes [6144, 2048, 512], pos_weight 10.0, focal_gamma 1.0, threshold 0.5. Drives stage transitions in Mn tasks; for M1 the episode is a single stage so it does not affect rollout.
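
To give a feel for what num_inference_timesteps 8 means for a flow-matching head, here is a toy sampler. It is a sketch under stated assumptions, not the repository's implementation: the config names only the step count, so the fixed-step Euler integrator, the time direction (noise at t = 0, action at t = 1), and the names `euler_sample` and `toy_velocity` are all illustrative. The real head predicts the velocity for a 30 × 16 action chunk conditioned on the 3-token summary; a closed-form velocity field stands in for it here.

```python
def euler_sample(velocity_fn, x0, num_steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the learned network: the exact velocity of a straight-line
# path toward a fixed target, v(x, t) = (target - x) / (1 - t).
target = [0.5, -0.25, 1.0]


def toy_velocity(x, t):
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]


noise = [0.9, 0.1, -0.7]  # stands in for a Gaussian noise sample
action = euler_sample(toy_velocity, noise, num_steps=8)
print(action)  # lands on `target` up to floating-point error
```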

Training (from `configs/execution_module_train_m1_mix.yaml`)

Data: m1_mix (the five M1 tasks merged into one LeRobot dataset with globally unique episode_ids). Features: head-camera image, state, action, subtask, subtask_end, episode_id.
Schedule: train_steps 50000, batch_size 56, cosine scheduler, warmup_ratio 0.05, grad_clip_norm 2.5, weight_decay 0.005, seed 42.
Learning rates: base 1e-5, qwen_model 1e-5, action_model 1e-4, classifier 1e-4 (min LRs 1e-6 / 1e-6 / 5e-6 / 5e-6).
Loss: lambda_action 1.0, lambda_classifier 0.2.
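
The schedule values above can be turned into a per-step learning rate as follows. This is a sketch that assumes linear warmup followed by cosine decay down to the group's minimum LR at the final step; the exact warmup shape used by the training code is not stated here, and the function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, base_lr, min_lr, train_steps=50_000, warmup_ratio=0.05):
    """Linear warmup to base_lr, then cosine decay to min_lr at train_steps."""
    warmup_steps = round(train_steps * warmup_ratio)  # 2500 for the values above
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, train_steps - warmup_steps)
    progress = min(max(progress, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Peak and minimum LRs per parameter group, as listed above.
groups = {
    "qwen_model": (1e-5, 1e-6),
    "action_model": (1e-4, 5e-6),
    "classifier": (1e-4, 5e-6),
}
for name, (base_lr, min_lr) in groups.items():
    print(name, lr_at(25_000, base_lr, min_lr))
```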

Normalization

State and action are min-max normalized to [-1, 1] over the 14 arm dimensions using norm_stats/norm_stats.json (NORM_WAY = "minmax" in deploy_policy.py). Use the same stats file at inference; predicted actions are denormalized with it before being sent to the environment. Action chunks from overlapping predictions are averaged (mean smoothing) before execution.
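
The min-max mapping and its inverse can be sketched as below. The helpers operate on plain per-dimension lists because the field names inside norm_stats.json are not documented here; `normalize`, `denormalize`, and the `eps` guard against zero-range dimensions are illustrative assumptions rather than the code in deploy_policy.py. How the remaining two of the 16 dimensions are handled is likewise not stated in this section.

```python
def normalize(x, lo, hi, eps=1e-8):
    """Map each x[i] from [lo[i], hi[i]] to [-1, 1]."""
    return [2.0 * (xi - l) / max(h - l, eps) - 1.0 for xi, l, h in zip(x, lo, hi)]


def denormalize(y, lo, hi, eps=1e-8):
    """Inverse of normalize: map each y[i] from [-1, 1] back to [lo[i], hi[i]]."""
    return [(yi + 1.0) * 0.5 * max(h - l, eps) + l for yi, l, h in zip(y, lo, hi)]


# Toy two-dimensional example with made-up bounds.
lo, hi = [0.0, -2.0], [10.0, 2.0]
state = [5.0, 2.0]
norm = normalize(state, lo, hi)        # midpoint maps to 0, upper bound to 1
restored = denormalize(norm, lo, hi)   # round-trips back to the raw values
```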

Limitations

swap_T (0.13) and observe_and_pickup (0.03) are weak: the former needs precise alignment of the T-block's position and orientation; the latter needs to re-identify the target across views after a visual occlusion and then pick it up. The joint m1_mix model does not solve either task reliably.
Numbers are on RoboTwin 2.0 demo_clean with unseen instruction phrasings; other task configs / domain randomization will differ.

License & attribution

Base VLM Qwen3-VL-2B-Instruct is © the Qwen team, licensed Apache-2.0 (see qwen_base_config/README_Qwen3-VL-2B-Instruct.md). Because the checkpoint embeds fine-tuned Qwen weights, that license applies to the corresponding components.
RMBench / RoboTwin and the Mem-0 policy code are governed by their respective upstream licenses; refer to the source repository.
