MindZero-gw-asst-Qwen3-VL-8B-Instruct

A MindZero checkpoint fine-tuned from Qwen/Qwen3-VL-8B-Instruct using self-supervised reinforcement learning for online proactive assistance in GridWorld environments.


TL;DR

MindZero trains (multimodal) large language models ((M)LLMs) to perform efficient, robust online mental reasoning without any mental-state annotations. During training, the model is rewarded for generating mental-state hypotheses that maximize the likelihood of the observed actions, as estimated by a planner. This is analogous to model-based Theory of Mind (ToM) reasoning. After training, MindZero internalizes this reasoning, so it runs as fast single-pass inference.
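The reward idea above can be illustrated with a toy sketch. The model card does not describe MindZero's actual planner, hypothesis format, or reward shaping. This sketch therefore makes several assumptions. The mental-state hypothesis is simply a goal cell in a small grid. The planner is a Boltzmann-rational agent whose Q-values are negative shortest-path lengths. The reward for a hypothesis is the log-likelihood of the observed actions under that planner. The grid layout, `beta`, and every function name here are hypothetical.

```python
import math
from collections import deque

# Hypothetical toy GridWorld ("#" = wall); not the MindZero environment.
GRID = [
    ".....",
    ".#.#.",
    ".....",
    ".#.#.",
    ".....",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def free(cell):
    r, c = cell
    return 0 <= r < len(GRID) and 0 <= c < len(GRID[0]) and GRID[r][c] != "#"


def step(state, action):
    """Apply an action; moving into a wall or off the grid leaves the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    return nxt if free(nxt) else state


def distances_to(goal):
    """BFS shortest-path distance from every reachable free cell to `goal`."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for a in ACTIONS:
            nxt = step(cur, a)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def action_log_prob(state, action, goal, beta=2.0):
    """log pi(action | state, goal) under an assumed Boltzmann-rational planner."""
    dist = distances_to(goal)
    # Q(s, a) = -(1 step + remaining shortest-path length after the move)
    scores = {a: beta * -(1 + dist[step(state, a)]) for a in ACTIONS}
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return scores[action] - log_z


def hypothesis_reward(start, actions, goal, beta=2.0):
    """Reward for a goal hypothesis: log-likelihood of the observed actions."""
    state, total = start, 0.0
    for a in actions:
        total += action_log_prob(state, a, goal, beta)
        state = step(state, a)
    return total


observed = ["right", "right", "down", "down"]
for goal in [(2, 2), (4, 0)]:
    print(goal, round(hypothesis_reward((0, 0), observed, goal), 3))
```

Here the hypothesis "goal = (2, 2)" explains the observed path better than "goal = (4, 0)", so it receives the higher reward. A policy that generates hypotheses would be reinforced toward the former. Because the planner supplies the likelihood, no mental-state labels are needed.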

Evaluation

Citation

@inproceedings{zhang2026mindzero,
  title     = {MindZero: Learning Online Mental Reasoning With Zero Annotations},
  author    = {Shunchi Zhang and Jin Lu and Chuanyang Jin and Yichao Zhou and Zhining Zhang and Tianmin Shu},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}
Model size: 9B params · Tensor type: BF16 · Format: Safetensors

Model tree: SCAI-JHU/MindZero-gw-asst-Qwen3-VL-8B-Instruct is fine-tuned from Qwen/Qwen3-VL-8B-Instruct.
