MindZero-hh-tom-Qwen3-4B-Instruct-2507

A MindZero checkpoint trained from Qwen/Qwen3-4B-Instruct-2507 with self-supervised reinforcement learning for online Theory-of-Mind reasoning in household environments.

Project Page Collection Code Paper

TL;DR

MindZero trains (M)LLMs to perform efficient and robust online mental reasoning without any mental-state annotations. During training, the model is rewarded for generating mental-state hypotheses that maximize the likelihood of observed actions, as estimated by a planner — analogous to model-based ToM reasoning. After training, MindZero internalizes this reasoning into fast single-pass inference.

Evaluation

Citation

@inproceedings{zhang2026mindzero,
  title     = {MindZero: Learning Online Mental Reasoning With Zero Annotations},
  author    = {Shunchi Zhang and Jin Lu and Chuanyang Jin and Yichao Zhou and Zhining Zhang and Tianmin Shu},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}
Downloads last month
28
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SCAI-JHU/MindZero-hh-tom-Qwen3-4B-Instruct-2507

Finetuned
(1707)
this model

Datasets used to train SCAI-JHU/MindZero-hh-tom-Qwen3-4B-Instruct-2507

Collection including SCAI-JHU/MindZero-hh-tom-Qwen3-4B-Instruct-2507