MindZero-hh-tom-Llama-3.1-8B-Instruct

A MindZero checkpoint trained from meta-llama/Llama-3.1-8B-Instruct with self-supervised reinforcement learning for online Theory-of-Mind reasoning in household environments.

Project Page Collection Code Paper

TL;DR

MindZero trains (M)LLMs to perform efficient and robust online mental reasoning without any mental-state annotations. During training, the model is rewarded for generating mental-state hypotheses that maximize the likelihood of observed actions, as estimated by a planner — analogous to model-based ToM reasoning. After training, MindZero internalizes this reasoning into fast single-pass inference.

Evaluation

Citation

@inproceedings{zhang2026mindzero,
  title     = {MindZero: Learning Online Mental Reasoning With Zero Annotations},
  author    = {Shunchi Zhang and Jin Lu and Chuanyang Jin and Yichao Zhou and Zhining Zhang and Tianmin Shu},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}
Downloads last month
29
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for SCAI-JHU/MindZero-hh-tom-Llama-3.1-8B-Instruct

Finetuned
(2763)
this model

Datasets used to train SCAI-JHU/MindZero-hh-tom-Llama-3.1-8B-Instruct

Collection including SCAI-JHU/MindZero-hh-tom-Llama-3.1-8B-Instruct