Formalizing Latent Thoughts: Four Axioms of Thought Representation in LLMs
Abstract
An axiomatic evaluation framework reveals systematic failures in latent thought representations of LLMs across multiple reasoning tasks, demonstrating that current representations fail to satisfy fundamental functional axioms consistently across different model architectures.
We introduce an axiomatic evaluation framework for latent thought representations in LLMs, comprising metrics that are independent of downstream benchmark scores and reveal representational failures that benchmark accuracy masks. Existing evaluations conflate representation quality with model capacity. Therefore, failures cannot be attributed to the representation rather than to the model that processes it. We formalize four functional axioms (Causality, Minimality, Separability, and Stability) and define a quantitative measure for each, computed directly on the representation independently of downstream accuracy. We audit open-weight LLMs across 23 reasoning tasks (e.g., Spatial Reasoning, Factual QA). We find that no candidate satisfies all four axioms simultaneously, that the representations distinguish task type reliably but cannot distinguish between two questions within the same task, and that the representations encode little information beyond what is already present in the input embedding. The failure is consistent across dense, reasoning-distilled, and RL-trained model families, indicating that the gap is structural rather than a property of model size or training procedure.
Community
It would work better if you used Asolaria ASI as a neural network. It uses 0 GPU.
One axiom I'd add: necessity.
It checks that the model actually uses the latent state. Without it, a model can sometimes make $T$ look good while routing the real answer through residual prompt information or decoder priors.
To support this axiom, I would add three checks, beyond those in the paper, to a real training/optimization framework.
First, use latent ablation necessity. After the model produces $T$, corrupt it, swap it with another example’s $T$, zero it, or inject Gaussian noise. If answer quality barely changes, the model is not using the latent thought. A good latent reasoner should degrade gracefully under small noise but fail under semantic swaps.
Second, use counterfactual latent intervention. Take two minimally different inputs $x$ and $x'$ with different answers. Swap their latent thoughts. If the model follows the swapped latent state, $T$ is causally active. If it ignores the swap and answers from the prompt alone, the latent state is decorative.
Third, use multi-window causality, not only final-answer causality. Do not train $T$ only to predict the final answer. Split explicit reasoning traces into many windows and require latent states to substitute for intermediate reasoning prefixes. Otherwise, a latent vector that encodes only “answer = 42” could pass some final-output tests without representing the reasoning process.
If $T$ never helps the model in some predictable class of situations, it can be skipped there, saving compute through early exiting.
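The first two checks above (ablation and swap) can be sketched in a few lines. This is a toy illustration, not the paper's metric or anyone's actual implementation: `necessity_report`, `latent_reader`, `prompt_reader`, the one-hot latents, and the integer prompts are all hypothetical stand-ins. It assumes a `decode(prompt, latent)` callable that returns a discrete answer, and it reports how often the answer changes when the latent thought T is zeroed, perturbed with small Gaussian noise, or swapped with another example's T, plus how often the answer follows the swapped-in latent.

```python
import random

def necessity_report(decode, prompts, latents, noise_std=0.05, seed=0):
    """Fraction of answers that change under each corruption of the latent.

    decode(prompt, latent) -> answer is a hypothetical interface.
    """
    rng = random.Random(seed)
    n = len(prompts)
    base = [decode(x, t) for x, t in zip(prompts, latents)]
    # Swap partner is the next example (cyclic shift), so nobody keeps their own latent.
    swapped = [latents[(i + 1) % n] for i in range(n)]
    corruptions = {
        "zero": [[0.0] * len(t) for t in latents],
        "noise": [[v + rng.gauss(0.0, noise_std) for v in t] for t in latents],
        "swap": swapped,
    }
    report = {}
    for name, corrupted in corruptions.items():
        changed = sum(decode(x, t) != b for x, t, b in zip(prompts, corrupted, base))
        report[name] = changed / n
    # Swap-following: does the answer match what the donor example originally produced?
    donor_base = [base[(i + 1) % n] for i in range(n)]
    follows = sum(decode(x, t) == d for x, t, d in zip(prompts, swapped, donor_base))
    report["swap_follow"] = follows / n
    return report

def latent_reader(prompt, latent):
    # Toy decoder that answers from the latent only (argmax index).
    return max(range(len(latent)), key=lambda i: latent[i])

def prompt_reader(prompt, latent):
    # Toy decoder that ignores the latent and answers from the prompt alone.
    return prompt % 3

# Toy data: the "correct" latent is a one-hot encoding of prompt % 3.
prompts = [0, 1, 2, 3, 4, 5]
latents = [[1.0 if j == p % 3 else 0.0 for j in range(3)] for p in prompts]

uses_latent = necessity_report(latent_reader, prompts, latents)
ignores_latent = necessity_report(prompt_reader, prompts, latents)
```

In this toy setup, `latent_reader` is stable under small noise but follows every swap, which is the pattern described above for a causally active latent. `prompt_reader` is unaffected by every corruption, which is the "decorative" case. With a real model, `decode` would wrap a forward pass with the latent state patched in, and answer equality would likely need a softer comparison.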
Okay, turns out LaTeX is not supported in Paper comments. Embarrassing, but I'm too lazy to edit right now because I'm on mobile.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Unlocking the Black Box of Latent Reasoning: An Interpretability-Guided Approach to Intervention (2026)
- Convergence Without Understanding: When Language Models Agree on Representations but Disagree on Reasoning (2026)
- Latent Thought Flow: Efficient Latent Reasoning in Large Language Models (2026)
- Invariant Features in Language Models: Geometric Characterization and Model Attribution (2026)
- Integrated and Cross-Architecture Interpretation of LLM Reasoning (2026)
- Beyond Language: Format-Agnostic Reasoning Subspaces in Large Language Models (2026)
- LoRi: Low-Rank Distillation for Implicit Reasoning (2026)