arxiv:2606.19750

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Published on Jun 18 · Submitted by Darrien McKenzie on Jun 23

Abstract

Bayesian Manifold Curriculum (BMC) is a framework for reinforcement learning of LLM reasoning that structures problem sampling around relationships on the task manifold and the endogenous non-stationarity of training.

Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing adaptive curriculum learning methods typically prioritize prompts of intermediate difficulty, treating problem selection as a standard bandit problem with independent arms and overlooking the structured, heterogeneous nature of the task space. In this work, we frame problem sampling as a manifold-structured bandit problem with endogenous non-stationarity: problems are related through the model's latent representation space, and sampling decisions can steer how learning signals evolve across that space. To operationalize this perspective, we introduce Bayesian Manifold Curriculum (BMC), a structure-aware framework that organizes problems into a hierarchical task tree and applies Bayesian learning to guide sampling. Empirically, we find that different sampling strategies induce non-trivial tradeoffs between productivity (learning signal), diversity (coverage of the task manifold), and utility (evaluation relevance). These results show that prioritizing difficulty alone is insufficient for strong downstream performance, highlighting the importance of incorporating structure and type-awareness into problem sampling.

Community

Paper author Paper submitter

Excited to share Manifold Bandits.

When using RL to train LLMs, training efficiency depends not only on the strength of the policy optimization algorithm, but also on which problems the model sees at each point in training. Many adaptive curriculum learning methods focus on difficulty: avoid problems that are already trivial or still impossible, and select problems that are in the model’s current “optimal learning zone.”
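To make the difficulty-only baseline concrete, here is a minimal sketch of one common way to score "intermediate difficulty". It assumes binary rewards and uses the Bernoulli variance p(1 - p) of a problem's estimated success rate as a proxy for learning signal. The function names and the proportional-sampling rule are illustrative assumptions, not taken from the paper.

```python
import random


def learning_signal(success_rate: float) -> float:
    """Bernoulli reward variance p * (1 - p).

    Zero for problems that are always solved (p = 1) or never solved (p = 0),
    and largest at p = 0.5. This proxy is an assumption for illustration.
    """
    return success_rate * (1.0 - success_rate)


def sample_by_difficulty(success_rates: dict, k: int, rng: random.Random) -> list:
    """Draw k problem ids with probability proportional to p * (1 - p)."""
    ids = list(success_rates)
    weights = [learning_signal(success_rates[i]) for i in ids]
    if sum(weights) == 0:
        # Every problem is trivial or impossible: fall back to uniform sampling.
        return [rng.choice(ids) for _ in range(k)]
    return rng.choices(ids, weights=weights, k=k)


if __name__ == "__main__":
    # Hypothetical per-problem success rates estimated from earlier rollouts.
    rates = {"easy": 0.95, "medium": 0.50, "hard": 0.05}
    print(sample_by_difficulty(rates, k=8, rng=random.Random(0)))
```

Note that each problem is scored independently here: nothing in this sampler knows which problems are related, which is the gap the rest of this post is about.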

But difficulty is not the only thing that matters.

Problem type matters, too. And just as LLMs may not share our concept of difficulty, they may not share our concept of type, either.

Task decompositions are often manually defined according to human semantics, or ignored entirely by treating a whole dataset/domain as “one task.” In both cases, fine-grained structure can be lost: training data is heterogeneous, imbalanced, and composed of problem types whose learning dynamics can interact through policy updates.

Given this, we introduce Bayesian Manifold Curriculum (BMC): an adaptive curriculum learning method that caters not only to the policy’s ability, but also to its perception, that is, its latent organization of tasks.

BMC builds Latent Task Trees from the policy model’s latent geometry, then uses this geometry to orchestrate training across diverse and interacting problem types. In doing so, we reveal learning dynamics and tradeoffs that are often hidden when curricula are evaluated only by difficulty or downstream evaluation accuracy.
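As a rough illustration of what sampling over a task tree can look like, the sketch below runs Thompson sampling down a small hand-built tree. It is not the BMC algorithm: the tree is assumed to be given (for example, from clustering the policy's hidden-state embeddings, which is not shown), and the `Node` class, the Beta posterior over success rate, and the theta * (1 - theta) score are all hypothetical choices made for this example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical task tree; leaves hold problem ids."""
    name: str
    children: list = field(default_factory=list)
    problems: list = field(default_factory=list)
    alpha: float = 1.0  # Beta posterior: 1 + observed successes
    beta: float = 1.0   # Beta posterior: 1 + observed failures


def thompson_score(node: Node, rng: random.Random) -> float:
    """Sample a success rate from the node's posterior and score it.

    The score theta * (1 - theta) favors subtrees believed to be of
    intermediate difficulty (an assumption of this sketch).
    """
    theta = rng.betavariate(node.alpha, node.beta)
    return theta * (1.0 - theta)


def sample_problem(root: Node, rng: random.Random):
    """Walk from the root to a leaf, picking the best-scoring child each step."""
    node, path = root, [root]
    while node.children:
        node = max(node.children, key=lambda c: thompson_score(c, rng))
        path.append(node)
    return rng.choice(node.problems), path


def record_outcome(path: list, solved: bool) -> None:
    """Update every node on the path, so related problems share evidence."""
    for node in path:
        if solved:
            node.alpha += 1.0
        else:
            node.beta += 1.0


if __name__ == "__main__":
    rng = random.Random(0)
    algebra = Node("algebra", problems=["a1", "a2"])
    geometry = Node("geometry", problems=["g1", "g2"])
    root = Node("root", children=[algebra, geometry])
    problem, path = sample_problem(root, rng)
    record_outcome(path, solved=True)
    print(problem, [n.name for n in path])
```

Because outcomes are recorded along the whole root-to-leaf path, evidence from one problem shifts the sampling probability of its neighbors in the tree, unlike a bandit with independent arms.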

In short: instead of treating curriculum learning as just finding problems of the right difficulty, we ask how training effort should be allocated across the policy’s latent task manifold.

Project page / visual walkthrough: https://darrienmckenzie.com/manifold-bandits/

Longer X thread coming soon: @darrienmckenzie


