LongTraceRL
Collection
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards • 5 items • Updated • 1
LongTraceRL-30B is a 30-billion parameter (3B active) Mixture-of-Experts reasoning model trained with reinforcement learning on long-context multi-hop QA tasks using trajectory-based tiered distractors and entity-level rubric rewards.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("THU-KEG/LongTraceRL-30B")
tokenizer = AutoTokenizer.from_pretrained("THU-KEG/LongTraceRL-30B")
@misc{lin2026longtracerllearninglongcontextreasoning,
title={LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards},
author={Nianyi Lin and Jiajie Zhang and Lei Hou and Juanzi Li},
year={2026},
eprint={2605.31584},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2605.31584},
}