Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 3 days ago • 27
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published 3 days ago • 52
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 5 days ago • 91
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 12 days ago • 338
Light-R1 Collection Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond • 7 items • Updated 10 days ago • 11
Hallucination detection Collection Trained ModernBERT (base and large) for detecting hallucinations in LLM responses. The models are trained as token classifiers. • 4 items • Updated 18 days ago • 15
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published 26 days ago • 26
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published 27 days ago • 28
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 26 days ago • 70
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published Feb 10 • 126
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published about 1 month ago • 7
Sky-T1-7B Collection A series of 7B models trained with different recipes and the corresponding training data. • 8 items • Updated Feb 14 • 6
Process Reward Models Collection Model and Datasets for Qwen 2.5 Math PRM 7B • 6 items • Updated Feb 18 • 2
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published Feb 14 • 32
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning Paper • 2502.04689 • Published Feb 7 • 7