Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex • Paper • 2605.06139 • Published May 7 • 69
Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models • Paper • 2602.01970 • Published Feb 2 • 5
Utility-Diversity Aware Online Batch Selection for LLM Supervised Fine-tuning • Paper • 2510.16882 • Published Oct 19, 2025 • 5
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts • Paper • 2510.08396 • Published Oct 9, 2025 • 4
Fly-CL: A Fly-Inspired Framework for Enhancing Efficient Decorrelation and Reduced Training Time in Pre-trained Model-based Continual Representation Learning • Paper • 2510.16877 • Published Oct 19, 2025 • 4