Submitted by taesiri 29 Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning Tencent 12 4
Submitted by Hanyuezhuohua 98 Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Tencent 223 3
Submitted by yolay 16 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models Tencent 26 2