- JudgeLRM: Large Reasoning Models as a Judge
  Paper • 2504.00050 • Published • 60
- RM-R1: Reward Modeling as Reasoning
  Paper • 2505.02387 • Published • 67
- Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
  Paper • 2505.01441 • Published • 35
- Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
  Paper • 2505.03318 • Published • 90
Roman256
Roman12322
AI & ML interests
None yet
Recent Activity
Liked a model 11 days ago: Qwen/Qwen3-32B
Liked a model 11 days ago: microsoft/Phi-4-reasoning
Liked a model 11 days ago: microsoft/phi-4
Organizations
None yet
Collections 1
Models 0
None public yet
Datasets 0
None public yet