The ToolRL model trained for tool use through GRPO
Cheng Qian
chengq9
AI & ML interests
Agent, Tool Learning
Recent Activity
upvoted a paper 11 days ago
MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
upvoted a paper 15 days ago
ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind
upvoted a paper 20 days ago
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
Organizations
Collections: 1
Models: 3
Datasets: 0 (none public yet)