Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 27
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
Updated • 1.16k
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
Updated • 3
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
Updated • 27
Bowen
PeterJinGo
Recent Activity
Updated a model 7 days ago: PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo
Updated a collection 15 days ago: Search-R1
Collections: 1
Models: 11
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-em-ppo
Updated • 184
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-ppo
Updated • 27
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-grpo
Updated • 7
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-ppo
Updated • 77
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-7b-it-em-ppo
Updated • 1.11k
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-it-em-grpo
Updated • 4
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-it-em-ppo
Updated • 75
PeterJinGo/SearchR1-nq_hotpotqa_train-qwen2.5-3b-em-grpo
Updated • 122
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-ppo
Updated • 1.16k
PeterJinGo/SearchR1-nq_hotpotqa_train-llama3.2-3b-em-grpo
Updated • 3
Datasets: 11
PeterJinGo/nq_hotpotqa_train
Viewer • Updated • 221k • 266
PeterJinGo/wiki-18-e5-index
Updated • 1.76k
PeterJinGo/wiki-18-corpus
Updated • 824
PeterJinGo/ultrafeedback_first_5000
Viewer • Updated • 5k • 8
PeterJinGo/gsm8k-chat
Viewer • Updated • 7.47k • 42
PeterJinGo/math-zeroshot-chat
Viewer • Updated • 7.5k • 46
PeterJinGo/math-zeroshot
Viewer • Updated • 7.5k • 43
PeterJinGo/math2
Viewer • Updated • 7.5k • 37
PeterJinGo/math
Viewer • Updated • 7.5k • 51
PeterJinGo/gsm8k
Viewer • Updated • 7.47k • 47