muses-llm/bigcodebench_qwen7b_att_iter0_ppo_att20_sol10_rerun_worker4 Viewer • Updated 24 days ago • 15.9k • 79
muses-llm/bigcodebench_qwen7b_att_iter0_ppo_att20_sol50_rerun_worker4 Viewer • Updated 25 days ago • 926 • 77
snap-stanford/hotpotqa_four_agents_pipeline-preference_scorer Viewer • Updated 29 days ago • 885 • 212
shirwu/official-hotpotqa-hotpotqa_four_agents_pipeline-hint_generator-iter0 Updated about 1 month ago
shirwu/official-hotpotqa-hotpotqa_four_agents_pipeline-answer_generator-iter0 Updated about 1 month ago
Discover and Cure: Concept-aware Mitigation of Spurious Correlation Paper • 2305.00650 • Published May 1, 2023
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases Paper • 2404.13207 • Published Apr 19, 2024
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning Paper • 2406.11200 • Published Jun 17, 2024