GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models Paper • 2406.14550 • Published Jun 20, 2024 • 4
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Paper • 2411.07140 • Published Nov 11, 2024 • 35
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 103
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? Paper • 2502.19361 • Published Feb 26 • 28
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 15 days ago • 44
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published 15 days ago • 44
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models Paper • 2411.07140 • Published Nov 11, 2024 • 35
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level Paper • 2406.11817 • Published Jun 17, 2024 • 13
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level Paper • 2406.11817 • Published Jun 17, 2024 • 13