Themis: Towards Flexible and Interpretable NLG Evaluation Paper • 2406.18365 • Published Jun 26, 2024
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation Paper • 2402.11493 • Published Feb 18, 2024
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents Paper • 2408.07060 • Published Aug 13, 2024