Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent Paper • 2312.10003 • Published Dec 15, 2023
Leveraging Large Language Models for NLG Evaluation: A Survey Paper • 2401.07103 • Published Jan 13, 2024
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023