Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training Paper • 2401.05566 • Published Jan 10, 2024 • 23
JudgeLM: Fine-tuned Large Language Models are Scalable Judges Paper • 2310.17631 • Published Oct 26, 2023 • 31
Instruction Tuning for Large Language Models: A Survey Paper • 2308.10792 • Published Aug 21, 2023 • 1
An Empirical Study of LLM-as-a-Judge for LLM Evaluation: Fine-tuned Judge Models are Task-specific Classifiers Paper • 2403.02839 • Published Mar 5, 2024
Holistic Safety and Responsibility Evaluations of Advanced AI Models Paper • 2404.14068 • Published Apr 2024