Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models Paper • 2501.01830 • Published Jan 3, 2025 • 18
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 21
Towards Scalable Automated Alignment of LLMs: A Survey Paper • 2406.01252 • Published Jun 3, 2024 • 2