- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 138
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 63
Collections including paper arxiv:2402.04494
- Grandmaster-Level Chess Without Search
  Paper • 2402.04494 • Published • 65
- Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
  Paper • 2402.04248 • Published • 25
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 22
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 58