A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity Paper • 2401.01967 • Published Jan 3, 2024
Secrets of RLHF in Large Language Models Part I: PPO Paper • 2307.04964 • Published Jul 11, 2023
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders Paper • 2404.05961 • Published Apr 9, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models Paper • 2404.02948 • Published Apr 3, 2024