- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 25
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 33
Collections
Collections including paper arxiv:2305.18290
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 12
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- Statistical Rejection Sampling Improves Preference Optimization
  Paper • 2309.06657 • Published • 13
- SimPO: Simple Preference Optimization with a Reference-Free Reward
  Paper • 2405.14734 • Published • 8
- Attention Is All You Need
  Paper • 1706.03762 • Published • 36
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 10
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 235
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
  Paper • 2401.01967 • Published
- Secrets of RLHF in Large Language Models Part I: PPO
  Paper • 2307.04964 • Published • 26
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 117
- LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
  Paper • 2404.05961 • Published • 62
- A General Theoretical Paradigm to Understand Learning from Human Preferences
  Paper • 2310.12036 • Published • 11
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 59
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38