-
Moral Foundations of Large Language Models
Paper • 2310.15337 • Published • 1 -
Specific versus General Principles for Constitutional AI
Paper • 2310.13798 • Published • 2 -
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 21 -
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Paper • 2309.00267 • Published • 45
Collections including paper arxiv:2402.00742
-
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 10 -
UltraFeedback: Boosting Language Models with High-quality Feedback
Paper • 2310.01377 • Published • 4 -
Learn Your Reference Model for Real Good Alignment
Paper • 2404.09656 • Published • 79 -
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Paper • 2405.01535 • Published • 90
-
Fine-Tuning Language Models from Human Preferences
Paper • 1909.08593 • Published • 2 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 10 -
Leverage the Average: an Analysis of KL Regularization in RL
Paper • 2003.14089 • Published • 2 -
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Paper • 2404.01258 • Published • 10
-
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Paper • 2403.18421 • Published • 20 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 23 -
stanford-crfm/BioMedLM
Text Generation • Updated • 6.91k • 369 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 37
-
Lumiere: A Space-Time Diffusion Model for Video Generation
Paper • 2401.12945 • Published • 82 -
Long-form factuality in large language models
Paper • 2403.18802 • Published • 23 -
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Paper • 2403.18818 • Published • 22 -
TC4D: Trajectory-Conditioned Text-to-4D Generation
Paper • 2403.17920 • Published • 15
-
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 74 -
Efficient Exploration for LLMs
Paper • 2402.00396 • Published • 18 -
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 20 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 10
-
Efficient Tool Use with Chain-of-Abstraction Reasoning
Paper • 2401.17464 • Published • 15 -
Transforming and Combining Rewards for Aligning Large Language Models
Paper • 2402.00742 • Published • 10 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 61 -
Specialized Language Models with Cheap Inference from Limited Domain Data
Paper • 2402.01093 • Published • 45
-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 15 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 9 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 10 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 45
-
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 27 -
Tailoring Self-Rationalizers with Multi-Reward Distillation
Paper • 2311.02805 • Published • 2 -
Ultra-Long Sequence Distributed Transformer
Paper • 2311.02382 • Published • 2 -
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
Paper • 2309.11235 • Published • 15