- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 59
- A Survey on Data Selection for Language Models
  Paper • 2402.16827 • Published • 3
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 55
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
  Paper • 2306.01116 • Published • 29

Collections including paper arxiv:2305.18290

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- HyperCLOVA X Technical Report
  Paper • 2404.01954 • Published • 19
- Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
  Paper • 2404.09956 • Published • 11
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 80

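All of these collections share arXiv 2305.18290 (DPO). For orientation, the paper's objective optimizes the policy π_θ directly on preference pairs, a prompt x with a chosen response y_w and a rejected response y_l, against a frozen reference model π_ref, with no separately trained reward model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here σ is the logistic function and β controls how far the policy may drift from the reference.
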
- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
  Paper • 2403.18421 • Published • 21
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 23
- stanford-crfm/BioMedLM
  Text Generation • Updated • 2.48k • 377
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38

- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 32
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2

- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 56
- WARM: On the Benefits of Weight Averaged Reward Models
  Paper • 2401.12187 • Published • 17
- RewardBench: Evaluating Reward Models for Language Modeling
  Paper • 2403.13787 • Published • 19
- DreamReward: Text-to-3D Generation with Human Preference
  Paper • 2403.14613 • Published • 33

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 41
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 43

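The QLoRA and QA-LoRA entries above both build on the low-rank adaptation scheme of LoRA (2106.09685, which also appears in a collection below). A minimal sketch of that underlying adapter, assuming illustrative hyperparameters r and alpha and omitting QLoRA's 4-bit quantization entirely:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen nn.Linear with a trainable low-rank update:
    y = W0 x + (alpha / r) * B A x  (LoRA, arXiv 2106.09685)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # freeze the pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)   # zero-init B: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

# Hypothetical usage: adapt a 768-wide projection with rank-8 updates.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(2, 768))
```

Only the factors A and B are trained, so the number of trainable parameters scales with r rather than with the full weight matrix.
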
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 2
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 12

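This group spans the RLHF lineage, from PPO (1707.06347) and InstructGPT-style instruction tuning (2203.02155) to DPO and self-rewarding training. For reference, PPO's clipped surrogate objective, with probability ratio r_t(θ) and advantage estimate Â_t:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[ \min\!\left(
      r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t
    \right) \right],
  \qquad
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clip keeps each policy update close to the sampling policy, which is the stability property DPO sidesteps by avoiding on-policy RL altogether.
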
- Large Language Model Alignment: A Survey
  Paper • 2309.15025 • Published • 2
- Aligning Large Language Models with Human: A Survey
  Paper • 2307.12966 • Published • 1
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
  Paper • 2310.05344 • Published • 1

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 135
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 75
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 26

- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 25
- Attention Is All You Need
  Paper • 1706.03762 • Published • 37
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 38
- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 33