Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.06657

Moral Foundations of Large Language Models

Paper • 2310.15337 • Published Oct 23, 2023 • 1
Specific versus General Principles for Constitutional AI

Paper • 2310.13798 • Published Oct 20, 2023 • 2
Contrastive Prefence Learning: Learning from Human Feedback without RL

Paper • 2310.13639 • Published Oct 20, 2023 • 24
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

Paper • 2309.00267 • Published Sep 1, 2023 • 47

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 47
Towards Efficient and Exact Optimization of Language Model Alignment

Paper • 2402.00856 • Published Feb 1
A General Theoretical Paradigm to Understand Learning from Human Preferences

Paper • 2310.12036 • Published Oct 18, 2023 • 14
Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13

Alignment Algorithm Papers

Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 15
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 47
Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
SimPO: Simple Preference Optimization with a Reference-Free Reward

Paper • 2405.14734 • Published May 23 • 10

LLM Fundamental papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 241

Efficient RLHF: Reducing the Memory Usage of PPO

Paper • 2309.00754 • Published Sep 1, 2023 • 13
Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
Aligning Large Multimodal Models with Factually Augmented RLHF

Paper • 2309.14525 • Published Sep 25, 2023 • 29
Stabilizing RLHF through Advantage Model and Selective Rehearsal

Paper • 2309.10202 • Published Sep 18, 2023 • 9

Language Modeling Is Compression

Paper • 2309.10668 • Published Sep 19, 2023 • 82
Baichuan 2: Open Large-scale Language Models

Paper • 2309.10305 • Published Sep 19, 2023 • 19
Chain-of-Verification Reduces Hallucination in Large Language Models

Paper • 2309.11495 • Published Sep 20, 2023 • 38
LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65

Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Paper • 2309.10150 • Published Sep 18, 2023 • 24

Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 41
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12 • 63
Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

Efficient RLHF: Reducing the Memory Usage of PPO

Paper • 2309.00754 • Published Sep 1, 2023 • 13
Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?

Paper • 2309.07462 • Published Sep 14, 2023 • 3
Stabilizing RLHF through Advantage Model and Selective Rehearsal

Paper • 2309.10202 • Published Sep 18, 2023 • 9

Statistical Rejection Sampling Improves Preference Optimization

Paper • 2309.06657 • Published Sep 13, 2023 • 13
Efficient Monotonic Multihead Attention

Paper • 2312.04515 • Published Dec 7, 2023 • 6
Layerwise Recurrent Router for Mixture-of-Experts

Paper • 2408.06793 • Published Aug 13 • 30
Scaling Up Diffusion and Flow-based XGBoost Models

Paper • 2408.16046 • Published Aug 28 • 8

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs