Collections including paper arxiv:2401.06080
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 59 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 173 -
LLaMA Beyond English: An Empirical Study on Language Capability Transfer
Paper • 2401.01055 • Published • 50 -
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Paper • 2401.01325 • Published • 24
-
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 253 -
3D-LFM: Lifting Foundation Model
Paper • 2312.11894 • Published • 13 -
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 55 -
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Paper • 2312.16862 • Published • 28
-
Pearl: A Production-ready Reinforcement Learning Agent
Paper • 2312.03814 • Published • 14 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 23 -
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 21 -
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Paper • 2402.01391 • Published • 41
-
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 62 -
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Paper • 2303.12712 • Published • 2 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 23 -
Confidence-Building Measures for Artificial Intelligence: Workshop Proceedings
Paper • 2308.00862 • Published
-
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Paper • 2210.14986 • Published • 4 -
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Paper • 2311.10702 • Published • 17 -
Large Language Models as Optimizers
Paper • 2309.03409 • Published • 72 -
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Paper • 2309.04269 • Published • 28
-
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 37 -
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Paper • 2306.01693 • Published • 2 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 135 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 23
-
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Paper • 2311.13231 • Published • 25 -
Nash Learning from Human Feedback
Paper • 2312.00886 • Published • 13 -
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper • 2401.06080 • Published • 23 -
MusicRL: Aligning Music Generation to Human Preferences
Paper • 2402.04229 • Published • 16
-
System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 38 -
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 6 -
Alignment for Honesty
Paper • 2312.07000 • Published • 11 -
Steering Llama 2 via Contrastive Activation Addition
Paper • 2312.06681 • Published • 9