Collection:
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
  Paper • 2311.10642 • Published • 23
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
Collections
Collections including paper arxiv:2311.11829
Collection:
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 6
- Alignment for Honesty
  Paper • 2312.07000 • Published • 11
- Steering Llama 2 via Contrastive Activation Addition
  Paper • 2312.06681 • Published • 11
Collection:
- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 34
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 9
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
Collection:
- Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
  Paper • 2311.08692 • Published • 12
- DiLoCo: Distributed Low-Communication Training of Language Models
  Paper • 2311.08105 • Published • 14
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
  Paper • 2312.06134 • Published • 2
Collection:
- EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
  Paper • 2311.02077 • Published • 14
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- Large Language Models for Mathematicians
  Paper • 2312.04556 • Published • 11
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44
Collection:
- Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation
  Paper • 2310.18628 • Published • 7
- TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
  Paper • 2310.19019 • Published • 9
- Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
  Paper • 2311.02262 • Published • 10
- Thread of Thought Unraveling Chaotic Contexts
  Paper • 2311.08734 • Published • 6
Collection:
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 25
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
  Paper • 2308.16137 • Published • 39
- Scaling Transformer to 1M tokens and beyond with RMT
  Paper • 2304.11062 • Published • 2
- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17