-
Orca 2: Teaching Small Language Models How to Reason
Paper • 2311.11045 • Published • 70 -
ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Paper • 2311.10775 • Published • 7 -
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Paper • 2311.11077 • Published • 24 -
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
Paper • 2311.11501 • Published • 33
Collections
Discover the best community collections!
Collections including paper arxiv:2401.12954
-
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 61 -
Learning To Teach Large Language Models Logical Reasoning
Paper • 2310.09158 • Published • 1 -
ChipNeMo: Domain-Adapted LLMs for Chip Design
Paper • 2311.00176 • Published • 8 -
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Paper • 2308.09583 • Published • 7
-
When can transformers reason with abstract symbols?
Paper • 2310.09753 • Published • 2 -
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Paper • 2310.10638 • Published • 28 -
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
Paper • 2310.09520 • Published • 10 -
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Paper • 2309.08532 • Published • 52
-
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
Paper • 2309.14509 • Published • 17 -
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper • 2401.02412 • Published • 36 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 42 -
Tuning Language Models by Proxy
Paper • 2401.08565 • Published • 20
-
Iterated Decomposition: Improving Science Q&A by Supervising Reasoning Processes
Paper • 2301.01751 • Published -
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
Paper • 2307.11768 • Published • 12 -
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 37 -
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
Paper • 2307.15337 • Published • 36