Collections
Collections including paper arxiv:2402.10200
- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 98
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 50
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 12

- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Paper • 2211.00593 • Published • 2
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 21
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 10

- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 16
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 98
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 21

- Attention Is All You Need
  Paper • 1706.03762 • Published • 41
- Self-Attention with Relative Position Representations
  Paper • 1803.02155 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 14
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 28

- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 63
- SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
  Paper • 2402.09025 • Published • 6
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 13
- Algorithmic progress in language models
  Paper • 2403.05812 • Published • 18

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 592
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 61
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 48
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 90

- Evaluating Very Long-Term Conversational Memory of LLM Agents
  Paper • 2402.17753 • Published • 18
- StructLM: Towards Building Generalist Models for Structured Knowledge Grounding
  Paper • 2402.16671 • Published • 26
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 24
- Divide-or-Conquer? Which Part Should You Distill Your LLM?
  Paper • 2402.15000 • Published • 22

- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 8
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 98
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 109