- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 37
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
  Paper • 2403.14624 • Published • 50
- Chain of Thought Empowers Transformers to Solve Inherently Serial Problems
  Paper • 2402.12875 • Published • 2
Collections
Collections including paper arxiv:2402.10200
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 102
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 57
- Do language models plan ahead for future tokens?
  Paper • 2404.00859 • Published • 2
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 98
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 31
- ViTAR: Vision Transformer with Any Resolution
  Paper • 2403.18361 • Published • 48
- Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
  Paper • 2403.18814 • Published • 37
- Linearity of Relation Decoding in Transformer Language Models
  Paper • 2308.09124 • Published • 2
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 99
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 4
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 11
- Language Model Evaluation Beyond Perplexity
  Paper • 2106.00085 • Published
- JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention
  Paper • 2310.00535 • Published • 2
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Paper • 2211.00593 • Published • 2
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 18
- Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
  Paper • 2307.09458 • Published • 9
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 15
- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 90
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
  Paper • 2403.15042 • Published • 24
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 17
- Attention Is All You Need
  Paper • 1706.03762 • Published • 34
- Self-Attention with Relative Position Representations
  Paper • 1803.02155 • Published
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 11
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 28
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 565
- Yi: Open Foundation Models by 01.AI
  Paper • 2403.04652 • Published • 58
- Simple and Scalable Strategies to Continually Pre-train Large Language Models
  Paper • 2403.08763 • Published • 48
- Stealing Part of a Production Language Model
  Paper • 2403.06634 • Published • 85