Collections including paper arxiv:2401.08565

- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 78
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
  Paper • 2402.04833 • Published • 6

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 42

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Linear Transformers are Versatile In-Context Learners
  Paper • 2402.14180 • Published • 6

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 47

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 27
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 64

- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 10
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 26
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79

- Contrastive Chain-of-Thought Prompting
  Paper • 2311.09277 • Published • 34
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 9
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39

- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
  Paper • 2310.09199 • Published • 24
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
  Paper • 2310.08740 • Published • 14
- Personality Traits in Large Language Models
  Paper • 2307.00184 • Published • 20
- An Emulator for Fine-Tuning Large Language Models using Small Language Models
  Paper • 2310.12962 • Published • 14

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 42
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20