Collections
Collections including paper arxiv:2404.14469
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 603
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 96
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 104
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
  Paper • 2404.11912 • Published • 16
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 23
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 257
- An Evolved Universal Transformer Memory
  Paper • 2410.13166 • Published • 3

- JetMoE: Reaching Llama2 Performance with 0.1M Dollars
  Paper • 2404.07413 • Published • 36
- Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy
  Paper • 2404.05238 • Published • 3
- Cognitive Architectures for Language Agents
  Paper • 2309.02427 • Published • 8
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2

- Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
  Paper • 2404.02575 • Published • 48
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 53
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 23
- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 33

- Advancing LLM Reasoning Generalists with Preference Trees
  Paper • 2404.02078 • Published • 44
- PointInfinity: Resolution-Invariant Point Diffusion Models
  Paper • 2404.03566 • Published • 13
- MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance
  Paper • 2404.08252 • Published • 5
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 23

- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
  Paper • 2403.15246 • Published • 9
- Noise-Aware Training of Layout-Aware Language Models
  Paper • 2404.00488 • Published • 8
- SnapKV: LLM Knows What You are Looking for Before Generation
  Paper • 2404.14469 • Published • 23

- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 44
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22