TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding • arXiv:2404.11912 • Published Apr 18, 2024 • 16 upvotes
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing • arXiv:2404.12253 • Published Apr 18, 2024 • 54 upvotes
BASS: Batched Attention-optimized Speculative Sampling • arXiv:2404.15778 • Published Apr 24, 2024 • 8 upvotes
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference • arXiv:2404.15420 • Published Apr 23, 2024 • 7 upvotes
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding • arXiv:2404.16710 • Published Apr 25, 2024 • 75 upvotes
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting • arXiv:2404.18911 • Published Apr 29, 2024 • 29 upvotes
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge • arXiv:2405.00263 • Published May 1, 2024 • 14 upvotes
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention • arXiv:2404.07143 • Published Apr 10, 2024 • 105 upvotes
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models • arXiv:2404.02575 • Published Apr 3, 2024 • 48 upvotes