TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14, 2024
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference Paper • 2307.02628 • Published Jul 5, 2023
Focused Transformer: Contrastive Training for Context Scaling Paper • 2307.03170 • Published Jul 6, 2023
Lost in the Middle: How Language Models Use Long Contexts Paper • 2307.03172 • Published Jul 6, 2023