Paper: Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA (arXiv 2503.05840, published Mar 7)
Article: Topic 33: Slim Attention, KArAt, XAttention and Multi-Token Attention Explained – What’s Really Changing in Transformers? By Kseniase and 1 other, Apr 4