Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention Paper • 2606.26560 • Published 5 days ago
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025
Jamba 1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction-following foundation models • 2 items • Updated Mar 6, 2025