Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 7 days ago
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13
Article A failed experiment: Infini-Attention, and why we should keep trying? Aug 14, 2024
Article Welcome FalconMamba: The first strong attention-free 7B model Aug 12, 2024