Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.04347

fuck quadratic attention

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8 • 28
Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 132
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 40
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 97

Lee's RoPE Tricks / Context Extension Reads

Set of Long Context (RoPE or otherwise) I'm collecting off of HF

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21 • 106
Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 18
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration

Paper • 2402.11550 • Published Feb 18 • 12
The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey

Paper • 2401.07872 • Published Jan 15 • 2

The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Paper • 2402.04347 • Published Feb 6 • 13

My (Denis Gordeev) collection of mostly NLP papers. You can message me at t.me/nlp_party

LongAlign: A Recipe for Long Context Alignment of Large Language Models

Paper • 2401.18058 • Published Jan 31 • 21
Efficient Tool Use with Chain-of-Abstraction Reasoning

Paper • 2401.17464 • Published Jan 30 • 15
Scavenging Hyena: Distilling Transformers into Long Convolution Models

Paper • 2401.17574 • Published Jan 31 • 14
Rethinking Interpretability in the Era of Large Language Models

Paper • 2402.01761 • Published Jan 30 • 19

long context LLM

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13 • 23
Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13 • 14
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Paper • 2401.03462 • Published Jan 7 • 25
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry

Paper • 2402.04347 • Published Feb 6 • 13

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper • 2312.00752 • Published Dec 1, 2023 • 132
SparQ Attention: Bandwidth-Efficient LLM Inference

Paper • 2312.04985 • Published Dec 8, 2023 • 35
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Paper • 2401.04658 • Published Jan 9 • 24
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13 • 23

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs