- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Paper • 2312.00752 • Published • 138
- Elucidating the Design Space of Diffusion-Based Generative Models
Paper • 2206.00364 • Published • 14
- GLU Variants Improve Transformer
Paper • 2002.05202 • Published • 1
- StarCoder 2 and The Stack v2: The Next Generation
Paper • 2402.19173 • Published • 136
Jeffrey Magder
jmagder
AI & ML interests
None yet
Recent Activity
liked a Space 9 days ago
HuggingFaceH4/blogpost-scaling-test-time-compute
Organizations
None yet
Collections
2
- Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 25
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Paper • 2205.14135 • Published • 11
- Attention Is All You Need
Paper • 1706.03762 • Published • 49
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Paper • 2307.08691 • Published • 8
Models
None public yet
Datasets
None public yet