TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • 2411.15124 • Published Nov 22, 2024 • 60
Improving Language Plasticity via Pretraining with Active Forgetting • Paper • 2307.01163 • Published Jul 3, 2023 • 7
Cheems: Wonderful Matrices More Efficient and More Effective Architecture • Paper • 2407.16958 • Published Jul 24, 2024 • 4
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture • Paper • 2412.11834 • Published Dec 16, 2024 • 8
story writing favourites • Collection • Models I personally liked for generating stories in the past. Not a recommendation, many of these are outdated. • 20 items • Updated 7 days ago • 50
Sparse Autoencoders • Collection • SAEs are tools for understanding the internal representations of neural networks. These can be loaded using https://github.com/EleutherAI/sae • 9 items • Updated 16 days ago • 3
Pythia Scaling Suite • Collection • Pythia is the first LLM suite designed specifically to enable scientific research on LLMs. To learn more see https://github.com/EleutherAI/pythia • 18 items • Updated 16 days ago • 29
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective • Paper • 2502.17262 • Published 18 days ago • 19
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation • Paper • 2502.18364 • Published 17 days ago • 34
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference • Paper • 2502.18137 • Published 17 days ago • 53
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance • Paper • 2502.18772 • Published 16 days ago • 32
Language Models' Factuality Depends on the Language of Inquiry • Paper • 2502.17955 • Published 17 days ago • 30
EgoNormia: Benchmarking Physical Social Norm Understanding • Paper • 2502.20490 • Published 14 days ago • 5
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding • Paper • 2502.05609 • Published Feb 8 • 18
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting • Paper • 2503.00784 • Published 12 days ago • 10