Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck • Paper • 2404.07647 • Published Apr 11 • 4
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • Paper • 2404.05892 • Published Apr 8 • 28
Common Corpus • Collection • The largest public domain dataset for training LLMs. • 26 items • Updated Mar 20 • 101
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding • Paper • 2402.16671 • Published Feb 26 • 26